Computer Language Design and Engineering
Aims: To investigate the basic techniques which underpin computer language design and translation. To investigate in more detail one aspect of this field, and to implement a tool that performs a related task.
Background: Programming language processors (which include compilers, interpreters and API’s for data interchange formats such JSON and XML) need to `understand’ some sort of syntax, and then execute appropriate actions according to a defined semantics. For instance, the javac compiler processes programs written in Java syntax and outputs compiled instructions for the Java Virtual Machine. A python interpreter understands Python syntax, and directly executes Python phrases as soon as it recognises them.
This set of projects focuses on the specification and implementation of syntax and semantics for such processors. Interestingly, a variety of algorithms have been developed which allow the tools to be automatically generated from appropriate specifications. These techniques are very important because nearly all of our software is written in high level languages, and nearly all high level languages are implemented using these ideas.
Technologies used by programming language processors include: the use of context free grammars to specify syntax; the use of lexical analysis and parsing algorithms to convert an user’s program into an internal tree form; the use of term rewriting to modify those trees; and the use of attribute grammars and structural operational semantics to specify interpretations of that syntax including code generation and optimisation.
To do well on these projects you need to be interested in programming languages, have good programming skills and be comfortable with the kind of mathematics that you encountered in CS1870. You will write your code in Java, and make use of tools developed within the department such as RDP and ART. In the first part of this project you will learn about fundamental topics in this area including syntax specification, lexical analysis, parsing and simple interpretation. You will then move on to a more detailed study of one area. The first term deliverables are described in the early deliverables section. In the second part of the project (mostly the second term) you will focus on the design and implementation of tooling for programming language processing. Your detailed topic must be agreed with your superviser; suggestions include the following.
- A tool that takes a set of regular expressions and uses Thompson’sconstruction and the subset construction to produce a lexical analyserfor the set of expressions.
- A compiler for a simple Java subset language capable of executing small programs.
- An interpreter for a small language using either attribute grammars or structural operational semantics.
- A parser generator using either the LR or recursive descent algorithms.
- An implementation of a context free term rewriting system.
- Lexical analysers are typically based on finite state automata of the kind discussed in CS1870. The algorithms are well known but implementation details are important in the development of an efficient tool, and you will be expected to show how such optimisations can be used to improve performance.
- A minimal language might comprise an integer-only subset of Java which supported basic control flow. Once this has been demonstrated, then it could then be extended to support extended control flow constructs such as switch, break and procedure call, as well as extensions to the type system, especially arrays.
- Language interpretation can be viewed as adding actions to BNF specifications such that when a parser recognises a sub-phrase of the language, it can immediately execute those actions. Attribute grammars work by embedding actions within the grammar and providing attributes that allow information to flow up and down the trees constructed by parsers. Structural Operational Semantics take a different approach in which actions are triggered during the `reduction’ of a program to some minimal form.
- A variety of parsing algorithms have been developed. Two of the most commonly used are LR parsing and Recursive Descent parsing. A parser generator takes a specification for a language L written in BNF (for which you will already have a parser from your term 1 work) and then adds actions that cause it to emit a program that when compiled, will act as a parser for L. Your task is to understand your chosen parsing algorithm, and then write a parser generator for that algorithm. Interestingly, a correctly working parser generator can be `bootstrapped’, that is it can be used to process its own specification.
- A term rewriting system rewrites expressions. For instance, we know that in Boolean algebra, the logical AND of FALSE and any value is false. We can express this as FALSE AND x rewrites to FALSE. We also know that the logical AND of true and some value is just that value: TRUE AND x rewrites to x. A term rewriter takes a collection of these rewrites representing (in this case) Boolean identities, and attempts to simplify expressions by performing these substitutions. In this project, you will build a system which will explore rewrites over terms and perform useful computations.
- A report on the use of context free grammars to specify syntax, along with manual procedures for generating and parsing languages.
- A report on the use of a parsing tool to produce derivations, and the use of grammar idioms to capture notions of associativity and priority in arithmetic expressions.
- The development of an interpreter for a language that models a four function calculator with memory.
- The development of a pretty printer for BNF specifications, demonstrating that programs can process their own source code.
- A technical report covering all of the above work, including a bibliography.
- A Term 2 project schedule with a list of deliverables and dates.
- A project report containing a description of all the research you have carried out and the details of your second term implementation.
- An implementation of your chosen tool, See the suggestions in the Backg