In the sixth part of the compiler project you will extend your existing compiler frontend (which builds the checked intermediate representation for SIMPLE programs in the form of the symbol table and the abstract syntax tree) with a backend that actually generates code for SIMPLE programs. The code generator first performs storage allocation for all variables in the ST and then generates instructions during a post-order-style traversal of the AST (roughly one node at a time).
The generated code must enforce the context conditions that could not be enforced during (static) semantic analysis (array bounds, division by zero), just like the interpreter had to. The code generator also has to enforce various machine restrictions (see below). You can get the abstract grammar for the SIMPLE programming language here. The context conditions for this assignment are the same as for the interpreter given here.
You are generating ARMv6 assembly suitable for the Raspberry Pi Linux
machines you have an account for.
Your ARM code may call C library functions for
strtol, as well as division and modulo (since ARMv6 doesn’t have
machine instructions for those).
No other C library functions may be called!
You are largely on your own for this, but you can discuss ARM assembly
issues (not code generator design!) on Piazza all you want.
The compiler consists of a number of modules and classes working together
to translate programs written in SIMPLE into equivalent programs written
in assembly language.
The driver is called sc and is invoked from the shell as follows:
Invocation = "./sc" ["-" ("s"|"c"|"t"|"a"|"i")] [filename] .
This describes the syntax of the command line in EBNF. Besides
./sc itself, the user can supply one option (introduced by “-”)
to tell the driver which parts of the compiler to run and what kind of
output to produce.
With this assignment “no options at all” is finally allowed on the
command line for sc; only “unknown options” or a file that cannot be
opened should result in errors from the driver. Of course the options
listed in the EBNF above are unchanged from previous assignments.
If no option is given, the driver is supposed to generate code for an
input program. If the name of a program is given, the driver should
write the assembly source to a correspondingly named file (simple.s,
for example); if the program is read from the standard input, the
driver should write the assembly source to the standard output instead.
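To make the command-line rules concrete, here is a minimal sketch in Python of how a driver might interpret its arguments. The names parse_command_line, assembly_target, and DriverError are made up for illustration, and the rule "replace the input file's extension with .s" is an assumption about how the output name is derived; adapt both to your own driver and language.

```python
import os

# Option letters taken from the EBNF for the command line.
PHASE_OPTIONS = {"s", "c", "t", "a", "i"}


class DriverError(Exception):
    """Hypothetical exception for command-line problems."""


def parse_command_line(argv):
    """Return (option, filename); either one may be None."""
    args = list(argv[1:])
    option = None
    if args and args[0].startswith("-"):
        letter = args.pop(0)[1:]
        if letter not in PHASE_OPTIONS:
            raise DriverError("unknown option -" + letter)
        option = letter
    if len(args) > 1:
        raise DriverError("too many arguments")
    filename = args[0] if args else None
    return option, filename


def assembly_target(filename):
    """Assumed naming rule: swap the input's extension for .s.

    Returns None when the program is read from standard input,
    meaning the assembly source goes to standard output.
    """
    if filename is None:
        return None
    return os.path.splitext(filename)[0] + ".s"
```

With no option, parse_command_line returns None for the option, which the driver can treat as "generate code". The driver would catch DriverError and print it in the usual error format.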
If an error is detected before the AST is completely built, the code generator should not run. If an error is detected during code generation, no assembly source should be output; instead the compiler should stop with an appropriate error message.
The code generator needs to traverse both the symbol table (ST) and the abstract syntax tree (AST), and you should apply the visitor design pattern for these tasks once more. Except for the brief notes below, you’re pretty much on your own for this assignment…
Storage allocation for SIMPLE is pretty straightforward: For every integer variable you allocate “standard integer size” bytes of data memory; for most of the architectures you’ll need four bytes per integer. For record variables you allocate enough memory to hold their fields, which eventually comes down to a number of integer variables as well. For array variables you proceed the same way, but you should remember to store them in the “most convenient” order. Given that everything is an integer in the end, most likely you will not have to perform data alignment at all (but you should make sure by reading the relevant documentation for your architecture).
The easiest way to keep track of both the size of a variable (or
rather the size of its type) and its address is in the ST:
Just extend the
Variable class with fields (and methods?) to take care
of a variable’s address and extend the
Type class with fields (and
methods?) to take care of the type’s size.
Note that you cannot perform storage allocation in the frontend: it’s
a task that depends on the target architecture we generate code for!
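The following Python sketch shows one way the backend might fill in sizes and addresses. The classes are bare stand-ins for whatever your ST entries look like, and the function names size_of, layout, and allocate are hypothetical. It assumes four-byte integers, no alignment padding, and record fields laid out in declaration order.

```python
INT_SIZE = 4  # assumed "standard integer size" in bytes on 32-bit ARM


class Integer:
    def __init__(self):
        self.size = None  # filled in by the backend


class Array:
    def __init__(self, element_type, length):
        self.element_type = element_type
        self.length = length
        self.size = None


class Record:
    def __init__(self, fields):
        self.fields = fields  # dict: field name -> Variable
        self.size = None


class Variable:
    def __init__(self, typ):
        self.type = typ
        self.address = None  # offset from the start of its scope


def size_of(typ):
    """Compute (and cache) the size of a type in bytes."""
    if typ.size is None:
        if isinstance(typ, Integer):
            typ.size = INT_SIZE
        elif isinstance(typ, Array):
            typ.size = typ.length * size_of(typ.element_type)
        elif isinstance(typ, Record):
            typ.size = layout(typ.fields)
        else:
            raise TypeError("unknown type node")
    return typ.size


def layout(scope):
    """Give every variable in a scope the next free offset; return bytes used."""
    offset = 0
    for var in scope.values():
        var.address = offset
        offset += size_of(var.type)
    return offset


def allocate(symbol_table):
    """Allocate all global variables; return the total data size in bytes."""
    return layout(symbol_table)
```

Because the frontend classes only gain two empty slots (size and address), all target-specific decisions stay in the backend functions.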
As part of storage allocation you should also enforce machine restrictions. Two obvious restrictions are (a) that declared constants can indeed be represented on the architecture and (b) that the amount of memory necessary to hold all variables doesn’t exceed what your generated code can actually address. (You will have to repeat the first check again when you traverse the AST as there might be literal constants that cannot be represented.)
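A minimal sketch of the two checks, again in Python, might look as follows. The exception name MachineRestriction and the limit MAX_DATA_BYTES are assumptions made for this example; in particular the memory limit is a placeholder, and you need to pick (and defend) a value that matches what your generated code can really address.

```python
INT_MIN = -2**31      # smallest 32-bit two's complement integer
INT_MAX = 2**31 - 1   # largest 32-bit two's complement integer
MAX_DATA_BYTES = 2**30  # hypothetical limit, chosen only for illustration


class MachineRestriction(Exception):
    """Hypothetical exception for violated machine restrictions."""


def check_constant(value):
    """Reject constants that do not fit into a 32-bit signed integer."""
    if not INT_MIN <= value <= INT_MAX:
        raise MachineRestriction(
            "constant %d cannot be represented in 32 bits" % value)
    return value


def check_total_size(total_bytes, limit=MAX_DATA_BYTES):
    """Reject programs whose variables need more memory than the limit."""
    if total_bytes > limit:
        raise MachineRestriction(
            "variables need %d bytes, limit is %d" % (total_bytes, limit))
    return total_bytes
```

The same check_constant can be reused while traversing the AST, where literal constants show up.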
Note that “machine restrictions” are not excuses. You can’t say “Well,
I couldn’t figure out how to add a 32-bit constant to a register, so the
compiler accepts only some additions.” and expect points. Whatever
restrictions you make have to be sensible and well-defended in your
README file. If in doubt, ask on Piazza if a certain restriction you’re
planning is okay.
Targeting a real architecture like ARM might seem harder than targeting some virtual stack machine, but that’s not necessarily true. For example, you can use the (comparatively) large number of registers available on the ARM to your advantage, especially when generating code for array and record assignments (where you need to copy blocks of memory).
Before you start working on the code generator, you should study the ARM architecture carefully and decide how all the various SIMPLE constructs are to be mapped onto sequences of assembly instructions. Of course much of this was discussed in lecture, for example the use of the stack to communicate information between the code patterns for individual nodes of the AST, the dereferencing of locations, the handling of record fields, etc. However, you should try to become familiar with the architecture in general, not just with the subset you are using for this assignment: Future assignments require that you know more details in order to improve your code generator or add new language features such as procedures!
Once you are sure about the code patterns (including the all-important register conventions you are going to use!), you should implement the traversal of the AST to actually generate the relevant instructions. You will have to take the specifications of the assembly instructions into account, especially which register is used for what purpose. For the interpreter you could influence this process, but it is “fixed” now that we generate code for real hardware.
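As a rough illustration of such a traversal, the Python sketch below generates stack-based code for a tiny expression tree with a visitor. The node classes, the CodeGenerator class, and the register convention (operands popped into r0 and r1, result built in r2) are all assumptions for this example, not the patterns from lecture. It relies on the GNU assembler's "ldr rX, =value" pseudo-instruction to load arbitrary 32-bit constants.

```python
class Number:
    def __init__(self, value):
        self.value = value

    def accept(self, visitor):
        visitor.visit_number(self)


class Binary:
    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def accept(self, visitor):
        visitor.visit_binary(self)


class CodeGenerator:
    """Post-order visitor: children first, then the node's own code."""

    MNEMONICS = {"+": "add", "-": "sub", "*": "mul"}

    def __init__(self):
        self.lines = []

    def emit(self, text):
        self.lines.append("    " + text)

    def visit_number(self, node):
        # Load the constant and leave it on the stack for the parent.
        self.emit("ldr r0, =%d" % node.value)
        self.emit("push {r0}")

    def visit_binary(self, node):
        node.left.accept(self)
        node.right.accept(self)
        # The right operand was pushed last, so it comes off first.
        self.emit("pop {r1}")
        self.emit("pop {r0}")
        self.emit("%s r2, r0, r1" % self.MNEMONICS[node.op])
        self.emit("push {r2}")
```

For example, visiting Binary("+", Number(1), Binary("*", Number(2), Number(3))) emits the code for 1, then for 2 * 3, and only then the add. This code pattern is simple but far from fast; keeping more values in registers is an obvious improvement.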
Remember that you have to generate code for checking the value of an index against the size of the array! If an array index is out of bounds, you should abort the program with an error message — ideally a message that indicates where in the source this problem originates! Same for other runtime errors…
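One possible code pattern for the index check is sketched below in Python. The function emit_bounds_check and the label index_error are hypothetical, as is the convention that the index sits in r0 and the source line is passed to the error routine in r2. The trick it shows is that a single unsigned comparison (bhs, "branch if higher or same") rejects both indices that are too large and negative indices, since a negative number looks like a huge unsigned value.

```python
def emit_bounds_check(length, line, error_label="index_error"):
    """Return ARM instructions that check the index in r0 against length.

    Assumes the index is in r0 and that error_label names a routine
    (written by you) that prints the message and exits.
    """
    return [
        "    ldr r1, =%d" % length,   # array length, known at compile time
        "    ldr r2, =%d" % line,     # source line for the error message
        "    cmp r0, r1",
        "    bhs %s" % error_label,   # unsigned r0 >= r1: out of bounds
    ]
```

The error routine itself needs to be emitted only once per program, no matter how many array accesses there are.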
The advice from earlier assignments about using exceptions for error handling is still in effect, as is the required format for your error messages:
error: some helpful description
If you followed the advice for error handling on previous assignments, you should have little trouble handling the new errors. The exception, of course, is runtime errors: those now need to be handled in assembly! But you should still follow the format from before.
If you are taking this course at the graduate level, the run-time errors that are possible with this assignment should produce accurate position information just like your compile-time errors do; the same is true for violated machine restrictions of course.
Please follow the submission instructions as detailed on Piazza. Make sure that your tarball contains no derived files whatsoever (i.e. no executable files), but allows building all required derived files. Also make sure to include your name and email address in every file you turn in (well, in every file for which it makes sense to do so anyway)!
Regardless of your programming language of choice, we expect to build
your project using
make (if it needs building at all) and we expect to
run your project using
./sc (which stands for “SIMPLE compiler”).
You are free to use the standard library for your language of choice,
except for modules/classes that allow you to avoid writing large
parts of the code for an assignment; so no regular expressions, no parsing
Depending on your language of choice, compliance with certain tools
(such as valgrind), compiler flags, or additional style
guides may also be required; see Piazza for details.
For reference, here is a short explanation of the grading criteria; not all of the criteria apply to all problems on a given assignment, and not all of the assignments even use all of the criteria.
Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments on Piazza.
Style refers to programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for.
Design refers to proper modularization (into functions, classes, modules, etc.) and the proper choice of algorithms and data structures.
Performance refers to how fast/with how little memory your project can produce the required results compared to other submissions; in this course this can mean your actual compiler or interpreter as well as the code generated by it.
Functionality refers to your programs being able to do what they
should according to the specification given above.
(It also refers to you simply doing the required work, which may not be
If the specification is ambiguous, ask for clarification!
If no clarification is forthcoming, defend the choices you have made
If your project cannot be built, or if it is otherwise obvious that you
never tested it, you will get no points whatsoever.
If your project cannot be built without warnings using the required
compiler options we will take off 10%.
If your programs cannot be built using
make we will take off 10%.
If valgrind detects memory errors in your programs, we will take off 10%.
If your project fails miserably even once, i.e. terminates with an
exception of any kind or dumps core, we will take off 10%.
Presumably you see the pattern here?