Assignment 6: Simple (aka Silly) Code Generator

Overview

In the sixth part of the compiler project you will extend your existing compiler frontend (which builds the checked intermediate representation for SIMPLE programs in the form of the symbol table and the abstract syntax tree) with a backend that actually generates code for SIMPLE programs. The code generator first performs storage allocation for all variables in the ST and then generates instructions during a post-order-style traversal of the AST (roughly one node at a time).

The generated code must enforce the context conditions that could not be enforced during (static) semantic analysis (array bounds, division by zero), just like the interpreter had to. The code generator also has to enforce various machine restrictions (see below). You can get the abstract grammar for the SIMPLE programming language here. The context conditions for this assignment are the same as for the interpreter given here.

You are generating ARMv6 assembly suitable for the Raspberry Pi Linux machines on which you have an account. Your ARM code may call the C library functions exit, printf, fprintf, scanf, and strtol, as well as the library routines for division and modulo (since ARMv6 doesn’t have machine instructions for those). No other C library functions may be called! You are largely on your own for this, but you can discuss ARM assembly issues (not code generator design!) on Piazza all you want.

Problem 1: Driver Program (10%)

The compiler consists of a number of modules and classes working together to translate programs written in SIMPLE into equivalent programs written in assembly language. The driver is called sc and is invoked from the shell as follows:

Invocation = "./sc" ["-" ("s"|"c"|"t"|"a"|"i")] [filename] .

This describes the syntax of the command line in EBNF. After ./sc itself, the user can supply one option (introduced by “-”) to tell the driver which parts of the compiler to run and what kind of output to produce.

Arguments and Options

With this assignment “no options at all” is finally allowed on the command line for sc; only “unknown options” or a file that cannot be opened should result in errors from the driver. Of course the options -s, -c, -t, -a, and -i are unchanged from previous assignments.

If no option is given, the driver is supposed to generate code for an input program. If the name of a program is given, simple.sim for example, the driver should generate an assembly source file called simple.s for it; if the program is read from the standard input, the driver should write the assembly source to the standard output instead.
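A minimal sketch of the driver's argument handling, written in Python purely for illustration (your implementation language is up to you). The names parse_args and output_path are hypothetical, and rejecting extra arguments is an assumption read off the EBNF above, not something the assignment spells out.

```python
import os

# The five options carried over from previous assignments.
OPTIONS = {"-s", "-c", "-t", "-a", "-i"}


def parse_args(argv):
    """Return (option, filename); either may be None.

    Raises ValueError for an unknown option or for extra arguments
    (the latter is an assumption based on the EBNF).
    """
    args = list(argv[1:])
    option = None
    filename = None
    if args and args[0].startswith("-"):
        option = args.pop(0)
        if option not in OPTIONS:
            raise ValueError(f"unknown option {option}")
    if args:
        filename = args.pop(0)
    if args:
        raise ValueError("too many arguments")
    return option, filename


def output_path(filename):
    """Map simple.sim to simple.s; None means 'write to standard output'."""
    if filename is None:
        return None
    root, _extension = os.path.splitext(filename)
    return root + ".s"
```

With no option, the driver would run the code generator and write to output_path(filename), falling back to standard output when the program was read from standard input.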

If an error is detected before the AST is completely built, the code generator should not run. If an error is detected during code generation, no assembly source should be output; instead the compiler should stop with an appropriate error message.

Problem 2: Code Generator (90%)

The code generator needs to traverse both the symbol table (ST) and the abstract syntax tree (AST), and you should apply the visitor design pattern for these tasks once more. Except for the brief notes below, you’re pretty much on your own for this assignment…

Storage Allocation

Storage allocation for SIMPLE is pretty straightforward: For every integer variable you allocate “standard integer size” bytes of data memory; for most of the architectures you’ll need four bytes per integer. For record variables you allocate enough memory to hold their fields, which eventually comes down to a number of integer variables as well. For array variables you proceed the same way, but you should remember to store them in the “most convenient” order. Given that everything is an integer in the end, most likely you will not have to perform data alignment at all (but you should make sure by reading the relevant documentation for your architecture).

The easiest way to keep track of both the size of a variable (or rather the size of its type) and its address is in the ST: Just extend the Variable class with fields (and methods?) to take care of a variable’s address and extend the Type class with fields (and methods?) to take care of the type’s size. Note that you cannot perform storage allocation in the frontend: it’s a task that depends on the target architecture we generate code for!
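One possible shape for this, sketched in Python under stated assumptions: integers take four bytes, variables are laid out back to back in declaration order, and the class and method names (Integer, Array, Record, Variable, allocate) are hypothetical stand-ins for whatever your ST already uses.

```python
from dataclasses import dataclass

INT_SIZE = 4  # assumed bytes per integer on 32-bit ARM


class Integer:
    def size(self):
        return INT_SIZE


@dataclass
class Array:
    element: object  # element type
    length: int

    def size(self):
        # Elements are stored contiguously: element i lives at i * element size.
        return self.length * self.element.size()


@dataclass
class Record:
    fields: dict  # field name -> type, in declaration order

    def size(self):
        return sum(t.size() for t in self.fields.values())

    def offset_of(self, name):
        """Byte offset of a field from the start of the record."""
        offset = 0
        for field_name, field_type in self.fields.items():
            if field_name == name:
                return offset
            offset += field_type.size()
        raise KeyError(name)


@dataclass
class Variable:
    type: object
    address: int = None  # filled in by allocate()


def allocate(variables):
    """Assign each variable an offset into the data area; return total bytes."""
    next_free = 0
    for variable in variables.values():
        variable.address = next_free
        next_free += variable.type.size()
    return next_free
```

The total returned by allocate is also the number you would compare against whatever addressing limit you decide to enforce.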

As part of storage allocation you should also enforce machine restrictions. Two obvious restrictions are (a) that declared constants can indeed be represented on the architecture and (b) that the amount of memory necessary to hold all variables doesn’t exceed what your generated code can actually address. (You will have to repeat the first check again when you traverse the AST as there might be literal constants that cannot be represented.)

Note that “machine restrictions” are not excuses. You can’t say “Well, I couldn’t figure out how to add a 32-bit constant to a register, so the compiler accepts only some additions.” and expect points. Whatever restrictions you make have to be sensible and well-defended in your README file. If in doubt, ask on Piazza if a certain restriction you’re planning is okay.
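To illustrate why the 32-bit constant case need not become a restriction, here is a hedged Python sketch. It assumes SIMPLE integers are signed 32-bit values, and the helper names are hypothetical. An ARM data-processing immediate is an 8-bit value rotated right by an even amount; anything else can be handed to the GNU assembler's `ldr reg, =value` pseudo-instruction.

```python
INT_MIN = -(2 ** 31)  # assumed range: signed 32-bit integers
INT_MAX = 2 ** 31 - 1


def fits_in_word(value):
    """True if the constant is representable as a signed 32-bit integer."""
    return INT_MIN <= value <= INT_MAX


def is_arm_immediate(value):
    """True if value is an 8-bit quantity rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Rotating left by `rotation` undoes a rotate-right by the same amount.
        rotated = ((value << rotation) | (value >> (32 - rotation))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False


def load_constant(register, value):
    """Return one line of assembly that loads value into register."""
    if not fits_in_word(value):
        raise ValueError(f"constant {value} does not fit in 32 bits")
    if value >= 0 and is_arm_immediate(value):
        return f"    mov {register}, #{value}"
    # Let the assembler choose an encoding or a literal-pool load.
    return f"    ldr {register}, ={value}"
```

The fits_in_word check is the kind of test you would run once over declared constants during storage allocation and again over literals while traversing the AST.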

Code Generation

Targeting a real architecture like ARM might seem harder than targeting some virtual stack machine, but that’s not necessarily true. For example, you can use the (comparatively) large number of registers available on the ARM to your advantage, especially when generating code for array and record assignments (where you need to copy blocks of memory).
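As one illustration of using several registers for a block copy, the Python sketch below emits load-multiple/store-multiple pairs. The register choices (source address in r0, destination in r1, scratch registers r2, r3, r12) and the function name are assumptions, not a required convention; the fully unrolled output is only sensible for small blocks, and a loop would be the better pattern for large arrays.

```python
def emit_block_copy(words, src="r0", dst="r1", scratch=("r2", "r3", "r12")):
    """Emit assembly that copies `words` 32-bit words from [src] to [dst].

    Both address registers are advanced past the copied block.
    Scratch registers must be listed in ascending order.
    """
    lines = []
    remaining = words
    while remaining > 0:
        count = min(remaining, len(scratch))
        registers = ", ".join(scratch[:count])
        lines.append(f"    ldmia {src}!, {{{registers}}}")
        lines.append(f"    stmia {dst}!, {{{registers}}}")
        remaining -= count
    return lines
```

For example, emit_block_copy(5) produces one ldmia/stmia pair moving three words followed by one pair moving the remaining two.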

Before you start working on the code generator, you should study the ARM architecture carefully and decide how all the various SIMPLE constructs are to be mapped onto sequences of assembly instructions. Of course much of this was discussed in lecture, for example the use of the stack to communicate information between the code patterns for individual nodes of the AST, the dereferencing of locations, the handling of record fields, etc. However, you should try to become familiar with the architecture in general, not just with the subset you are using for this assignment: Future assignments require that you know more details in order to improve your code generator or add new language features such as procedures!

Once you are sure about the code patterns (including the all-important register conventions you are going to use!), you should implement the traversal of the AST to actually generate the relevant instructions. You will have to take the specifications of the assembly instructions into account, especially which register is used for what purpose. For the interpreter you could influence this process, but it is “fixed” now that we generate code for real hardware.
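A small Python sketch of the post-order traversal with the stack carrying values between code patterns. The node classes (Number, Binary), the function name gen, and the register convention (left operand in r0, right operand in r1) are all hypothetical; only addition, subtraction, and multiplication are covered.

```python
from dataclasses import dataclass


@dataclass
class Number:
    value: int


@dataclass
class Binary:
    operator: str
    left: object
    right: object


# Assumed convention: left operand in r0, right operand in r1, result in r0.
INSTRUCTIONS = {
    "+": "add r0, r0, r1",
    "-": "sub r0, r0, r1",
    "*": "mul r0, r1, r0",  # operands swapped so Rd differs from Rm
}


def gen(node, out):
    """Append code to `out` that leaves the node's value on top of the stack."""
    if isinstance(node, Number):
        out.append(f"    ldr r0, ={node.value}")
        out.append("    push {r0}")
    elif isinstance(node, Binary):
        gen(node.left, out)   # children first: post-order
        gen(node.right, out)
        out.append("    pop {r1}")  # right operand was pushed last
        out.append("    pop {r0}")
        out.append("    " + INSTRUCTIONS[node.operator])
        out.append("    push {r0}")
    else:
        raise TypeError(f"unexpected node {node!r}")
    return out
```

Each pattern only needs to agree on "result on top of the stack", which keeps the patterns independent of one another. One caveat worth checking against the ARM procedure call standard: pushing single words can leave sp without the 8-byte alignment expected at calls into the C library.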

Remember that you have to generate code for checking the value of an index against the size of the array! If an array index is out of bounds, you should abort the program with an error message — ideally a message that indicates where in the source this problem originates! Same for other runtime errors…
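One way the bounds check could look, as a Python sketch that emits the assembly. The function name, the scratch register, and the shared handler label index_error are assumptions; the handler itself (which would print the message via fprintf and call exit) is not shown. Because the comparison is unsigned, a negative index looks like a huge value and is rejected by the same test.

```python
def emit_bounds_check(index_register, length, line, label_id, scratch="r1"):
    """Emit code that continues only if 0 <= index < length.

    On failure, the source line number is placed in r0 and control
    transfers to `index_error`, a hypothetical shared error handler.
    """
    ok_label = f".Lok_{label_id}"
    return [
        f"    ldr {scratch}, ={length}",
        f"    cmp {index_register}, {scratch}",
        f"    blo {ok_label}",  # unsigned lower: index is in range
        f"    ldr r0, ={line}",
        "    b index_error",
        f"{ok_label}:",
    ]
```

Passing the source line along is what makes it possible for the runtime message to say where the problem originates.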

Error Handling

The advice from earlier assignments about using exceptions for error handling is still in effect, as is the required format for your error messages:

error: some helpful description

If you followed the advice for error handling on previous assignments, you should have little trouble handling the new errors. Runtime errors are the exception, of course: those need to be processed in assembly now! Their messages should still follow the format from before.

Graduate Level Requirements

If you are taking this course at the graduate level, the run-time errors that are possible with this assignment should produce accurate position information just like your compile-time errors do; the same is true for violated machine restrictions of course.

Deliverables

Please follow the submission instructions as detailed on Piazza. Make sure that your tarball contains no derived files whatsoever (i.e. no executable files), but allows building all required derived files. Also make sure to include your name and email address in every file you turn in (well, in every file for which it makes sense to do so anyway)!

Regardless of your programming language of choice, we expect to build your project using make (if it needs building at all) and we expect to run your project using ./sc (which stands for “SIMPLE compiler”). You are free to use the standard library for your language of choice, except for modules/classes that allow you to avoid writing large parts of the code for an assignment; so no regular expressions, no parsing combinators, etc. Depending on your language of choice, compliance with certain tools (e.g. checkstyle or valgrind), compiler flags, or additional style guides may also be required; see Piazza for details.

Grading

For reference, here is a short explanation of the grading criteria; not all of the criteria apply to all problems on a given assignment, and not all of the assignments even use all of the criteria.

Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments on Piazza.

Style refers to programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for.

Design refers to proper modularization (into functions, classes, modules, etc.) and the proper choice of algorithms and data structures.

Performance refers to how fast/with how little memory your project can produce the required results compared to other submissions; in this course this can mean your actual compiler or interpreter as well as the code generated by it.

Functionality refers to your programs being able to do what they should according to the specification given above. (It also refers to you simply doing the required work, which may not be programming alone.) If the specification is ambiguous, ask for clarification! If no clarification is forthcoming, defend the choices you have made in your README file.

If your project cannot be built, or if it is otherwise obvious that you never tested it, you will get no points whatsoever. If your project cannot be built without warnings using the required compiler options we will take off 10%. If your programs cannot be built using make we will take off 10%. If valgrind detects memory errors in your programs, we will take off 10%. If your project fails miserably even once, i.e. terminates with an exception of any kind or dumps core, we will take off 10%. Presumably you see the pattern here?