Assignment 7: Decent Code Generator

Overview

In the seventh part of the compiler project you will extend your existing compiler backend (which so far generates very naive code) to take advantage of more machine registers available on the ARM platform and more context information available in the abstract syntax tree (AST).

You have several options for doing this, and the details are up to you to a much larger degree than in previous assignments; with that freedom comes, of course, full responsibility for your decisions. So weigh your options carefully! The generated code must, of course, still be correct, meaning that it enforces all required context conditions and machine restrictions as before.

You are generating ARMv6 assembly suitable for the Raspberry Pi Linux machines you have an account for. Your ARM code may call C library functions for exit, printf, fprintf, scanf, strtol, as well as division and modulo (since ARMv6 doesn’t have machine instructions for those). No other C library functions may be called! You are largely on your own for this, but you can discuss ARM assembly issues (not code generator design!) on Piazza all you want.

Problem 1: Driver Program (10%)

The compiler consists of a number of modules and classes working together to translate programs written in SIMPLE into equivalent programs written in assembly language. The driver is called sc and is invoked from the shell as follows:

Invocation = "./sc" ["-" ("s"|"c"|"t"|"a"|"i"|"x")] [filename] .

This describes the syntax of the command line in EBNF. After ./sc itself, the user can supply one option (introduced by “-”) to tell the driver which parts of the compiler to run and what kind of output to produce.

Arguments and Options

With this assignment the option -x is added to the command line for the driver; it indicates that the user wants to run the improved code generator instead of the basic code generator from the previous assignment; if “no option” is given, that basic code generator should still run.

The -x option behaves very much like “no option” did in that it generates code for a given input program. If the name of a program is given, simple.sim for example, the driver should generate an assembly source file called simple.s for it; if the program is read from the standard input, the driver should write the assembly source to the standard output instead.

If an error is detected before the AST is completely built, the code generator should not run. If an error is detected during code generation, no assembly source should be output; instead the compiler should stop with an appropriate error message.
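
The option handling described above could be sketched roughly as follows. This is only an illustration in Python (use whatever language your compiler is written in); the names parse_args and output_name are made up for this example, and the sketch assumes that -x simply joins the existing list of single-letter options.

```python
import os

# Options from the command-line grammar, plus the new -x
# (assumed here to join the same list of single-letter options).
OPTIONS = {"s", "c", "t", "a", "i", "x"}

def parse_args(argv):
    """Return (option, filename); either may be None. Raises ValueError on bad input."""
    option, filename = None, None
    args = list(argv[1:])
    if args and args[0].startswith("-"):
        option = args.pop(0)[1:]
        if option not in OPTIONS:
            raise ValueError("unknown option -" + option)
    if args:
        filename = args.pop(0)
    if args:
        raise ValueError("too many arguments")
    return option, filename

def output_name(filename):
    """simple.sim -> simple.s; None (program read from stdin) -> None (write to stdout)."""
    if filename is None:
        return None
    root, _ext = os.path.splitext(filename)
    return root + ".s"
```

With this sketch, parse_args(["./sc", "-x", "simple.sim"]) yields ("x", "simple.sim"), and output_name("simple.sim") yields "simple.s".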

Problem 2: Code Generator (90%)

You have to decide whether you want to focus on simply making better use of registers (Option A) or if you also want to take contextual information into account (Option B). If you choose Option A, you can get at most 80-90% (meaning you “lose” 10-20%) of the points for this problem; if you choose Option B (which includes most of Option A, see below), you can get 95-100% (meaning you “lose” 0-5%) of the points for this problem. Consider your options carefully.

Register Allocation Only (Option A)

Your naive code generator from Assignment 6 uses the stack for all temporary values needed during the execution of your program. This leads to an excessive amount of memory-traffic, something that’s especially sad on a processor like ARM with several available general-purpose registers.

So for this problem, you have to change your code generator to use registers for temporary values instead of using the actual stack. We discussed a number of options for this in lecture, from using a set of registers as a “virtual stack” to keeping values in registers even across instructions. I strongly suggest that you implement the following approach, but it’s up to you if you want to make it more or less fancy; how fancy in part determines how many points we’ll dock in the range documented for Option A above:

  1. Use registers for temporary values only, that is only within an instruction, not across instructions.
  2. Allocate and free registers in a stack-like fashion. It’s up to you how many registers you use for this, but obviously it’s better to use more. (Just keep in mind that you may need certain registers at certain points in the program, for example when you call a C library function.)
  3. If you run out of registers during code generation, write half of your “register stack” to the real stack and “reuse” the registers freed up that way. Of course you have to bring those values back in later when enough registers have been freed again to result in an “empty register stack” as it were. (In other words, your code generator should never “give up” with an “out of registers” error.)

While this approach to registers has obvious drawbacks, it still represents a significant improvement over the naive stack-based code generator from the previous assignment.
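
The three steps above can be sketched as a small "register stack" class. This is a minimal illustration in Python, not a required design: the class name RegisterStack, the choice of r4 to r7 (callee-saved under the ARM procedure call standard, so they survive C library calls) and the use of push/pop with register lists for spilling are all assumptions of this example. Logical stack positions are mapped onto the registers in a circular fashion, so spilling the oldest half frees exactly the registers that the next values need.

```python
class RegisterStack:
    """Hypothetical register stack that spills its oldest half when full."""

    def __init__(self, regs=("r4", "r5", "r6", "r7")):
        assert len(regs) >= 2 and len(regs) % 2 == 0
        self.regs = list(regs)
        self.half = len(regs) // 2
        self.base = 0    # logical index of the oldest value still in a register
        self.top = 0     # logical index one past the newest value
        self.code = []   # assembly lines emitted for spills and reloads

    def _reg(self, index):
        # Logical positions wrap around the available registers.
        return self.regs[index % len(self.regs)]

    def _group(self):
        # The half of the registers starting at the current base position.
        return ", ".join(self._reg(i) for i in range(self.base, self.base + self.half))

    def push(self):
        """Reserve a register for a new temporary and return its name."""
        if self.top - self.base == len(self.regs):
            # All registers hold live values: write the oldest half to the real stack.
            self.code.append("push {" + self._group() + "}")
            self.base += self.half
        reg = self._reg(self.top)
        self.top += 1
        return reg

    def pop(self):
        """Release the newest temporary and return the register that holds it."""
        if self.top == self.base:
            if self.base == 0:
                raise RuntimeError("register stack underflow")
            # Register stack is empty: bring the most recently spilled half back in.
            self.base -= self.half
            self.code.append("pop {" + self._group() + "}")
        self.top -= 1
        return self._reg(self.top)
```

Pushing five temporaries onto the default four registers hands out r4, r5, r6, r7 and then r4 again, after emitting push {r4, r5}; the matching pop {r4, r5} is only emitted once the three newer values have been popped.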

If you want to do better than what I recommended above but still stick with “Option A” as it were, the “next step up” is probably to get away from the stack-based approach to registers. Instead you’d track allocated and free registers (which now have to be requested explicitly during code generation) and view those registers as a cache of sorts. Then, instead of always spilling the bottom of the stack, you can spill, for example, the least-recently-used (LRU) register when you need another one for a new value. (Actually you could do even better if you tracked live ranges, but you really don’t have to.) The details of where to spill to and how much are up to you. But I probably recommended the first solution for a reason… Note that the extension to “random” register allocation requires that you “pass back” the register a value is in to the parent node. That’s a tiny aspect of Option B below, but it’s so tiny it’s barely worth mentioning.
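
For the cache-like variant, a rough Python sketch of LRU spilling might look like this. Everything here is hypothetical: the class RegisterCache, the idea of spilling to numbered 4-byte slots addressed relative to sp (which assumes you reserved such a spill area), and the fact that slots are never reused. Reloading a spilled value is deliberately left out to keep the example short.

```python
from collections import OrderedDict

class RegisterCache:
    """Hypothetical allocator that spills the least-recently-used register."""

    def __init__(self, regs=("r4", "r5", "r6")):
        self.free = list(regs)
        self.in_use = OrderedDict()  # register -> value name, least recently used first
        self.spilled = {}            # value name -> spill slot number
        self.code = []               # assembly lines emitted for spills

    def alloc(self, value):
        """Return a register for a new value, spilling the LRU register if none is free."""
        if self.free:
            reg = self.free.pop(0)
        else:
            reg, victim = self.in_use.popitem(last=False)
            slot = len(self.spilled)   # slots are never reused in this sketch
            self.spilled[victim] = slot
            # Assumed spill area: 4-byte slots starting at sp.
            self.code.append("str %s, [sp, #%d]" % (reg, 4 * slot))
        self.in_use[reg] = value
        return reg

    def touch(self, reg):
        """Record that reg was just used, making it the most recently used."""
        self.in_use.move_to_end(reg)

    def release(self, reg):
        """Give a register back once its value is dead."""
        del self.in_use[reg]
        self.free.append(reg)
```

Note how alloc returns the register name to the caller; that is the "pass back" to the parent node mentioned above.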

Context Information plus Register Allocation (Option B)

Your naive code generator from Assignment 6 processes the AST one node at a time (for the most part). As a result, the instructions you select must be more general than necessary in a given context. For example, the expression a + 1 always results in a general add instruction instead of a more specialized add immediate instruction. (We also “waste” a register for the constant 1 that could have been included in the add immediate instruction.) If we take more context information into account, we can therefore not only produce shorter and faster code, we can also decrease the demand for registers, a welcome side-effect.
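
To make the a + 1 example concrete, here is a small Python sketch. ARM data-processing instructions can only embed a constant that is an 8-bit value rotated right by an even number of bits, so the code generator has to check for that before choosing the immediate form. The function names, the tuple representation of operands and the scratch-register parameter are assumptions made for this illustration; the ldr with = is the GNU assembler pseudo-instruction for loading an arbitrary constant.

```python
def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount (32-bit)."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False

def gen_add(dest, left_reg, right, scratch="r12"):
    """Assembly for dest = left_reg + right.

    right is a hypothetical operand description: ("const", n) or ("reg", name).
    """
    kind, payload = right
    if kind == "reg":
        return ["add %s, %s, %s" % (dest, left_reg, payload)]
    if is_arm_immediate(payload):
        # The constant rides along inside the instruction; no register needed.
        return ["add %s, %s, #%d" % (dest, left_reg, payload)]
    # Constant does not fit: load it into a scratch register first.
    return ["ldr %s, =%d" % (scratch, payload),
            "add %s, %s, %s" % (dest, left_reg, scratch)]
```

So gen_add("r0", "r1", ("const", 1)) produces the single line add r0, r1, #1, while a constant like 257 falls back to the two-instruction form.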

So for this problem, you have to change your code generator to take “larger pieces” of the AST into account before spitting out instructions. You’ll use the Wirth-inspired idea of “items” or “descriptors” that capture context information “flowing up” the AST; the exact number of different item kinds depends on the architecture you generate code for, but you will most likely need items for at least each of the following:

Remember the main point of items: We want to delay the emission of an assembly instruction as long as possible, until we know “everything” there is to know about the context in which the computation happens. Specifically, the following four goals should be met by your new code generator:

  1. For expressions involving (suitably small, suitably patterned) constants, the constant should not have to be loaded into a register but embedded into an assembly instruction as an immediate value.
  2. For (cascades of) field selectors, all relevant offsets should be computed at compile-time and no address arithmetic should be necessary at runtime; you should also generate assembly instructions that use indirect addressing from the base register with a fixed offset for loading from (or storing to) a variable or record field. (If the value fits into a register, that is.)
  3. For (cascades of) index selectors with constant expressions, all relevant offsets should be computed at compile-time as well; in addition, it should be a compile-time error to index with a constant expression outside array boundaries.
  4. You should replace multiplications/divisions/modulos by a constant power-of-two with the corresponding SHIFT or AND operations. The multiplication part is particularly handy for some index computations, especially when dealing with arrays of integers. The division/modulo part is particularly helpful on the ARMv6 because you can avoid a SLOOOOOW call to the C library that way.
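
Goals 2 and 3 boil down to adding up offsets while walking a selector cascade at compile time. The following Python sketch shows one way to do that; the tuple-based type representation, the names size_of and fold_selectors, 4-byte integers and zero-based array indices are all assumptions of this example rather than anything prescribed by the assignment.

```python
class CompileError(Exception):
    pass

INT = ("int",)  # assumed to be 4 bytes wide

def size_of(t):
    """Size in bytes of a hypothetical type description."""
    if t[0] == "int":
        return 4
    if t[0] == "array":                 # ("array", length, element_type)
        return t[1] * size_of(t[2])
    # ("record", [(field_name, field_type), ...])
    return sum(size_of(ft) for _name, ft in t[1])

def fold_selectors(t, selectors):
    """Return (byte offset, resulting type) for a cascade of constant selectors.

    Each selector is ("field", name) or ("index", constant).
    """
    offset = 0
    for kind, arg in selectors:
        if kind == "field":
            for name, ft in t[1]:
                if name == arg:
                    t = ft
                    break
                offset += size_of(ft)   # skip over fields laid out before this one
            else:
                raise CompileError("error: no field named " + arg)
        else:
            length, elem = t[1], t[2]
            if not 0 <= arg < length:   # assumes indices run from 0 to length - 1
                raise CompileError("error: index %d out of bounds" % arg)
            offset += arg * size_of(elem)
            t = elem
    return offset, t
```

For an array of 10 records with two integer fields x and y, the cascade [3].y folds to offset 28, so the whole access can become a single load such as ldr r0, [r1, #28], assuming r1 holds the base address of the array.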

There are plenty of other improvements you can consider. Using algebraic identities you can, for example, get rid of additions of 0 or multiplications by 1 (and many more). When assigning records or arrays (and hence copying sections of memory) you can, for example, take advantage of the ARM instructions to load/store multiple registers at a time. For constant conditions you can, for example, avoid generating code for (parts of) IF and REPEAT instructions. And so on, and so forth. There really is no limit. :-)
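
The power-of-two replacement from goal 4 and the simplest algebraic identities can be handled by one small helper, sketched here in Python. The function name gen_op_const and the operator tags "+", "*", "DIV" and "MOD" are made up for this example. One caveat you must check against the language specification: for negative dividends, an arithmetic shift right and a bit mask give floored results, which differs from truncating division, so this sketch is only correct if that is what DIV and MOD are supposed to do (or if the dividend is known to be non-negative).

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def gen_op_const(op, dest, src, n):
    """Assembly lines for dest = src op n, or None if no cheap form applies."""
    # Algebraic identities: x + 0, x * 1 and x DIV 1 are just x.
    if (op, n) in (("+", 0), ("*", 1), ("DIV", 1)):
        return [] if dest == src else ["mov %s, %s" % (dest, src)]
    if op not in ("*", "DIV", "MOD") or not is_power_of_two(n):
        return None
    k = n.bit_length() - 1              # n == 2**k
    if op == "*":
        return ["mov %s, %s, lsl #%d" % (dest, src, k)]
    if op == "DIV":
        return ["mov %s, %s, asr #%d" % (dest, src, k)]
    if k <= 8:
        # A mask of up to 8 low bits fits into an ARM immediate.
        return ["and %s, %s, #%d" % (dest, src, n - 1)]
    # Larger masks do not fit; shifting left and back right keeps the low k bits.
    return ["mov %s, %s, lsl #%d" % (dest, src, 32 - k),
            "mov %s, %s, lsr #%d" % (dest, dest, 32 - k)]
```

For example, multiplying an index by 4 for an array of integers becomes mov r0, r1, lsl #2, and x MOD 8 becomes and r0, r0, #7. A return value of None means the caller falls back to the general code.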

Note that you must essentially also implement all of Option A in order to implement Option B effectively!

Writing the README

You have more flexibility for this assignment, which means you’ll have to explain what you did in more detail in your README file. Please tell us exactly what improvements you have made over the naive code generator from the previous assignment. Make sure to list each improvement, not just those that go beyond what’s required anyway. Also, if you’ve tried some additional improvement but gave up on it, we’d be curious to hear about the details. Write a good README for this one! If the TAs have to spend lots of time just trying to guess what you implemented, they are within their rights to dock some points!

Error Handling

The advice from earlier assignments about using exceptions for error handling is still in effect, as is the required format for your error messages:

error: some helpful description

If you followed the advice for error handling on previous assignments, you should have little trouble handling the new errors. Except, of course, for runtime errors: those need to be processed in assembly now! But you should still follow the format from before.
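
For the compile-time side, the exception-based approach could be sketched like this in Python. The names CompilerError, report and run are hypothetical, and the output stream is passed in explicitly because this sketch makes no claim about which stream your messages belong on.

```python
class CompilerError(Exception):
    """Raised anywhere in the compiler; the driver turns it into one message line."""

def report(error, stream):
    # Required format: "error: some helpful description"
    stream.write("error: %s\n" % error)

def run(compile_fn, stream):
    """Run one compilation; return 0 on success, 1 after reporting an error."""
    try:
        compile_fn()
        return 0
    except CompilerError as e:
        report(e, stream)
        return 1
```

A code generator that raises CompilerError("index 12 out of bounds") deep inside an expression then ends up printing error: index 12 out of bounds, and the driver can skip writing any assembly output.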

Graduate Level Requirements

If you are taking this course at the graduate level, the new errors that are possible with this assignment should produce accurate position information.

Furthermore we expect graduate students to do Option B, not Option A. If, as a graduate student, you only hand in Option A, you will most likely be penalized more severely.

Deliverables

Please follow the submission instructions as detailed on Piazza. Make sure that your tarball contains no derived files whatsoever (i.e. no executable files), but allows building all required derived files. Also make sure to include your name and email address in every file you turn in (well, in every file for which it makes sense to do so anyway)!

Regardless of your programming language of choice, we expect to build your project using make (if it needs building at all) and we expect to run your project using ./sc (which stands for “SIMPLE compiler”). You are free to use the standard library for your language of choice, except for modules/classes that allow you to avoid writing large parts of the code for an assignment; so no regular expressions, no parsing combinators, etc. Depending on your language of choice, compliance with certain tools (e.g. checkstyle or valgrind), compiler flags, or additional style guides may also be required; see Piazza for details.

Grading

For reference, here is a short explanation of the grading criteria; not all of the criteria apply to all problems on a given assignment, and not all of the assignments even use all of the criteria.

Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments on Piazza.

Style refers to programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for.

Design refers to proper modularization (into functions, classes, modules, etc.) and the proper choice of algorithms and data structures.

Performance refers to how fast/with how little memory your project can produce the required results compared to other submissions; in this course this can mean your actual compiler or interpreter as well as the code generated by it.

Functionality refers to your programs being able to do what they should according to the specification given above. (It also refers to you simply doing the required work, which may not be programming alone.) If the specification is ambiguous, ask for clarification! If no clarification is forthcoming, defend the choices you have made in your README file.

If your project cannot be built, or if it is otherwise obvious that you never tested it, you will get no points whatsoever. If your project cannot be built without warnings using the required compiler options we will take off 10%. If your programs cannot be built using make we will take off 10%. If valgrind detects memory errors in your programs, we will take off 10%. If your project fails miserably even once, i.e. terminates with an exception of any kind or dumps core, we will take off 10%. Presumably you see the pattern here?