Assignment 8: Procedures for Fun and Profit

Overview

In the eighth (sp?) and final (!) part of the compiler project you will extend the SIMPLE language and your existing SIMPLE compiler with procedure declarations and procedure calls. You have several options for doing this, and the details are up to you to a much larger extent than in previous assignments; with that freedom comes, of course, full responsibility for your decisions. So weigh your options carefully.

The generated code must of course still be correct, meaning that it enforces all required context conditions and machine restrictions as before. This assignment also adds quite a few new context conditions, but the details depend on what exactly you choose to do (see below).

You are generating ARMv6 assembly suitable for the Raspberry Pi Linux machines you have an account for. Your ARM code may call C library functions for exit, printf, fprintf, scanf, strtol, as well as division and modulo (since ARMv6 doesn’t have machine instructions for those). No other C library functions may be called! You are largely on your own for this, but you can discuss ARM assembly issues (not code generator design!) on Piazza all you want.

Extending SIMPLE

Adding procedures to SIMPLE will affect all components of your existing compiler, from the scanner all the way to the code generator. So we’ll first describe the changes to SIMPLE without any reference to your implementation.

Syntax and Design Considerations

We only give the modified productions here; the rest of the syntax is unchanged from the existing grammar for SIMPLE. Please be sure to read all the explanatory notes below!

Declarations = { ConstDecl | TypeDecl | VarDecl | ProcDecl } .

ProcDecl = "PROCEDURE" identifier "(" [Formals] ")" [":" identifier] ";"
  Locals [ "BEGIN" Instructions ] [ "RETURN" Expression ]
  "END" identifier ";" .

Formals = Formal { ";" Formal } .
Formal = IdentifierList ":" identifier .
Locals = { VarDecl } .

Factor = integer | Designator | "(" Expression ")" | Call .
Instruction = Assign | If | Repeat | While | Read | Write | Call .

Call = identifier "(" [Actuals] ")" .
Actuals = ExpressionList .

We add a new kind of declaration called ProcDecl that allows us to introduce procedures in the program scope. (We could have renamed Declarations to Globals to be more consistent with the following, but we decided not to.) Procedures are complicated beasts, so let’s take it slow.

Inside a procedure we can declare local variables; these should not affect the rest of the program in any way! (We discussed what Declarations instead of the new Locals would mean in lecture and decided to just do the bare minimum, namely variables. In particular we rule out nested procedure declarations, a big simplification. If you’re curious as to what would be necessary to support nested procedures, feel free to google static link, dynamic link, and display register; but none of that is required here.) At runtime, storage for local variables (and other things, see below) is allocated on the stack every time the procedure is “activated” by a call. This interacts (in a sad way as far as performance is concerned) with SIMPLE’s rule about variables always being initialized to 0; make sure you take care of that!

Procedures also have a (possibly empty) list of formal parameters and (optionally) a return type for functions. Formal parameters constitute a kind of local variable, but these variables are initialized by the caller, not the callee. Formals and locals live in the same scope inside the procedure, so we cannot have a local variable with the same name as a formal parameter; however, the procedure scope can “shadow” global identifiers without any problem. Presence of a return type means that the procedure can only be called as an expression; absence of a return type means that the procedure can only be called as an instruction. (We discussed in lecture why we use identifier instead of Type for both formal parameters and return type; make sure you check the obvious context condition!)

All procedures can have a body of instructions to be executed when the procedure is “activated” by a call; functions can also have a return expression; the type of this expression has to be compatible with the return type. (Note that procedures as defined above can be completely empty: No locals, no body, no return expression, etc. Expect a test case like that.)

Aside from procedure declarations we also add the new syntactic form Call to denote the “activation” of a procedure. We add Call to both Instruction and Factor (and thus to expressions) as indicated, but note that this introduces an LL(1) conflict between Call and Designator in both productions. You should resolve this conflict by looking the identifier up in the symbol table: if it’s a procedure, parse Call, otherwise parse Designator. (Yes, this is a hack, it’s not the “right” thing to do. But it’ll mostly work. Try to convince yourself that the conflict doesn’t lead to additional problems elsewhere in the extended language!) Any particular procedure can only be called as either an expression or an instruction, not both (why?).
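
The symbol-table lookup that resolves the Call/Designator conflict could look roughly like the following Python sketch. It is only an illustration, not a required design: the classes Entry, Variable, Procedure, and Scope as well as the function starts_call are hypothetical stand-ins for whatever your compiler already uses.

```python
class Entry:
    """Base class for symbol table entries (hypothetical)."""


class Variable(Entry):
    pass


class Procedure(Entry):
    pass


class Scope:
    """A scope with an optional outer scope, searched innermost first."""

    def __init__(self, outer=None):
        self.table = {}
        self.outer = outer

    def insert(self, name, entry):
        self.table[name] = entry

    def find(self, name):
        scope = self
        while scope is not None:
            if name in scope.table:
                return scope.table[name]
            scope = scope.outer
        return None


def starts_call(scope, identifier):
    """True if an identifier starting a Factor or Instruction begins a Call."""
    return isinstance(scope.find(identifier), Procedure)
```

Note that in this sketch a local variable that shadows a global procedure makes the parser choose Designator inside that procedure, and an undeclared identifier also falls through to Designator, where your existing error handling can diagnose it.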

The formal parameters of a procedure are matched to the actual parameters provided in a call from left to right in the order they were written down as is common practice in almost all programming languages. The type of each actual parameter has to be compatible with the corresponding formal parameter (context condition!); for a function call, the return type of the function has to “fit” into the expression it occurs in (another context condition!).
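
A minimal Python sketch of the left-to-right matching of actuals against formals follows. It makes two assumptions that you should check against your own compiler: that a call with the wrong number of actuals is a context error, and that two types are compatible exactly when they are the same type object. The names ContextError and check_actuals are hypothetical.

```python
class ContextError(Exception):
    """Raised when a context condition is violated (hypothetical name)."""


def check_actuals(name, formal_types, actual_types):
    """Match actuals to formals from left to right and check each pair."""
    if len(formal_types) != len(actual_types):
        raise ContextError(
            f"error: {name} expects {len(formal_types)} argument(s), "
            f"got {len(actual_types)}"
        )
    pairs = zip(formal_types, actual_types)
    for position, (formal, actual) in enumerate(pairs, start=1):
        # Assumption: compatibility means identity of the type objects.
        if formal is not actual:
            raise ContextError(
                f"error: argument {position} of {name} has the wrong type"
            )
```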

The calling convention of the architecture you’re generating code for determines a lot of the details that make procedures work. For example, recall that on the ARM we’re expected to pass the first four actual parameters in registers r0 to r3 with any remaining arguments passed on the stack; similarly we’re expected to return the result from a function in r0. As mentioned above, local variables are also allocated on the stack, and the particular layout of where all the various pieces end up should follow what other programs for the same machine do as well. One way to figure those details out is to study the code produced by gcc at -O0 carefully, a highly recommended exercise!

With the horrors of code generation in mind, we’re adding two more context conditions to the language that should make things a little easier. Both of these have to do with the distinction between basic types (only INTEGER in SIMPLE) and structured types (ARRAY and RECORD types in SIMPLE). First, the return type of a function must be a basic type; this way it will always fit into a register. Second, formal parameters of structured types are passed by reference; so instead of making a copy of the ARRAY or RECORD we simply pass its address to the procedure; this way even formal parameters of structured type “fit” into a register. Note, however, that this second context condition requires a new kind of “item” in your code generator: SIMPLE as a language doesn’t have pointer types, but with “pass by reference” in effect it essentially needs to “grow” a pointer type internally. Have fun with the details…
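
Because every actual parameter now fits into one word (an INTEGER value or the address of an ARRAY or RECORD), deciding where each argument goes becomes simple. The Python sketch below is one way to compute this; the function name argument_locations is hypothetical, and the stack offsets are relative to sp as the caller sees it at the moment of the call.

```python
WORD = 4  # bytes per machine word on ARMv6
ARG_REGISTERS = ["r0", "r1", "r2", "r3"]


def argument_locations(count):
    """Return where each of `count` one-word arguments is passed."""
    locations = []
    for index in range(count):
        if index < len(ARG_REGISTERS):
            # The first four arguments travel in registers.
            locations.append(ARG_REGISTERS[index])
        else:
            # Remaining arguments go on the stack, lowest address first.
            offset = (index - len(ARG_REGISTERS)) * WORD
            locations.append(f"[sp, #{offset}]")
    return locations
```

For a call with six actuals this yields r0 to r3 for the first four, followed by two stack slots at offsets 0 and 4.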

Intermediate Representation

Since procedures are declared we need to add them to the symbol table. You’ll need to introduce a new kind of Entry class for procedures; each entry contains a scope of formal parameters and local variables; each entry also contains up to two abstract syntax trees representing the body of a procedure (if any) as well as the return expression of a function (if any).

You may also want to split your existing Variable entry further into GlobalVariable, LocalVariable, and FormalVariable. Doing this will allow you to control how offsets are computed better, which is essential for the stack-based implementation of local variables and formal parameters (one will get negative offsets while the other will get positive offsets from the frame pointer).
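
To illustrate how separate entry kinds let you compute offsets differently, here is a small Python sketch. Which kind gets which sign, and where the first formal sits relative to the frame pointer, depends on the frame layout you pick. The choices below (locals at negative offsets, formals at positive offsets starting at +8, register passing ignored) are assumptions for illustration only, and assign_offsets is a hypothetical name.

```python
def assign_offsets(formal_sizes, local_sizes, first_formal_offset=8):
    """Assign frame-pointer-relative offsets to formals and locals.

    Both arguments are lists of (name, size_in_bytes) pairs.
    Assumption: formals grow upward from first_formal_offset,
    locals grow downward from the frame pointer.
    """
    formals = {}
    offset = first_formal_offset
    for name, size in formal_sizes:
        formals[name] = offset
        offset += size

    local_offsets = {}
    offset = 0
    for name, size in local_sizes:
        offset -= size  # reserve space below the frame pointer
        local_offsets[name] = offset

    return formals, local_offsets
```

The most negative local offset also tells you how much stack space to reserve (and zero out) when the procedure is activated.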

Finally you need to add a new Node to the abstract syntax tree to represent procedure or function calls. Note that it may be easier to introduce two classes, ProcedureCall and FunctionCall say; that way you can avoid having to “multiple-inherit” from Expression as well as Instruction.

Recursive Procedures

Procedures become an awesome language feature when they are allowed to be recursive. However, the way we handled declarations in SIMPLE so far rules that out: We only make an entry into the current scope after we have completely processed a declaration. For recursive calls to work we need to be able to refer to the procedure we are defining inside of its own body, before we’re “done” with the declaration.

There are several ways to address this problem. The simplest is to just make the entry right away before you start processing the body of a procedure; then when you are done with the body (and the return expression, if any), you update the procedure’s entry with those ASTs. But this only allows simple (directly) recursive procedures; it does not allow mutually recursive procedures, so we would, for example, not be able to write a recursive-descent parser in SIMPLE. Sadface.
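
The “enter first, fill in later” approach can be sketched in Python as follows. This is a simplified illustration: the program scope is a plain dict, parse_body is a hypothetical callback standing in for your parser, and the names Procedure and declare_procedure are assumptions, not a prescribed interface.

```python
class Procedure:
    """Symbol table entry for a procedure (hypothetical layout)."""

    def __init__(self, scope, return_type=None):
        self.scope = scope              # formals and locals
        self.return_type = return_type  # None for proper procedures
        self.body = None                # AST of the instructions, if any
        self.result = None              # AST of the return expression, if any


def declare_procedure(program_scope, name, local_scope, return_type, parse_body):
    """Enter the procedure before parsing its body so it can call itself."""
    if name in program_scope:
        raise ValueError(f"error: duplicate declaration of {name}")
    entry = Procedure(local_scope, return_type)
    program_scope[name] = entry  # visible while the body is being parsed
    entry.body, entry.result = parse_body()
    return entry
```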

For the purpose of this assignment, we can live with that restriction. However, if you feel ambitious, consider adding either forward declarations for procedures or replacing the entire declaration mechanism with one that can resolve arbitrary dependencies; Google is your friend.

Context Conditions

Of course procedures come with a number of additional context conditions that you need to enforce. Here are a few examples:

There are probably more, some of which we mentioned implicitly above in the “Syntax and Design” section. You’re not just a compiler hacker now, you’re also (hopefully) a better language designer. So make sure you find all the context conditions that matter.

Problem 1: Implement Procedures (100%)

The entire assignment is just this single problem: implementing the procedure mechanism outlined above. There are a number of tradeoffs that will affect how many points you can get, however, so read the following carefully.

Driver Program

There are no changes to the driver program; you can hack procedures into your compiler in any way you choose. Depending on where you implement procedures, we will run our test cases with -x or no options whatsoever; be sure to be specific in your README file!

Language Options (50%)

You don’t have to implement all of the procedure mechanism, the distribution of (possible) points is as follows:

The “rest” category means that you have procedure declarations and calls, but nothing else: no locals, no parameters, no results, no recursive calls. This is the bare minimum you need to have to get any points for this assignment. Note that even if you don’t implement a certain feature, you still need to be able to parse the new syntax and properly diagnose that the feature is missing!

Implementation Options (50%)

You don’t have to implement procedures in your advanced code generator; you get to choose where to put your implementation as follows:

Note that the boundary between basic and advanced code generator is a little fluid (since you had options for the previous assignment) and if in doubt we’ll make a judgement call as to where your percentages should fall.

Extra Credit (10%)

If you implement a feature that is not required at all, for example adding procedures to the interpreter, resolving arbitrary declaration dependencies, or inlining simple functions, you can get up to 10% of extra credit; if in doubt, ask first (before you spend the time!) whether we will consider your particular feature “worthy” of extra credit. Your feature(s) cannot change the syntax for procedures in any way! (You can, however, add new syntax, for example for forward declarations.)

Random Bits of Advice

First, at your option, you can “scrap” the debugging output from previous assignments (printing/drawing the ST and AST) if you think that doing so will save you time. For what it’s worth, having the debugging output seems especially valuable for this assignment, so removing it doesn’t seem like a good idea; but you are certainly allowed to willfully ignore good ideas.

If you want to add procedures to the interpreter, you cannot create the entire environment at compile-time anymore: When a procedure is called, you have to build a new environment for the call at run-time. This is in fact the only major difficulty in integrating procedures with the interpreter.
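
As a rough illustration of building the environment at run-time, consider the following Python sketch. It is deliberately simplified: environments are plain dicts, the body and return expression are Python callables standing in for ASTs, and the names Procedure and call are hypothetical.

```python
class Procedure:
    """Run-time view of a procedure (hypothetical, simplified)."""

    def __init__(self, formals, local_names, body=None, result=None):
        self.formals = formals          # names of formal parameters
        self.local_names = local_names  # names of local variables
        self.body = body                # callable(env), stands in for the AST
        self.result = result            # callable(env), stands in for the AST


def call(procedure, actuals):
    """Activate a procedure with a fresh environment for this call only."""
    env = {name: 0 for name in procedure.local_names}  # locals start at 0
    env.update(zip(procedure.formals, actuals))        # caller sets formals
    if procedure.body is not None:
        procedure.body(env)
    if procedure.result is not None:
        return procedure.result(env)
    return None


def fact_body(env):
    # Each recursive activation gets its own "n" and "acc".
    if env["n"] > 1:
        env["acc"] = env["n"] * call(fact, [env["n"] - 1])
    else:
        env["acc"] = 1


fact = Procedure(["n"], ["acc"], fact_body, lambda env: env["acc"])
```

With these definitions, call(fact, [5]) evaluates to 120, and a completely empty procedure such as Procedure([], []) can be called without doing anything at all.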

For the code generator the fundamental decision to make is the layout of the stack frame you want to use. We discussed the basic motivation for stack frames in lecture so I won’t repeat it here; but please stick closely to the established conventions for stack frames that exist for your architecture! If your architecture encourages passing parameters in registers you should do so, but for simplicity you may want to always reserve space on the stack (in the correct section!) as well so you have a place where you can save these registers if you need them for another call.

Remember that we allow formal parameters of any type, but we pass them to procedures differently. Basic types (that’s INTEGER for us) are passed “by value” meaning they are copied into something akin to local variables; assignments to these formals inside the procedure affect the local copy only. Structured types (that’s ARRAY and RECORD types) are passed “by reference” meaning that only their address is copied; we treat references to these formals as implicit pointer dereferences, so assignments to those formals inside the procedure will affect the actual parameter itself. This leads to a new kind of addressing mode you have to support: The interpreter needs a new kind of “box” while the code generator needs a new kind of “item”.
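
For the interpreter side, the difference between the two passing modes can be sketched in Python like this. The box classes and the function bind_formal are hypothetical; your own boxes will look different, but the idea of copying an INTEGER while merely referring to a structured value carries over.

```python
class IntegerBox:
    """Storage for one INTEGER value."""

    def __init__(self, value=0):
        self.value = value


class ArrayBox:
    """Storage for an ARRAY of INTEGER, all elements initialized to 0."""

    def __init__(self, length):
        self.elements = [IntegerBox() for _ in range(length)]


class ReferenceBox:
    """New kind of box: holds the "address" of a structured actual."""

    def __init__(self, target):
        self.target = target


def bind_formal(actual):
    """Create the box for a formal parameter from the actual's box."""
    if isinstance(actual, IntegerBox):
        return IntegerBox(actual.value)  # by value: a private copy
    return ReferenceBox(actual)          # by reference: shares the actual
```

Assigning through a ReferenceBox changes the caller’s array, while assigning to a copied IntegerBox leaves the caller’s variable alone.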

Writing the README

You have a lot of flexibility for this assignment, which means you’ll have to explain what you did in more detail in your README file. Please tell us exactly which parts of the assignment you attempted, what you have achieved (meaning what actually works), and which additional features (if any) you worked on. Please explain in some depth how each of your features worked out or why you had to give up on something. Write a good README for this one! If the TAs have to spend lots of time just trying to guess what you implemented, they are within their rights to dock some points!

Error Handling

The advice from earlier assignments about using exceptions for error handling is still in effect, as is the required format for your error messages:

error: some helpful description

If you followed the advice for error handling on previous assignments, you should have little trouble handling the new errors. The exception is, of course, runtime errors: those need to be processed in assembly now! But you should still follow the format from before.

Graduate Level Requirements

If you are taking this course at the graduate level, the new errors that are possible with this assignment should produce accurate position information.

Furthermore we expect graduate students to attempt at least 70% worth of features for this assignment. If you attempt less than that you may be penalized more severely.

Deliverables

Please follow the submission instructions as detailed on Piazza. Make sure that your tarball contains no derived files whatsoever (i.e. no executable files), but allows building all required derived files. Also make sure to include your name and email address in every file you turn in (well, in every file for which it makes sense to do so anyway)!

Regardless of your programming language of choice, we expect to build your project using make (if it needs building at all) and we expect to run your project using ./sc (which stands for “SIMPLE compiler”). You are free to use the standard library for your language of choice, except for modules/classes that allow you to avoid writing large parts of the code for an assignment; so no regular expressions, no parsing combinators, etc. Depending on your language of choice, compliance with certain tools (e.g. checkstyle or valgrind), compiler flags, or additional style guides may also be required; see Piazza for details.

Grading

For reference, here is a short explanation of the grading criteria; not all of the criteria apply to all problems on a given assignment, and not all of the assignments even use all of the criteria.

Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments on Piazza.

Style refers to programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for.

Design refers to proper modularization (into functions, classes, modules, etc.) and the proper choice of algorithms and data structures.

Performance refers to how fast/with how little memory your project can produce the required results compared to other submissions; in this course this can mean your actual compiler or interpreter as well as the code generated by it.

Functionality refers to your programs being able to do what they should according to the specification given above. (It also refers to you simply doing the required work, which may not be programming alone.) If the specification is ambiguous, ask for clarification! If no clarification is forthcoming, defend the choices you have made in your README file.

If your project cannot be built, or if it is otherwise obvious that you never tested it, you will get no points whatsoever. If your project cannot be built without warnings using the required compiler options we will take off 10%. If your programs cannot be built using make we will take off 10%. If valgrind detects memory errors in your programs, we will take off 10%. If your project fails miserably even once, i.e. terminates with an exception of any kind or dumps core, we will take off 10%. Presumably you see the pattern here?