Assignment 5: Interpreter

Overview

In the fifth part of the compiler project you will extend your existing compiler frontend (which builds the checked intermediate representation for SIMPLE programs in the form of the symbol table and the abstract syntax tree) with a backend that actually runs SIMPLE programs. Your interpreter first traverses the ST to build an environment which tracks the run-time value of all variables in a SIMPLE program; it then performs a post-order-style traversal of the AST and (using the environment as well as an auxiliary stack) executes the program one AST node at a time.

The interpreter also enforces the last few context conditions that could not be enforced during (static) semantic analysis because they depend on the actual run-time values computed by a program. You can get the abstract grammar for the SIMPLE programming language here. The context conditions for this assignment are given here.

Problem 1: Driver Program (10%)

The final compiler will consist of a number of modules and classes working together to translate programs written in SIMPLE into equivalent programs written in assembly language. While these “bits and pieces” are spread out over the entire semester, you can already implement the basic driver program that will orchestrate their work. The driver will be called sc and is invoked from the shell as follows:

Invocation = "./sc" ["-" ("s"|"c"|"t"|"a"|"i")] [filename] .

This describes the syntax of the command line in EBNF. After ./sc itself, the user can supply one option (introduced by “-”) to tell the driver which parts of the compiler to run and what kind of output to produce.

Arguments and Options

With this assignment the option -i is allowed on the command line for sc. The options -s, -c, -t, and -a are unchanged from previous assignments. Everything else, including “no options at all,” should still result in an error.

For this assignment, the option -i is supposed to traverse both the symbol table and the abstract syntax tree to actually run a SIMPLE program. This includes performing input and output (on standard input and standard output) whenever the program executes a READ or WRITE instruction; no other input or output should be required or performed, except for errors. If -i is given and an error is detected before the AST is completely built, the interpreter should not run; if an error is detected while the interpreter is running, the program should stop with an error message at that time.

If a second argument is given, it is assumed to be the file name of a SIMPLE program to process. If no filename is given, you should read the program from standard input instead. Eventually this will also determine whether the output goes to standard output or to a file, but for now all your output goes to standard output.
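The command-line rules above could be checked along the following lines. This is only a sketch in Python (you may use any language); the names parse_invocation and DriverError are made up for illustration, and the wording of the error messages is an assumption.

```python
OPTIONS = {"-s", "-c", "-t", "-a", "-i"}


class DriverError(Exception):
    """Raised for an invalid command line (hypothetical name)."""


def parse_invocation(args):
    """Split the arguments after ./sc into (option, filename).

    filename is None when the program should be read from standard input.
    """
    rest = list(args)
    option = None
    if rest and rest[0].startswith("-"):
        option = rest.pop(0)
        if option not in OPTIONS:
            raise DriverError(f"unknown option {option}")
    if option is None:
        # "No options at all" is still an error for this assignment.
        raise DriverError("no option given")
    if len(rest) > 1:
        raise DriverError("too many arguments")
    filename = rest[0] if rest else None
    return option, filename
```

For example, parse_invocation(["-i", "47.sim"]) yields ("-i", "47.sim"), while parse_invocation(["-i"]) yields ("-i", None), meaning the program is read from standard input.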

Problem 2: Interpreter (90%)

The interpreter needs to traverse both the symbol table (ST) and the abstract syntax tree (AST), and you should apply the visitor design pattern for these tasks once more. Except for the brief notes below, you’re pretty much on your own for this assignment…

Environments

Interpreting a SIMPLE program requires that we “keep track” of the current values of all variables that were declared. This includes the elements of arrays as well as the fields of records. We need to “allocate” the necessary “storage” (which you can think of as “boxes” or something) for these variables, and we do so in a data structure called the environment.

Environments map names to storage in a way similar to the ST which maps names to meanings. However, environments do not include constants or types, both of which can occur in the ST: since their “values” cannot change during execution, we do not have to “keep track” of them.

Boxes

First you need to decide how you will represent the storage (or “boxes”) necessary. Since environments are yet another example of a data structure where we have to handle various “kinds” of entries, the use of inheritance is appropriate once more.

You should write an abstract base class Box and three derived classes: IntegerBox to hold a single integer value, ArrayBox to hold the boxes that make up the value of an array, and RecordBox to hold the boxes that make up the value of a record.

For IntegerBox you will need operations to get and set the current value; you should initialize the value to zero when an IntegerBox is created. For ArrayBox and RecordBox you will need operations to access one of the boxes they are “made up of,” as well as operations that assign (in the sense of a “deep copy”) one ArrayBox or RecordBox to another (to support assignments between complete arrays or records as allowed in SIMPLE).
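To make the operations described above concrete, here is a minimal sketch of Box, IntegerBox, and ArrayBox in Python. The method names (get, set, element, assign) and the exception SimpleRuntimeError are assumptions chosen for illustration, as is the assumption that valid indices run from 0 to length - 1. RecordBox is left out here.

```python
class SimpleRuntimeError(Exception):
    """Hypothetical exception type for run-time errors."""


class Box:
    """Abstract base class for all kinds of storage."""

    def assign(self, other):
        raise NotImplementedError


class IntegerBox(Box):
    def __init__(self):
        self._value = 0  # integers start out as zero

    def get(self):
        return self._value

    def set(self, value):
        self._value = value

    def assign(self, other):
        self._value = other.get()


class ArrayBox(Box):
    def __init__(self, elements):
        self._elements = list(elements)

    def element(self, index):
        # Assumes valid indices run from 0 to length - 1.
        if not 0 <= index < len(self._elements):
            raise SimpleRuntimeError(f"index {index} out of range")
        return self._elements[index]

    def assign(self, other):
        # Deep copy: copy values box by box instead of sharing boxes.
        for mine, theirs in zip(self._elements, other._elements):
            mine.assign(theirs)
```

Because assign copies values into the existing boxes instead of sharing them, later changes to the source array do not affect the target. The sketch relies on semantic analysis having already ensured that both arrays have the same type.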

An element inside an ArrayBox is obviously selected using an integer index; please make sure to check that the index is actually valid! For the fields of a RecordBox, however, the way to select a particular field is not obvious at all. One idea is to use the name of the field, but that does not fit the AST we built for accessing record fields: there we don’t store names but pointers to Variable nodes in the AST, which in turn point to Variable objects in the ST. Resolving this “mismatch” is up to you…
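One of several possible ways to resolve the mismatch is to key the boxes of a record by the ST entries of its fields rather than by their names. The sketch below (Python, with a stand-in Variable class and a stripped-down IntegerBox, both hypothetical) shows the idea; it is not the only reasonable design.

```python
class Variable:
    """Stand-in for the ST entry that describes a record field."""

    def __init__(self, name):
        self.name = name


class IntegerBox:
    def __init__(self):
        self.value = 0

    def assign(self, other):
        self.value = other.value


class RecordBox:
    def __init__(self, fields):
        # Plain Python objects hash by identity, so each ST entry
        # serves directly as the key for its own box. This sketch
        # only handles integer fields.
        self._boxes = {field: IntegerBox() for field in fields}

    def field(self, variable):
        return self._boxes[variable]

    def assign(self, other):
        # Records of the same type share the same ST field entries,
        # so the keys of both records line up.
        for variable, box in self._boxes.items():
            box.assign(other.field(variable))
```

With this design a field access in the AST can hand its ST entry straight to field, and two fields that happen to share a name in different record types cannot be confused.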

Building the Environment

To build the environment you have to traverse the ST, create instances of the various Box classes, and connect them as appropriate. One possibility is to add an operation to the Scope class that returns an environment for its contents. Another possibility is to implement the operation outside any class as a function, but to make it a friend of the relevant classes. You could also develop a separate visitor that traverses the ST and builds the environment. The details are up to you again…
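As a rough illustration of the traversal, the following Python sketch builds an environment from a toy symbol table. All class names here (Integer, Array, Record, Constant, Variable and the three boxes) are simplified stand-ins, not the classes from your own ST, and a scope is modeled as a plain dictionary from names to entries.

```python
from dataclasses import dataclass


class Integer:
    """Stand-in for the INTEGER type."""


@dataclass
class Array:
    element_type: object
    length: int


@dataclass
class Record:
    scope: dict  # field name -> Variable


@dataclass
class Constant:
    value: int


@dataclass
class Variable:
    type: object


class IntegerBox:
    def __init__(self):
        self.value = 0


class ArrayBox:
    def __init__(self, elements):
        self.elements = elements


class RecordBox:
    def __init__(self, fields):
        self.fields = fields


def make_box(type_):
    """Create fresh storage for one value of the given type."""
    if isinstance(type_, Integer):
        return IntegerBox()
    if isinstance(type_, Array):
        return ArrayBox([make_box(type_.element_type)
                         for _ in range(type_.length)])
    if isinstance(type_, Record):
        return RecordBox({name: make_box(entry.type)
                          for name, entry in type_.scope.items()})
    raise TypeError(f"unexpected type {type_!r}")


def build_environment(scope):
    """Map each variable name to its storage; skip constants and types."""
    return {name: make_box(entry.type)
            for name, entry in scope.items()
            if isinstance(entry, Variable)}
```

Note that every array element gets its own box; reusing a single box for all elements would make them alias each other. Keying record fields by name is a simplification made only to keep this sketch short.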

Interpreter

Once you have created the environment for a SIMPLE program, you have to actually run it. (Note that in SIMPLE the BEGIN part of a program is optional; be sure to handle this quirk!) The process of interpreting the AST proceeds as a (mostly) post-order, (mostly) left-to-right traversal, using an auxiliary stack to hold intermediate results. Again you have a choice of using the visitor pattern for this traversal, or of implementing it in the form of recursive functions that operate on (parts of) the AST; we did the latter in the tiny example compiler I showed you at the beginning of the semester.

Remember that you have to traverse a sequence of instructions in the order they appear in the program. For each AST node you encounter during the traversal you have to perform the appropriate actions. Numerous examples of this process were given in the lecture, but here are a few reminders:

When you reach the end of an instruction, the stack should be empty once again (a nice “sanity check” for your code).
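The traversal with an auxiliary stack can be sketched as follows. The node classes (Number, Var, Binary, Assign, Write) are hypothetical and much simpler than a real SIMPLE AST, the environment is reduced to a dictionary of integers, and output is collected in a list so the example stays self-contained.

```python
from dataclasses import dataclass


@dataclass
class Number:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Binary:
    op: str
    left: object
    right: object


@dataclass
class Assign:
    target: Var
    expression: object


@dataclass
class Write:
    expression: object


class Interpreter:
    def __init__(self, env):
        self.env = env      # simplified: name -> integer value
        self.stack = []     # intermediate results
        self.output = []    # values written by WRITE

    def evaluate(self, node):
        # Post-order: operands first (left to right), then the operator.
        if isinstance(node, Number):
            self.stack.append(node.value)
        elif isinstance(node, Var):
            self.stack.append(self.env[node.name])
        elif isinstance(node, Binary):
            self.evaluate(node.left)
            self.evaluate(node.right)
            right = self.stack.pop()
            left = self.stack.pop()
            ops = {"+": left + right, "-": left - right, "*": left * right}
            self.stack.append(ops[node.op])

    def execute(self, instructions):
        # Instructions run in the order they appear in the program.
        for instruction in instructions:
            if isinstance(instruction, Assign):
                self.evaluate(instruction.expression)
                self.env[instruction.target.name] = self.stack.pop()
            elif isinstance(instruction, Write):
                self.evaluate(instruction.expression)
                self.output.append(self.stack.pop())
            # Sanity check: nothing may be left over after an instruction.
            assert not self.stack
```

Popping the right operand before the left one matters for non-commutative operators such as subtraction.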

Error Handling

The advice from earlier assignments about using exceptions for error handling is still in effect, as is the required format for your error messages:

error: some helpful description

Enforcing context conditions for SIMPLE programs will lead to a number of “new” errors, for example when the index for an array access is out of range. If you followed the advice for error handling on previous assignments, you should have little trouble handling those new errors.
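For illustration, one way to funnel all errors through a single place that produces the required format is sketched below in Python. The names SimpleError and run are made up, and sending the message to standard error is an assumption; check the earlier assignments for where your messages are expected to go.

```python
import sys


class SimpleError(Exception):
    """Hypothetical base class for all compile-time and run-time errors."""


def run(action, err=sys.stderr):
    """Call action(); report a SimpleError in the required format.

    Returns 0 on success and 1 if an error was reported.
    """
    try:
        action()
    except SimpleError as error:
        print(f"error: {error}", file=err)
        return 1
    return 0
```

Since the exception propagates out of the interpreter before it is caught, the program stops at the point where the error occurred.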

Input and Output

Input and output are almost trivial for this assignment. When you interpret a READ instruction, you should read a single integer value (followed by a newline) from standard input and store it in the appropriate Box. When you interpret a WRITE instruction, you should write a single integer value to standard output (followed by a newline). Consider the following SIMPLE program for example:

PROGRAM X;
  CONST ff = 42;
BEGIN
  WRITE ff+5
END X.

If this program is stored in a file “47.sim” then running the program should work as follows:

$ ./sc -i 47.sim
47
$

Now consider the following program which simply “echoes” the number entered by the user:

PROGRAM X;
  VAR x: INTEGER;
BEGIN
  READ x;
  WRITE x
END X.

Running the program should work as follows:

$ ./sc -i echo.sim
16
16
$ ./sc -i echo.sim
64738
64738
$

Here the first number was typed by the user whereas the second number is output by the interpreter. I hope these examples suffice…
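The two I/O operations could look roughly like this in Python. The function names, the stripped-down IntegerBox, and the InputError exception are hypothetical; treating malformed or missing input as an error is an assumption, since the handout does not say what should happen in that case.

```python
import sys


class InputError(Exception):
    """Hypothetical error for malformed or missing input."""


class IntegerBox:
    def __init__(self):
        self.value = 0


def read_into(box, stdin=sys.stdin):
    """READ: store one integer from a line of input in the box."""
    line = stdin.readline()
    try:
        box.value = int(line)
    except ValueError:
        raise InputError(f"invalid input {line.strip()!r}") from None


def write_value(value, stdout=sys.stdout):
    """WRITE: print one integer followed by a newline."""
    print(value, file=stdout)
```

Passing the streams as parameters keeps the functions easy to test without touching the real standard input and output.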

Graduate Level Requirements

If you are taking this course at the graduate level, the new run-time errors that are possible with this assignment should produce accurate position information just like your compile-time errors do. However, you do not keep running the program after a run-time error has occurred; that would just be silly.

Deliverables

Please follow the submission instructions as detailed on Piazza. Make sure that your tarball contains no derived files whatsoever (i.e. no executable files), but allows building all required derived files. Also make sure to include your name and email address in every file you turn in (well, in every file for which it makes sense to do so anyway)!

Regardless of your programming language of choice, we expect to build your project using make (if it needs building at all) and we expect to run your project using ./sc (which stands for “SIMPLE compiler”). You are free to use the standard library for your language of choice, except for modules/classes that allow you to avoid writing large parts of the code for an assignment; so no regular expressions, no parsing combinators, etc. Depending on your language of choice, compliance with certain tools (e.g. checkstyle or valgrind), compiler flags, or additional style guides may also be required; see Piazza for details.

Grading

For reference, here is a short explanation of the grading criteria; not all of the criteria apply to all problems on a given assignment, and not all of the assignments even use all of the criteria.

Packaging refers to the proper organization of the stuff you hand in, following both the guidelines for Deliverables above as well as the general submission instructions for assignments on Piazza.

Style refers to programming style, including things like consistent indentation, appropriate identifier names, useful comments, suitable documentation, etc. Simple, clean, readable code is what you should be aiming for.

Design refers to proper modularization (into functions, classes, modules, etc.) and the proper choice of algorithms and data structures.

Performance refers to how fast/with how little memory your project can produce the required results compared to other submissions; in this course this can mean your actual compiler or interpreter as well as the code generated by it.

Functionality refers to your programs being able to do what they should according to the specification given above. (It also refers to you simply doing the required work, which may not be programming alone.) If the specification is ambiguous, ask for clarification! If no clarification is forthcoming, defend the choices you have made in your README file.

If your project cannot be built, or if it is otherwise obvious that you never tested it, you will get no points whatsoever. If your project cannot be built without warnings using the required compiler options, we will take off 10%. If your programs cannot be built using make, we will take off 10%. If valgrind detects memory errors in your programs, we will take off 10%. If your project fails miserably even once, i.e. terminates with an exception of any kind or dumps core, we will take off 10%. Presumably you see the pattern here?