600.328 / 428 / 628: Compilers and Interpreters
You’re in the right place if you want to find out how compilers and
interpreters, the tools you’ve been using for all your programming
for quite a while now, really work. You’ll also pick up some useful
software development techniques along the way.
Catalog Description: Introduction to compiler design, including
lexical analysis, parsing, syntax-directed translation, symbol tables,
run-time environments, and code generation and optimization. Students
are required to write a complete compiler as a course project.
Prerequisite(s): 600.120: Intermediate Programming, 600.226: Data
Structures. (Having taken 600.233: Computer System Fundamentals and
600.271: Automata and Computation Theory is very helpful too.)
The course includes significant programming projects; without prior
development experience you’ll probably get lost in a
maze of relatively complex code.
Please read the
general course policies
and take them to heart.
Additional policies specific to this course may be posted at a later date.
- Understand the theoretical foundations for compilers
- Practice object-oriented design and the application of
- Implement a full compiler for a high-level imperative language.
- Understand the hardware / software tradeoffs involved in
There is no required text.
However, it is strongly recommended that you get yourself a textbook.
The following are all excellent, but none covers exactly what we’ll do in
the course, and all cover things that we’ll never even mention.
Note that most editions will do; you don’t necessarily need the latest and
therefore most expensive one.
Compilers and Interpreters
Software Development and Design
- Assignments (about 10): 60%
- Midterm: 15%
- Final: 25%
Please check the individual assignments for due dates and the structure your
solutions should have.
See the course policies
for detailed submission instructions.
If you have an opinion on these assignments, be it good or bad, please
let us know about it. We’re always trying to make these things more enjoyable
(if that’s an applicable term? :-).
This is not a schedule.
It’s a “log” of what we did, roughly, in each lecture.
Don’t expect it to turn into a schedule; it won’t.
Also there will eventually be gaps, sorry.
- January 29: Welcome,
- January 31: A tiny interpreter / compiler for integer expressions;
demo and overview;
vector and string abstractions to make C more palatable;
details on tokens, scanner, abstract syntax tree (nodes), parser;
outline of grammars (abstract grammar for AST, concrete grammar
- February 2: A tiny interpreter / compiler for integer expressions;
grammars again, abstract and concrete;
details on parser, interpreter, code generator for MIPS/SPIM.
- February 5: Compiler qualities;
Compiler architecture, phases, passes;
bootstrapping and porting (T-diagrams, self-compilation test,
- February 7: Formal languages review;
notations for grammars (EBNF in particular);
intro to lexical analysis;
regular languages and grammars;
- February 9: Scanner implementation; working from the EBNF;
token representations; interface between scanner and parser.
- February 12: Intro to syntactic analysis;
context-free languages and grammars;
top-down versus bottom-up parsing;
left recursion elimination for top-down parsing;
recursive descent parsing;
- February 14: Parser implementation;
working from the EBNF;
intro to error handling;
panic mode: stop at the first error;
synchronizing parser and input for recovery;
“It’s full of heuristics!”;
first and follow sets;
why error handling based on follow sets is complicated;
using weak and strong tokens/symbols;
missing weak symbols don’t require synchronization
(neither do some common “wrong use” errors like “=” versus “:=”);
after a non-weak error, resynchronize at a strong symbol;
filtering excessive error messages.
- February 16: Revised error handling for graduate students;
no more exceptions; weak symbols; our four synchronization points;
general synchronization pattern.
- February 19: Visualizing the parse tree (without actually building it);
the observer pattern (aka publish-subscribe aka events);
intro to semantic analysis;
attribute grammars (synthesized and inherited attributes);
semantic analysis code in the parser;
context established by declarations;
mapping names to meanings;
block structure (nested scopes, shadowing).
- February 21: Scope objects (outer scopes, hierarchical lookup); entry objects
(constants, types, variables); long-ish symbol table example; universe scope;
- February 23: (Marc Rosen) Introduction to undefined / unspecified
behavior, why it can increase performance; introduction to SSA form (control
flow graphs, PHI nodes, dead variable elimination, global value numbering).
- February 26: Details on
RECORD types; long-ish symbol table
example; the issue of “anonymous” types for variables and record fields.
- February 28: Lost in time, sorry;
probably a bigger, more complex symbol table example;
probably the beginnings of the visitor pattern.
- March 2: Details on visitor pattern for traversing hierarchical data;
introduction to abstract grammars,
abstract syntax trees (ASTs),
possible abstraction levels for ASTs,
AST transformations (eg.
- March 5: Simple AST example (
type information in the AST,
design of class hierarchy for AST.
- March 7: lost, sorry
- March 9: Introduction to the direct interpreter;
next-guided post-order traversal of AST;
using a stack to handle numbers, binary expressions,
building the environment to map variables to their current values.
- March 12: More on the direct interpreter;
review stack for intermediate / temporary values;
review environment / boxes to track current value of variables;
handling locations in expression contexts (dereferencing);
resolving complex designators against the environment;
handling conditions, using integers as “fake” booleans;
- March 14: Getting ARM accounts;
learning (any) assembly language using
gcc (and a manual or three);
mostly using x86 examples; a tiny bit of PowerPC.
- March 16: Midterm Exam.
- March 26: Code generation overview;
storage allocation, instruction selection, mutual constraints;
storage allocation (for variables) from the symbol table
(using a fixed register for globals);
note on constants in ARM and the need for storing some.
- March 28: Reminder about ARM accounts;
brief examples of ARM assembly;
instruction selection process;
tiling the AST with “code patterns” for the chosen CPU/storage layout;
using “one-node” tiles for now;
implications for connecting code patterns, using a stack again;
simple example of a “constant” expression.
- March 30: Code generation for
tricky immediate values and the
(leading to a discussion of the constant aka literal pool, see
dereferencing addresses into values;
code generation for
Binary expression nodes;
code generation for
movne for generating boolean values without branches.
- April 2: Code generation for
examples of control flow graphs;
how to generate / use labels for control flow / branches;
brief discussion of
WHILE and why it’s missing;
introduction to better code generation by (a) getting rid of the stack
for temporaries and (b) taking more context into account (larger tiles);
outline / example of “register stack” approach.
- missing a few, sorry (April 4, April 6); this was mostly about how to get
rid of the stack
- April 9: Sethi-Ullman numbering as an example of “preprocessing” the AST to
guide code-generation later (sorry, got a little confused here); register
allocation as “caching” or “paging” of (many) values into (few) registers;
LRU policy; outline of using Belady’s paging algorithm after computing the
“furthest use” in the AST; brief note on graph-coloring and why we don’t try;
outline of weighting “importance” of a value by loop-nesting level.
- April 11: Note on replacing multiplication and division by shifts; using
context information during code generation; idea of “items” summarizing
the result of visiting an AST node; idea of “delaying” code emission as
long as possible; (fake) example of
CONST items (value); example of simple
VAR item (offset from global base register); example of
+ node and using
items to “delay” loading/dereferencing a variable (using the
addressing mode); need for a
REGISTER item to track run-time values;
FIELD item (offset from some address); example of nested
records/fields emitting no code whatsoever.
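
The “items” idea from the April 11 entry can be sketched roughly as follows. This is a minimal illustration in Python, not the code used in the course: the class names (Const, Var, Reg, Generator), the choice of r11 as the global base register, and the naive register allocator are all assumptions made for the example. It also ignores ARM’s restrictions on immediate values. The point is only that constants and field selections can be tracked at compile time, so instructions are emitted as late as possible.

```python
from dataclasses import dataclass


@dataclass
class Const:
    value: int      # compile-time value; no code emitted for it yet


@dataclass
class Var:
    offset: int     # location at a fixed offset from the global base register


@dataclass
class Reg:
    number: int     # run-time value currently held in a register


class Generator:
    def __init__(self):
        self.code = []       # emitted instructions, as text
        self.next_reg = 0    # naive allocator: registers are never freed

    def field(self, item, field_offset):
        # Selecting a record field only adjusts the offset; nothing is emitted.
        return Var(item.offset + field_offset)

    def load(self, item):
        # Force an item into a register; this is where code finally appears.
        if isinstance(item, Reg):
            return item
        reg = Reg(self.next_reg)
        self.next_reg += 1
        if isinstance(item, Const):
            # Assumes the value fits an ARM immediate (not checked here).
            self.code.append(f"mov r{reg.number}, #{item.value}")
        else:
            # r11 as global base register is an assumption of this sketch.
            self.code.append(f"ldr r{reg.number}, [r11, #{item.offset}]")
        return reg

    def add(self, left, right):
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(left.value + right.value)   # folded, no code emitted
        if isinstance(left, Const):
            left, right = right, left   # addition commutes; constant goes right
        l = self.load(left)
        if isinstance(right, Const):
            self.code.append(f"add r{l.number}, r{l.number}, #{right.value}")
        else:
            r = self.load(right)
            self.code.append(f"add r{l.number}, r{l.number}, r{r.number}")
        return l


g = Generator()
y = g.field(g.field(Var(8), 4), 0)           # nested fields: still no code
total = g.add(y, g.add(Const(2), Const(3)))  # 2 + 3 is folded to 5
print("\n".join(g.code))
```

In this sketch the nested field selection and the constant addition emit nothing; only the final addition forces a load from memory followed by a single add with an immediate operand.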