Compilation by Program Transformation

Goal: understand some of the concepts behind compilation by writing a high-level transformational compiler.

Our compiler:

Real compilers:

Some points about the approach we will take.

Desired Soundness Property for program translations: programs before and after translation have the same execution behavior (in our case, termination and same numerical output, but in general the same i/o behavior).

The Transformations

The DSR transformations are now covered in the order they are applied to the source program.

Closure Conversion

Consider the example of a curried addition function.
add = Function x -> Function y -> x + y
In the body x + y, x is nonlocal and y is local.

What should add 3 return?

Answer: return a closure, consisting of the function and an environment which remembers the values of the nonlocal variables for later use:

(Function y -> x + y, { x |-> 3 })
Some more structure than this is needed in order for the function to get its x value when invoked; we cover that next.

Closure conversion is a global program transformation that explicitly performs this operation in the language itself.

Core ideas

The translation is introduced by way of example. add above translates to
add' = { fn = Function xx ->
   { fn = Function yy -> (yy.envt.x) + (yy.arg); envt = { x = xx.arg } };
      envt = {} }
Whew! This is a pretty complicated operation. Some comments.

Translation of function calls

After closure conversion, the function call add 3 must explicitly pass in the environment, since the function body now reads its nonlocal variables from it.

(add'.fn)({ envt = add'.envt; arg = 3})
Translation of add 3 4 takes the result of the above, which is a function closure { fn = ...; envt = ... }, and does the same trick to apply 4 to it:
Let add3' = (add'.fn)({ envt = add'.envt; arg = 3}) In
  (add3'.fn)({ envt =  add3'.envt; arg = 4})
and the result would be 7, the same as the original result, confirming the soundness of the translation in this case.
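The same closure-passing discipline can be sketched in OCaml. This is only an illustration under assumptions: the `closure` type and the `apply` helper are names invented here, and a typed tuple `(envt, arg)` stands in for the DSR record `{ envt = ...; arg = ... }`.

```ocaml
(* A closure pairs a code pointer with the environment it needs.
   The type and the helper `apply` are illustrative names, not part of DSR. *)
type ('env, 'arg, 'res) closure = { fn : 'env * 'arg -> 'res; envt : 'env }

(* Calling a closure hands its own environment back to its code. *)
let apply c arg = c.fn (c.envt, arg)

(* add': the outer function needs no environment (unit); the inner
   function's environment is just the captured value of x. *)
let add' =
  { fn = (fun ((), x) -> { fn = (fun (env_x, y) -> env_x + y); envt = x });
    envt = () }

let add3' = apply add' 3      (* a closure remembering x = 3 *)
let result = apply add3' 4    (* computes 3 + 4 *)

let () = Printf.printf "%d\n" result
```

Note that neither `fun` refers to a variable bound outside its own parameter list, which is exactly the property closure conversion is meant to establish.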

In general,

One more level of nesting

Well, it's even slightly more complicated if we consider one more level of nesting of functions, for example

triadd = Function x -> Function y -> Function z -> x + y + z
Here the z function needs the x value, and since that function is defined inside Function y, the y function has to act as an intermediary that passes x along.

Here is the translation.

triadd' =
  { fn = Function xx ->
    { fn = Function yy -> 
      { fn = Function zz -> (zz.envt.x) + (zz.envt.y)  + (zz.arg);
        envt = { x = yy.envt.x; y = yy.arg } };
      envt = { x = xx.arg } };
    envt = {} }
Some observations about this example

The official translation

With that example in mind, we can write out the official closure conversion translation. We will use the notation clconv(e) to express the closure conversion function, defined inductively as follows (this code is informal, it uses concrete DSR syntax which in the case of e.g. records looks like Caml syntax).

  1. clconv(x) = x (* variables *)
  2. clconv(n) = n (* numbers *)
  3. clconv(b) = b (* booleans *)
  4. clconv(Function x -> e) = letting x, x1, ..., xn be precisely the free variables in e, the result is the DSR expression
          { fn = Function xx -> SUB[clconv(e)];
            envt = { x1 = x1; ...; xn = xn } } 
    where SUB[clconv(e)] is clconv(e) with substitutions (xx.envt.x1)/x1,...,(xx.envt.xn)/xn and (xx.arg)/x performed on it, but not substituting in Function's inside clconv(e) (stop substituting when you hit a Function).
  5. clconv(e e') = Let f = clconv(e) In (f.fn){ envt = f.envt; arg = clconv(e')}
  6. clconv(e op e') = clconv(e) op clconv(e') for all other operators in the language (the translation is homomorphic in all of the other operators). This is pretty clear in every case except maybe records, which we give just to be sure...
  7. clconv( { l1 = e1; ...; ln = en } ) = { l1 = clconv(e1);...; ln = clconv(en) }
From the above example, clconv(add) = add'.
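The inductive definition above can be turned into a small executable sketch. This is not the official DSR implementation: the `expr` type covers only a fragment of the language, the helper names `union`, `fv` and `sub` are invented here, and, as a simplifying assumption, the closure parameter is named by doubling the source parameter (`x` becomes `xx`) and applications always bind `f`, so source programs are assumed not to use those names.

```ocaml
type expr =
  | Var of string
  | Int of int
  | Plus of expr * expr
  | Function of string * expr
  | Appl of expr * expr
  | Let of string * expr * expr
  | Record of (string * expr) list
  | Select of expr * string

let union a b = a @ List.filter (fun v -> not (List.mem v a)) b

(* Free variables, without duplicates. *)
let rec fv e = match e with
  | Var x -> [x]
  | Int _ -> []
  | Plus (a, b) | Appl (a, b) -> union (fv a) (fv b)
  | Function (x, b) -> List.filter (fun v -> v <> x) (fv b)
  | Let (x, a, b) -> union (fv a) (List.filter (fun v -> v <> x) (fv b))
  | Record fs -> List.fold_left (fun acc (_, e) -> union acc (fv e)) [] fs
  | Select (a, _) -> fv a

(* SUB: apply substitution s, but stop at every Function. *)
let rec sub s e = match e with
  | Var x -> (match List.assoc_opt x s with Some e' -> e' | None -> e)
  | Int _ | Function _ -> e
  | Plus (a, b) -> Plus (sub s a, sub s b)
  | Appl (a, b) -> Appl (sub s a, sub s b)
  | Let (x, a, b) -> Let (x, sub s a, sub (List.remove_assoc x s) b)
  | Record fs -> Record (List.map (fun (l, e) -> (l, sub s e)) fs)
  | Select (a, l) -> Select (sub s a, l)

let rec clconv e = match e with
  | Var _ | Int _ -> e
  | Plus (a, b) -> Plus (clconv a, clconv b)
  | Function (x, body) ->
      let p = x ^ x in                                  (* x becomes xx *)
      let fvs = List.filter (fun v -> v <> x) (fv body) in
      let s = (x, Select (Var p, "arg"))
              :: List.map (fun v -> (v, Select (Select (Var p, "envt"), v))) fvs in
      Record [ ("fn", Function (p, sub s (clconv body)));
               ("envt", Record (List.map (fun v -> (v, Var v)) fvs)) ]
  | Appl (a, b) ->
      Let ("f", clconv a,
           Appl (Select (Var "f", "fn"),
                 Record [ ("envt", Select (Var "f", "envt")); ("arg", clconv b) ]))
  | Let (x, a, b) -> Let (x, clconv a, clconv b)
  | Record fs -> Record (List.map (fun (l, e) -> (l, clconv e)) fs)
  | Select (a, l) -> Select (clconv a, l)

(* The curried addition example and its expected translation. *)
let add = Function ("x", Function ("y", Plus (Var "x", Var "y")))
let add' =
  Record [ ("fn", Function ("xx",
             Record [ ("fn", Function ("yy",
                        Plus (Select (Select (Var "yy", "envt"), "x"),
                              Select (Var "yy", "arg"))));
                      ("envt", Record [ ("x", Select (Var "xx", "arg")) ]) ]));
           ("envt", Record []) ]

let () = assert (clconv add = add')
```

The inner `envt = { x = x }` is produced by the inner call to `clconv`, and only becomes `{ x = xx.arg }` when the enclosing function's substitution passes over it; this is why SUB must descend into records but stop at functions.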

The desired soundness result is
Theorem: e computes to a value if and only if clconv(e) computes to a value. Additionally, if one returns numerical value n, the other returns the same numerical value n.

This operation results in a language that has no nonlocal variables in functions, more like the C/C++ languages. We are getting closer to machine code.

The A-translation

The idea should be self-evident from the case of arithmetic expressions. Consider for instance
4 + (2 * (3 + 2))
Our interpreter defined a tree-notion of evaluation order on such expressions. The order in which evaluation happens on this program can be made explicitly linear by using Let to factor out the parts that are evaluated first:
Let v1 = 3 + 2 In
Let v2 = 2 * v1 In
Let v3 = 4 + v2 In
 v3

We are in fact going to define a simpler translation, one that also first assigns constants and variables to other variables:

Let v1 = 4 In
Let v2 = 2 In
Let v3 = 3 In
Let v4 = 2 In
Let v5 = v3 + v4 In
Let v6 = v2 * v5 In
Let v7 = v1 + v6 In
 v7

Some points

Next consider some code with higher-order functions.
((Function x -> Function y -> y)(4))(2)
Here the function that is applied to 2 first needs to be computed. We can make this explicit as well:
Let v1 = (Function x -> Function y -> y)(4) In
Let v2 = v1(2) In
 v2

The A-translation given below does even more linearization on this example:
Let v1 =
  (Function x ->
     Let v1' = (Function y -> Let v1'' = y In v1'') In v1') In
Let v2 = 4 In
Let v3 = v1 v2 In
Let v4 = 2 In
Let v5 = v3 v4 In
 v5

Every other evaluation construct can be linearized in this fashion. Except If:
If (3 = x + 2) Then 3 Else 2 * x
can be turned into (approximately)
Let v1 = x + 2 In
Let v2 = (3 = v1) In
If v2 Then 3 Else Let v1 = 2 * x In v1 
but the If still has a branch in it.
However we can implement this simple form of If in machine code as follows (reading BRANCH v2, L2 as a jump to L2 when v2 is false)
v1 := x + 2
v2 := 3 = v1
BRANCH v2, L2
L1: v3 := 3
GOTO L3
L2: v4 := 2 * x
L3:
So, this form is quite close to machine code.

We will give the A-translation for the core DSR syntax.
The intermediate result of the translation is a list of tuples

[(v1,e1); ...; (vn,en)] : (ide * term) list
which is intended to represent
Let v1 = e1 In ... Let vn = en In vn
but is in a form that is easier to manipulate in Caml, since lists of declarations will be appended together at translation time. In your compilers, you may or may not want to use this intermediate form; it is not much harder to write the functions to work directly on the Let representation.

The Official A-translation

We define the A-translation as a Caml function, atrans(e) : term -> term. We will always apply A-translation to the result of closure conversion, but we really don't need to be aware of that now. We now sketch the translation for the core primitives.

We assume auxiliary functions:

let rec atrans e = letize (atrans0 e)

(* atrans0 returns the list of (variable, expression) tuples for e *)
and atrans0 e = match e with
    (Var x)  -> [(newid(), Var x)] |
    (Int n)  -> [(newid(), Int n)] |
    (Bool b) -> [(newid(), Bool b)] |
    (* function bodies are translated separately and stay inside the function *)
    Function(x,e) -> [(newid(), Function(x, atrans e))] |
    Appl(e,e') -> let a = atrans0 e in let a' = atrans0 e' in
                           a @ a' @ [(newid(), Appl(resultId a, resultId a'))] |
    ...
   (* all other D binary operators + - = AND etc of form identical to Appl *)
    ...
    (* only the condition is linearized; each branch is translated on its own *)
    If(e1,e2,e3) -> let a1 = atrans0 e1 in
               a1 @ [(newid(), If(resultId a1, atrans e2, atrans e3))] |
    ...
At the end of the A-translation, the code is all "linear" in the way it runs in the interpreter, not as a tree.
Machine code is also linearly ordered; we are getting much closer to machine code.

Theorem: The A-translation is sound, i.e. e and atrans(e) either both compute to values or both diverge.
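To make the tuple-list representation concrete, here is a runnable sketch of the A-translation restricted to arithmetic. The bodies given for `newid`, `resultId` and `letize` are assumptions about what such auxiliary functions could look like, not the course's own definitions, and the `term` type is a cut-down stand-in for the real DSR syntax.

```ocaml
type term =
  | Var of string
  | Int of int
  | Plus of term * term
  | Times of term * term
  | Let of string * term * term

(* Fresh variable names v1, v2, ... (assumed not to clash with source names). *)
let counter = ref 0
let newid () = incr counter; "v" ^ string_of_int !counter

(* The variable holding the final result of a non-empty tuple list. *)
let resultId tuples = Var (fst (List.hd (List.rev tuples)))

(* [(v1,e1); ...; (vn,en)]  becomes  Let v1 = e1 In ... Let vn = en In vn *)
let rec letize tuples = match tuples with
  | [] -> failwith "letize: empty tuple list"
  | [ (v, e) ] -> Let (v, e, Var v)
  | (v, e) :: rest -> Let (v, e, letize rest)

let rec atrans0 e = match e with
  | Var _ | Int _ -> [ (newid (), e) ]
  | Plus (a, b) -> binop (fun x y -> Plus (x, y)) a b
  | Times (a, b) -> binop (fun x y -> Times (x, y)) a b
  | Let _ -> failwith "atrans0: Let is not handled in this sketch"

(* Translate both operands first, then name the operation itself. *)
and binop mk a b =
  let ta = atrans0 a in
  let tb = atrans0 b in
  ta @ tb @ [ (newid (), mk (resultId ta) (resultId tb)) ]

let atrans e = letize (atrans0 e)

(* 4 + (2 * (3 + 2)) *)
let example = atrans (Plus (Int 4, Times (Int 2, Plus (Int 3, Int 2))))
```

With the counter starting at zero, `example` is the seven-step chain of Lets ending in `v7` that was written out by hand for this expression earlier in the section.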

The A-Translation for the full DSR language

The extra syntax of DSR (records, reference cells) does not provide any major complication for the A-translation.

Function hoisting

So far, the compiler has performed closure conversion and A-translation in turn:
let intermedresult e = atrans(clconv(e))
Function hoisting is the next phase. Informally, the operation is quite simple: take e.g.
4 + (Function x -> x + 1)(4)
and replace it by
Let f1 = Function x -> x + 1 In 4 + f1(4)
--in general, hoist all functions to the front of the code and give them a name via Let.
The transformation is always sound if there are no free variables in the function body, a property guaranteed by closure conversion.

We will define this process in a simple iterative (but inefficient) manner:

let hoist e =
if e = e1[(Function ea -> e')/f] for some e1 with f free in it, and e' contains no functions (i.e. Function ea -> e' is an innermost function)
then Let f = (Function ea -> e') In hoist(e1)
else e.
Resulting programs will be of the form
Let f1 = Function ea -> e1 In
       ...
Let fn = Function ea -> en In
  e

where each of e1,...,en,e contains no function constants.

Theorem: e computes to a value if and only if hoist(e) computes to a value.

This Theorem is easily proved from iterative application of the following Lemma

Lemma: (e1[(Function ea -> e')/f]) ~= (Let f = (Function ea -> e') In e1)
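The iterative definition of hoist is deliberately simple rather than efficient. As a hedged alternative, the sketch below does the same job in a single bottom-up pass; `lift` and `fresh` are hypothetical helper names, the `expr` type is a small fragment of the language, and the generated names f1, f2, ... are assumed not to clash with names in the source program.

```ocaml
type expr =
  | Var of string
  | Int of int
  | Plus of expr * expr
  | Function of string * expr
  | Appl of expr * expr
  | Let of string * expr * expr

let counter = ref 0
let fresh () = incr counter; "f" ^ string_of_int !counter

(* lift e returns e with every function replaced by a fresh name,
   plus the list of (name, function) bindings, innermost first. *)
let rec lift e = match e with
  | Var _ | Int _ -> (e, [])
  | Plus (a, b) ->
      let a', fa = lift a in
      let b', fb = lift b in
      (Plus (a', b'), fa @ fb)
  | Appl (a, b) ->
      let a', fa = lift a in
      let b', fb = lift b in
      (Appl (a', b'), fa @ fb)
  | Let (x, a, b) ->
      let a', fa = lift a in
      let b', fb = lift b in
      (Let (x, a', b'), fa @ fb)
  | Function (x, body) ->
      let body', fs = lift body in
      let f = fresh () in
      (Var f, fs @ [ (f, Function (x, body')) ])

(* Wrap the function-free main expression in one Let per hoisted function. *)
let hoist e =
  let main, fs = lift e in
  List.fold_right (fun (f, fn) rest -> Let (f, fn, rest)) fs main

(* 4 + (Function x -> x + 1)(4) *)
let example = hoist (Plus (Int 4, Appl (Function ("x", Plus (Var "x", Int 1)), Int 4)))
```

As in the text, moving a function out of its context is only safe because closure conversion has already removed free variables from function bodies.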

We lastly transform the program to

Let f1 = Function ea -> e1 In
       ...
Let fn = Function ea -> en In
Let main = Function dummy -> e In
    main(anything)

So, the program is officially nothing but a collection of functions. This brings the program closer to the form of a C program.

Final C translation

To summarize up to now, we have
let frontend e =  hoist(atrans(clconv(e)))
The translation outline:

The atomic tuples we have now

Before giving the translation, we enumerate all possible right-hand sides of Let variable assignments that come out of the A-translation (in the following vi, vj, vk, f are variables). Fact: DSR programs that have passed through the first three phases should have function bodies consisting of tuple lists where each tuple is of one of the following forms only:
  1. x for variable x
  2. n for number n
  3. b for boolean b
  4. vi vj (application)
  5. vj + vk
  6. vj - vk
  7. vj And vk
  8. vj Or vk
  9. Not vj
  10. vj = vk
  11. Ref vj
  12. vj := vk
  13. !vj
  14. { l1 = v1; ... ; ln = vn }
  15. vi.l
  16. If vi Then tuples1 Else tuples2
    where tuples1 and tuples2 are the lists of variable assignments for the Then and Else bodies.
All we need to do now is generate code for each of the above tuples.

Memory Layout

Before writing any compiler, you should always design the memory layout scheme for objects at run-time. Let's consider briefly how memory is laid out in C.
Values can be stored in a couple of different ways:

An important issue is whether to box or unbox various values.

Definition. A register/memory location vi's value is stored boxed if vi holds a pointer to a block of memory containing the actual value.
A variable's value is unboxed if it is directly in the register/memory location vi.

For multi-word entities, storing them unboxed means variables directly hold a pointer to the first word of the sequence of space.

Here is C's memory layout convention:

Here is an example of a stupid C program and the SPARC assembly output which gives some impressionistic idea of these concepts:
int glob;
main()
{
	int x;
        register int reg;
	int* mall;
	int arr[10];

	x = glob + 1;
	reg = x;
	mall = (int *) malloc(1);
	x = *mall;
	arr[2] = 4;
/*	arr = arr2; -- illegal in C -- arrays not boxed so can't do this */
}
Assembly (%o1 is a register, [%o0] means dereference, [%fp-24] means subtract 24 from frame pointer register %fp and dereference)
main:
	sethi	%hi(glob), %o1
	or	%o1, %lo(glob), %o0 /* load global address glob into %o0 */
	ld	[%o0], %o1  /* dereference */
	add	%o1, 1, %o0 /* increment */
	st	%o0, [%fp-20] /* store in [%fp-20], the memory 20 back from fp -- this is x */
                              /* note x directly contains a number, not a ptr */
	ld	[%fp-20], %l0 /* %l0 IS reg (it's in a register directly) */
	mov	1, %o0
	call	malloc, 0 /* call malloc.  resulting address to %o0 */
	 nop
	st	%o0, [%fp-24] /* put newspace location in mall ([%fp-24]) */
	ld	[%fp-24], %o0 /* load mall into %o0 */
	ld	[%o0], %o1 /* this is a malloced structure -- UNBOX! */
	st	%o1, [%fp-20] /* store into x */
	mov	4, %o0
	st	%o0, [%fp-56] /* array is directly a sequence of memory on stack - no indirection needed */
.LL2:
	ret
	restore
Memory layout for our compilers

Aside: Java memory layout

Observations:

All that remains is to come up with a scheme to compile each of the above atomic tuples and we are done. Records are the most difficult so we will consider them before writing out the full translation.

Compiling untyped records

Aside: This brings out some important properties of typing and compilation

Our Records Compilation

The above is less than optimal because

For instance,

(Function x -> x.l)(If y = 0 Then {l = 3} Else {a = 4; l = 3})
the code x.l will invoke a call of approximate form hashlookup(x,"l"). {a = 4; l = 3} will create a new hash table and add mappings of "a" to 4 and "l" to 3.
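The hash-table representation can be illustrated with a short OCaml sketch. The `value` type and `make_record` are names invented for this illustration; `hashlookup` follows the name used in the text, but its body here is an assumption built on the standard `Hashtbl` module, not the run-time routine the compiled C code would call.

```ocaml
(* Run-time values: numbers, or records stored as label-to-value hash tables. *)
type value =
  | Num of int
  | Rec of (string, value) Hashtbl.t

(* { l1 = v1; ...; ln = vn }: allocate a table, then add each mapping. *)
let make_record fields =
  let table = Hashtbl.create 8 in
  List.iter (fun (label, v) -> Hashtbl.replace table label v) fields;
  Rec table

(* x.l: look the label up at run time, whatever other fields x has. *)
let hashlookup x label =
  match x with
  | Rec table -> Hashtbl.find table label
  | Num _ -> failwith "hashlookup: not a record"

(* (Function x -> x.l)(If y = 0 Then {l = 3} Else {a = 4; l = 3}) *)
let example y =
  let x =
    if y = 0 then make_record [ ("l", Num 3) ]
    else make_record [ ("a", Num 4); ("l", Num 3) ]
  in
  hashlookup x "l"
```

Both branches yield `Num 3` even though the two records have different shapes, which is the point of looking fields up by name rather than by a fixed offset.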

The translation

We are now ready to write the final translation to C, via functions

The translation as informally written below takes a few liberties for simplicity.

    toCTuple(vi = x) =           "vi = x;" (* x is a DSR variable *)
    toCTuple(vi = n) =           "vi = n;"
    toCTuple(vi = b) =           "vi = b;"
    toCTuple(vi = vj + vk) =     "vi = vj + vk;"
    toCTuple(vi = vj - vk) =     "vi = vj - vk;"
    toCTuple(vi = vj And vk ) =  "vi = vj && vk;"
    toCTuple(vi = vj Or vk ) =   "vi = vj || vk;"
    toCTuple(vi = Not vj ) =     "vi = !vj;"
    toCTuple(vi = vj = vk) =     "vi = (vj == vk);"
    toCTuple(vi = vj vk) =       "vi = (*vj)(vk);"
    toCTuple(vi = Ref vj) =      "vi = malloc(WORDSIZE); *vi = vj;"
    toCTuple(vi = vj := vk) =    "vi = *vj = vk;"
    toCTuple(vi = !vj) =         "vi = *vj;"
    toCTuple(vi = { l1 = v1; ... ; ln = vn }) =
             /* 1. malloc a new hashtable at vi
                2. add mappings l1 -> v1 , ... , ln -> vn  */

    toCTuple(vi = vj.l) =        "vi = hashlookup(vj,\"l\");"
    toCTuple(vi = If vj Then tuples1 Else tuples2) =

              "if (vj) { toCTuples(tuples1) } else { toCTuples(tuples2) };"

    toCTuples([]) = ""
    toCTuples(tuple::tuples) = toCTuple(tuple) ^ toCTuples(tuples)

    toCFunction(f = Function xx -> tuples) =
                  "Word f(Word xx) {" ^ ... declare temporaries ...
                     toCTuples(tuples) ^
                     "return(resultId tuples); };"

    toCFunctions([]) = ""
    toCFunctions(Functiontuple::Functiontuples) = toCFunction(Functiontuple) ^ toCFunctions(Functiontuples)

    toC then invokes toCFunctions on its list of functions.
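A few of these cases can be written as real OCaml string-building code. This is a sketch under assumptions: the `rhs` type covers only some of the tuple forms, temporaries are not declared, and the If case additionally copies each branch's result into vi, which is one possible way to fill in a detail the informal translation leaves open.

```ocaml
(* Right-hand sides of A-translated tuples; operands are variable names. *)
type rhs =
  | Var of string
  | Int of int
  | Plus of string * string
  | Eq of string * string
  | Appl of string * string
  | Ref of string
  | Deref of string
  | If of string * tuple list * tuple list
and tuple = string * rhs

(* Name of the variable holding the result of a non-empty tuple list. *)
let resultId tuples = fst (List.hd (List.rev tuples))

let rec toCTuple (vi, rhs) =
  match rhs with
  | Var x -> Printf.sprintf "%s = %s;" vi x
  | Int n -> Printf.sprintf "%s = %d;" vi n
  | Plus (vj, vk) -> Printf.sprintf "%s = %s + %s;" vi vj vk
  | Eq (vj, vk) -> Printf.sprintf "%s = (%s == %s);" vi vj vk
  | Appl (vj, vk) -> Printf.sprintf "%s = (*%s)(%s);" vi vj vk
  | Ref vj -> Printf.sprintf "%s = malloc(WORDSIZE); *%s = %s;" vi vi vj
  | Deref vj -> Printf.sprintf "%s = *%s;" vi vj
  | If (vj, t1, t2) ->
      (* Assumption of this sketch: each branch ends by assigning to vi. *)
      let branch ts =
        Printf.sprintf "%s %s = %s;" (toCTuples ts) vi (resultId ts) in
      Printf.sprintf "if (%s) { %s } else { %s }" vj (branch t1) (branch t2)

and toCTuples tuples = String.concat " " (List.map toCTuple tuples)

(* Declarations of the temporaries are omitted here. *)
let toCFunction (f, xx, tuples) =
  Printf.sprintf "Word %s(Word %s) { %s return %s; }"
    f xx (toCTuples tuples) (resultId tuples)

let () =
  print_endline
    (toCFunction ("f1", "xx",
       [ ("v1", Int 3); ("v2", Var "xx"); ("v3", Plus ("v1", "v2")) ]))
```

Because every tuple names its operands, each case is a direct string template; all the structural work was done by the earlier phases.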
Question: why is a fresh memory location malloc'ed for a Ref? This is a subtle issue, but the code vi = &vj would definitely not work for the Ref case.

This translation sketch above leaves out many details. Here is some elaboration.

For typing

Some global issues you will need to deal with

Other subtle points.

Compilation to Assembly code

This C code is very close to assembly code. It would be conceptually easy to translate into assembly, but we skip the topic due to the large number of cases that arise in the process (saving registers, allocating space on the stack for temporaries).

Summary

let frontend e =  hoist(atrans(clconv(e)));;
let translator e = toC(frontend(e));;
We can assert the correctness of our translator. Assert: DSR program e terminates in the DSR operational semantics (or evaluator) just when the C program translator(e) terminates, provided the C program does not run out of memory. Core dump or other run-time errors are equated with nontermination. Furthermore, if DSR's eval(e) returns a number n, the compiled translator(e) will also produce numerical output n.

Optimization

Optimization can be done at all phases of the translation process. The above translation is embarrassingly inefficient. In the phases before C code is produced, optimizations consist of replacing chunks of the program with operationally equivalent chunks.

Some simple optimizations include

Fancier optimizations require that a global flow analysis be performed. Simply put, a flow analysis finds all possible uses of a particular definition, and all possible definitions corresponding to a particular use.
A definition is a record, a Function, or a number or boolean, and a use is a record field selection, function application, or numerical or boolean operator.

Garbage Collection

Our compiled code malloc's but never frees. We will eventually run out of memory. A garbage collector is needed.

Definition: In a run-time image, memory location n is garbage if it never will be read or written to again.

There are many notions of garbage detection. The most common is to be somewhat more conservative and take garbage to be memory locations which are not pointed to by any known ("root") object.
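The conservative, reachability-based notion of garbage can be sketched as a marking pass over a toy heap. The heap model here is an assumption made for illustration: cells are numbered from 0, and each cell is represented only by the list of cells it points to.

```ocaml
(* heap.(i) lists the cells that cell i points to. *)
let reachable heap roots =
  let marked = Array.make (Array.length heap) false in
  let rec mark i =
    if not marked.(i) then begin
      marked.(i) <- true;
      List.iter mark heap.(i)      (* follow every outgoing pointer *)
    end
  in
  List.iter mark roots;
  marked

(* Garbage, conservatively: every cell not reachable from a root. *)
let garbage heap roots =
  let marked = reachable heap roots in
  List.filter (fun i -> not marked.(i))
    (List.init (Array.length heap) (fun i -> i))

(* Cell 0 points to 1; cells 2 and 3 point to each other; cell 4 is isolated. *)
let heap = [| [1]; []; [3]; [2]; [] |]
let dead = garbage heap [0]
```

With cell 0 as the only root, cells 2, 3 and 4 are reported as garbage. Cells 2 and 3 still point to each other, which shows that being pointed to by some cell is not enough; what matters is reachability from a root.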

Last modified: Fri Apr 5 14:25:20 EST 2002