Advanced Interpreters: Records and State

2-Tuples (Pairs)

(We are skipping this topic in lecture this year)

Pairing is the most fundamental form of data aggregation in programming. With pairing you can build just about any data structure you want.

Earlier we showed that pairs could be encoded in D, but those pairs are not very efficient, and because they are really functions they can be applied like functions, which is wrong behavior for a pair. To add "real" pairs, extend the D expr type by adding
... | Pr of expr * expr | Left of expr | Right of expr
Extend the eval function by adding clauses:

let rec eval e = match e with
  ...
| Pr(e1,e2) -> Pr(eval e1, eval e2)
| Left(e) -> (match eval e with
                Pr(v1,v2) -> v1
              | ...)
| Right(e) -> (match eval e with
                 Pr(v1,v2) -> v2
               | ...)
This is an "eager" pair: the components of the pair are evaluated. Caml 2-tuples are eager; (2,3+4) evaluates to (2,7).
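To make the eager behavior concrete, here is a minimal Caml sketch of an evaluator for a tiny fragment of D (integers and addition only) extended with Pr, Left and Right. The fragment and the TypeMismatch exception are assumptions made for illustration; the full D evaluator has many more cases.

```ocaml
(* Hypothetical exception for run-time type errors. *)
exception TypeMismatch

(* A tiny fragment of D (integers and addition) extended with eager pairs. *)
type expr =
  | Int of int
  | Plus of expr * expr
  | Pr of expr * expr
  | Left of expr
  | Right of expr

let rec eval e =
  match e with
  | Int n -> Int n
  | Plus (e1, e2) ->
      let v1 = eval e1 in
      let v2 = eval e2 in
      (match (v1, v2) with
       | (Int n1, Int n2) -> Int (n1 + n2)
       | _ -> raise TypeMismatch)
  | Pr (e1, e2) ->
      (* eager: both components are evaluated before the pair is built *)
      let v1 = eval e1 in
      let v2 = eval e2 in
      Pr (v1, v2)
  | Left e ->
      (match eval e with
       | Pr (v1, _) -> v1
       | _ -> raise TypeMismatch)
  | Right e ->
      (match eval e with
       | Pr (_, v2) -> v2
       | _ -> raise TypeMismatch)
```

Evaluating Pr (Int 2, Plus (Int 3, Int 4)) yields Pr (Int 2, Int 7), mirroring (2,3+4) in Caml.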

Question: if we wanted any (e,e') to be considered a value immediately, how would our evaluator be written?

The space of values is now bigger. A value is now either a D value as before, or a pair Pr(v,v'), where v and v' are themselves values (a recursive definition).
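The recursive definition of values can be written as a predicate. This sketch assumes a small expression type standing in for D (the constructor names are illustrative): integers, booleans and functions are values, and Pr(v,v') is a value exactly when both components are.

```ocaml
(* A small stand-in for the D expression type, extended with pairs. *)
type expr =
  | Var of string
  | Int of int
  | Bool of bool
  | Function of string * expr
  | Appl of expr * expr
  | Plus of expr * expr
  | Pr of expr * expr
  | Left of expr
  | Right of expr

(* The old D values, plus Pr(v, v') whenever v and v' are both values. *)
let rec is_value e =
  match e with
  | Int _ | Bool _ | Function _ -> true
  | Pr (v1, v2) -> is_value v1 && is_value v2
  | _ -> false
```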

Recall that 3-tuples can be encoded via two-tuples Pr(e,Pr(e',e'')), and similarly for 4-tuples etc.

Operational semantics rules for tuples: an exercise.

Records

Records are a variation on tuples where the fields have names.

What advantages do records have over tuples?

But, if records are always statically (at the time we write our code) known to be of fixed size, then we may as well map the labels to numbers and make a tuple:
Record {x = 5; y = 7; z = 6} maps to tuple (5,(7,6))
.x maps to Left, .y maps to Function x -> Left(Right(x)), .z maps to Function x -> Right(Right(x)).
Obviously this makes ugly, hard-to-read code, but it works: for C-style structs, whose fields are fixed when the code is written, this encoding suffices.
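In Caml itself the same encoding can be sketched with built-in pairs, with fst and snd playing the roles of Left and Right. The select_ function names are hypothetical.

```ocaml
(* The record {x = 5; y = 7; z = 6} encoded as the nested pair (5, (7, 6)). *)
let r = (5, (7, 6))

let select_x p = fst p          (* .x is Left *)
let select_y p = fst (snd p)    (* .y is Function x -> Left(Right(x)) *)
let select_z p = snd (snd p)    (* .z is Function x -> Right(Right(x)) *)
```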

But, in the case where records can grow or shrink, this encoding is fundamentally too weak. C++ structs can be subtypes of one another, so some fields that are not declared may in fact be present at run-time.

Recall Caml records are of the form

{size = 7; weight = 245.3; name = "Buzz"}
They can have any number of fields. Field values are selected with the syntax record.size.
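As a reminder of the Caml side, here is a small sketch. Caml requires the record type to be declared before use; the type name bee is made up for this example.

```ocaml
(* The type name bee is made up for this illustration. *)
type bee = { size : int; weight : float; name : string }

let buzz = { size = 7; weight = 245.3; name = "Buzz" }

(* field selection with the record.size syntax *)
let buzz_size = buzz.size

(* functional update: builds a new record, buzz itself is unchanged *)
let bigger = { buzz with size = 8 }
```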

We will use the same syntax in our D language extension, which we will name DR.

Record or subtype polymorphism


Operational Semantics for Records

Exercise. We are going to concentrate on the interpreter here.

The DR Datatype in Caml

Record labels are symbols. We could use our identifiers (Ident "size") as labels, but it is better to think of record labels as a different sort; for instance, labels are never bound or substituted for. So, make a new type
type label = Lab of string
Records may be of arbitrary length, so a Caml list of (label, expr) pairs must be used to define record syntax. The DR expr type is
type expr = ...
| Record of (label * expr) list | Select of   expr * label
The concrete syntax
{size = 7; weight = 245}
is then encoded as abstract syntax within Caml as
Record [(Lab "size", (Int 7)); (Lab "weight", (Int 245))]
and e.size as Select(e, Lab "size").

The definition of values is extended from the values of D:
Records {field1 = v1; ..; fieldn = vn} are values provided v1 through vn are values.

Finally, we extend the D interpreter to a DR interpreter.

exception FieldNotFound

let rec eval e = match e with
  ...
| Record(body) -> Record(evalRecord body)
| Select(exp,lab) -> (match eval exp with
                        Record(fieldList) -> lookupRecord(fieldList,lab)
                      | ...)
  ...

and evalRecord l = match l with
  [] -> []
| (lab,exp)::xs -> let v = eval exp in (lab,v)::evalRecord xs

and lookupRecord (record,Lab s) = match record with
  [] -> raise FieldNotFound
| (Lab s1,v)::xs -> if s1 = s then v else lookupRecord(xs,Lab s)
Is {} or Record [], the empty record, OK?
Yes, according to the above evaluator; it computes to itself and is also a value by the definition of value.
Just don't try to select any fields from it!
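For experimenting, the fragments above can be assembled into a small runnable sketch. It assumes a tiny expression type with only integers, addition, records and selection; the TypeMismatch exception is a hypothetical name, while FieldNotFound comes from the text.

```ocaml
exception FieldNotFound
exception TypeMismatch   (* hypothetical: e.g. selecting from a non-record *)

type label = Lab of string

type expr =
  | Int of int
  | Plus of expr * expr
  | Record of (label * expr) list
  | Select of expr * label

let rec eval e =
  match e with
  | Int n -> Int n
  | Plus (e1, e2) ->
      let v1 = eval e1 in
      let v2 = eval e2 in
      (match (v1, v2) with
       | (Int n1, Int n2) -> Int (n1 + n2)
       | _ -> raise TypeMismatch)
  | Record body -> Record (evalRecord body)
  | Select (exp, lab) ->
      (match eval exp with
       | Record fieldList -> lookupRecord (fieldList, lab)
       | _ -> raise TypeMismatch)

(* evaluate each field expression, left to right *)
and evalRecord l =
  match l with
  | [] -> []
  | (lab, exp) :: xs ->
      let v = eval exp in
      (lab, v) :: evalRecord xs

(* return the value of the first field named s *)
and lookupRecord (record, Lab s) =
  match record with
  | [] -> raise FieldNotFound
  | (Lab s1, v) :: xs -> if s1 = s then v else lookupRecord (xs, Lab s)
```

Selecting weight from {size = 7; weight = 200 + 45} gives Int 245, and Record [] evaluates to itself.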

State-Based Languages

Now we exit the world of pure functional programming and start considering side effects.

We will study DS, a language obtained by adding Ref e (reference creation), e := e' (set), and !e (get) syntactic operations to D.

State is our first example of a side effect in programming: the effect of an assignment is not local, since distant parts of the program may share the same cell and thus see the change.
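A minimal Caml sketch of this nonlocality: a and b are two names for one cell, so an assignment through b is seen through a. The function name is illustrative.

```ocaml
(* a and b alias the same cell *)
let aliasing_demo () =
  let a = ref 1 in
  let b = a in
  b := 2;   (* assign through b ... *)
  !a        (* ... and observe the change through a *)
```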

Other side effects:

Languages without side effects, like D and DR, are pure functional languages. Once we add any effect, they are no longer purely functional.

An example of the nonlocal nature of side effects:

# let x = ref 9 in
  let f z = x := !x + z in
  x := 5; f 5; !x ;;
- : int = 10
Since side effects are not local, they can make programs a lot more difficult to understand.
Programming Moral:
Be sparing in your use of side effects
Reference cell side effects:

Operational Semantics for DS

Since memory is a significant modification to the language, we go back to operational semantics before discussing the interpreter.
Definition. A store S maps cell names (denoted by the letter c) to their values.
Definition. The (concrete) DS syntax extends the D syntax by adding Ref e, e := e', !e, and cell names c.

We write

S { c |-> v }
to indicate the store S modified/extended so cell c maps to value v.
S(c) is the value of cell c in store S.
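One way to model this notation in Caml is a persistent finite map from cell numbers to values. This sketch assumes cells are named by integers and uses the standard Map.Make functor; extend and lookup are hypothetical names for S { c |-> v } and S(c).

```ocaml
module IntMap = Map.Make (Int)

(* a store maps cell names (ints here) to values of type 'v *)
type 'v store = 'v IntMap.t

(* S { c |-> v }: S modified/extended so that c maps to v *)
let extend (s : 'v store) (c : int) (v : 'v) : 'v store = IntMap.add c v s

(* S(c): the value of cell c in S; raises Not_found if c is not in Dom(S) *)
let lookup (s : 'v store) (c : int) : 'v = IntMap.find c s
```

Because the map is persistent, extending S returns a new store and leaves S itself unchanged, which matches how the rules refer to S1, S2, ... as separate stores.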
Evaluation for DS is written
< e,S0 > ==> < v,S >
where at the start of the computation S0 is an initial (empty) store and S is the final store when the computation terminates.

In the process of evaluation, cells c will begin to appear in the program syntax, as references to memory locations.
Cells are values since they do not need to be evaluated, so the space of DS values also includes cells c.

Evaluation Rules for DS

The different evaluation rules are modified with the store in mind.

The store is threaded along the flow of control.

The rules now depend on each other more, even those that don't directly manipulate the store, since every rule must thread the store through its subevaluations. The function application rule shows the kind of change needed in the other rules.

...
Function application rule
<e1, S1 > ==> <Function x -> e, S2 >, <e2,S2 > ==> <v2, S3 >, <e [v2/ x ], S3 > ==> <v, S4 >
------------------------------------------------------
<e1 e2, S1 > ==> <v, S4 >

Note how the store here is threaded through the different evaluations, showing how changes in the store in one place propagate to the store in other places, and in a fixed order that reflects the intended evaluation order.

Rules for the memory operations are as follows.

...
Ref e rule
<e, S1 > ==> <v, S2 >
------------------------------------------------------
<Ref e, S1 > ==> <c, S2 { c |-> v } > for c not in Dom(S2), i.e. a new cell name

!e rule
<e, S1 > ==> <c, S2 >
------------------------------------------------------
<!e, S1 > ==> <v, S2 > where S2(c) = v

e1 := e2 rule
<e1, S1 > ==> <c, S2 >, <e2, S2 > ==> <v, S3 >
------------------------------------------------------
<e1 := e2, S1 > ==> <v, S3 { c |-> v } >

Here are some examples of execution with state to ponder. Note these work identically in Caml. We write out in more detail the evaluation of the second example.

We show <(Function y -> If !y = 0 Then y Else 0)(Ref 7),empty > ==> <0, {c |-> 7} >.
This matches the conclusion of the function application rule, provided we show three things:

  1. <(Function y -> If !y = 0 Then y Else 0),empty > ==> <(Function y -> If !y = 0 Then y Else 0),empty >
  2. <Ref 7,empty > ==> <c, {c |-> 7} >
  3. <(If !y = 0 Then y Else 0)[c/y], {c |-> 7} > ==> <0, {c |-> 7} >
The first follows by the value rule (values evaluate to themselves and do not change the store; hereafter we will not show uses of the value rule);
the second follows by the Ref rule above; let's work further on the third.

<If !c = 0 Then c Else 0, {c |-> 7} > ==> <0, {c |-> 7} > because by the If rule,
<!c = 0, {c |-> 7} > ==> <False, {c |-> 7} >,
which follows in turn by the = rule because
<!c, {c |-> 7} > ==> <7, {c |-> 7} >.
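The derivation can be checked mechanically. The following is a minimal sketch of a substitution-based DS evaluator covering just the constructs used in this example (functions, application, =, If, Ref and !); the constructor names, the integer cell names, and the choice of fresh cell are assumptions. eval (e, s) returning (v, s') plays the role of <e,S> ==> <v,S'>, and the store is threaded as in the application rule.

```ocaml
module IntMap = Map.Make (Int)

type expr =
  | Var of string
  | Int of int
  | Bool of bool
  | Function of string * expr
  | Appl of expr * expr
  | Equal of expr * expr
  | If of expr * expr * expr
  | Ref of expr
  | Get of expr
  | Cell of int

exception TypeMismatch

(* e[v/x]; v is always a closed value here, so no capture can occur *)
let rec subst e v x =
  match e with
  | Var y -> if y = x then v else e
  | Int _ | Bool _ | Cell _ -> e
  | Function (y, body) -> if y = x then e else Function (y, subst body v x)
  | Appl (e1, e2) -> Appl (subst e1 v x, subst e2 v x)
  | Equal (e1, e2) -> Equal (subst e1 v x, subst e2 v x)
  | If (e1, e2, e3) -> If (subst e1 v x, subst e2 v x, subst e3 v x)
  | Ref e1 -> Ref (subst e1 v x)
  | Get e1 -> Get (subst e1 v x)

let rec eval (e, s) =
  match e with
  | Int _ | Bool _ | Function _ | Cell _ -> (e, s)   (* value rule *)
  | Var _ -> failwith "unbound variable"
  | Appl (e1, e2) ->
      (* <e1,S1> ==> <Function x -> e,S2>, <e2,S2> ==> <v2,S3>, <e[v2/x],S3> ==> <v,S4> *)
      let (f, s2) = eval (e1, s) in
      let (v2, s3) = eval (e2, s2) in
      (match f with
       | Function (x, body) -> eval (subst body v2 x, s3)
       | _ -> raise TypeMismatch)
  | Equal (e1, e2) ->
      let (v1, s2) = eval (e1, s) in
      let (v2, s3) = eval (e2, s2) in
      (match (v1, v2) with
       | (Int n1, Int n2) -> (Bool (n1 = n2), s3)
       | _ -> raise TypeMismatch)
  | If (e1, e2, e3) ->
      (match eval (e1, s) with
       | (Bool true, s2) -> eval (e2, s2)
       | (Bool false, s2) -> eval (e3, s2)
       | _ -> raise TypeMismatch)
  | Ref e1 ->
      let (v, s2) = eval (e1, s) in
      (* cells are only ever added, numbered 0, 1, 2, ...,
         so the current size of the store is an unused cell name *)
      let c = IntMap.cardinal s2 in
      (Cell c, IntMap.add c v s2)
  | Get e1 ->
      (match eval (e1, s) with
       | (Cell c, s2) -> (IntMap.find c s2, s2)
       | _ -> raise TypeMismatch)

(* (Function y -> If !y = 0 Then y Else 0)(Ref 7) *)
let example =
  Appl (Function ("y", If (Equal (Get (Var "y"), Int 0), Var "y", Int 0)),
        Ref (Int 7))
```

Running eval (example, IntMap.empty) gives Int 0 with a store binding cell 0 to Int 7; the cell c of the derivation is simply named 0 here. Choosing the store size as the fresh name keeps fresh-name generation purely functional in this sketch.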

A Caml Interpreter for DS

(We skipped this topic in lecture)

The operational semantics clearly defines the meaning of DS programs, but we would also like to briefly consider how the interpreter may be implemented in Caml. Here is the abstract syntax.

type ident = Ident of string

type expr = 
 Var of ident | Function of ident * expr | Appl of expr * expr |
 Letrec of ident * ident * expr |
 Plus of expr * expr | Minus of expr * expr | Equal of expr * expr | 
 And of expr * expr| Or of expr * expr | Not of expr |  
 If of expr * expr * expr | Int of int | Bool of bool |
 Ref of expr | Set of expr * expr | Get of expr | Cell of int

We have two choices in writing an interpreter for DS.

The second will be a lot more efficient, so we take that route here.
Desired theorem: the two approaches produce the same result.

A Functional Interpreter

We use a finite mapping from integer keys to values to model the store.
We define a structure for the interpreter, with a skeleton like
(* declare all the expr, etc. types globally (too hard to do it "right") *)

(* put the store functionality in a separate module *)

module type STORE =
  sig
   (* ... *)
  end

(* the Store structure implements a (functional) store.  A simple
   implementation could be via a list of pairs such as
   [((Cell 2),(Int 4)); ((Cell 3),Plus((Int 5),(Int 4))); ... ] *)
module Store : STORE =
  struct
    type store = (* ... *)

    let empty = (* initial empty store *)

    (* a simple object which returns a fresh Cell name *)
    let fresh =
      let count = ref 0 in
      function () -> ( count := !count + 1; Cell(!count) )
    (* note: this is not purely functional!  it's difficult to make fresh
       purely functional *)

    (* look up value of cell c in store s *)
    let lookup (s,c) = (* ... *)

    (* add or modify cell c to value v in store s, returning the new store *)
    let modify (s,c,v) = (* ... *)
  end

(* evaluator is then a functor taking a store module *)

module DSEvalFunctor =
  functor (Store : STORE) ->
  struct

    (* ... *)

    let rec eval (e,s) = match e with
      (Int n) -> ((Int n),s) (* values don't modify store *)
    | Plus(e,e') ->
        let (Int n,s') = eval(e,s) in
        let (Int n',s'') = eval(e',s') in
        (Int (n+n'),s'')

    (* other cases such as application are a similar store threading *)

    | Ref(e) -> let (v,s') = eval(e,s) in
                let c = Store.fresh() in
                   (c,Store.modify(s',c,v))
    | Get(e) -> let (Cell(n),s') = eval(e,s) in
                   (Store.lookup(s',Cell(n)),s')
    | Set(e,e') -> (* exercise *)

  end

module DSEval = DSEvalFunctor(Store)
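For comparison, here is one way the skeleton might be filled in to get something runnable, under assumptions: a tiny expression type with only Int, Plus and the memory operations, a STORE signature invented for this sketch, and the list-of-pairs store suggested in the comment. The Set case follows the e1 := e2 rule given earlier; treat it as one possible answer to the exercise.

```ocaml
type expr =
  | Int of int
  | Plus of expr * expr
  | Ref of expr
  | Set of expr * expr
  | Get of expr
  | Cell of int

exception TypeMismatch

(* one possible signature; the original leaves it elided *)
module type STORE = sig
  type store
  val empty : store
  val fresh : unit -> expr
  val lookup : store * expr -> expr
  val modify : store * expr * expr -> store
end

(* a store as a list of (cell, value) pairs *)
module Store : STORE = struct
  type store = (expr * expr) list
  let empty = []
  let fresh =
    let count = ref 0 in
    fun () -> (count := !count + 1; Cell !count)
  let lookup (s, c) = List.assoc c s
  let modify (s, c, v) = (c, v) :: List.remove_assoc c s
end

module DSEvalFunctor (S : STORE) = struct
  let rec eval (e, s) =
    match e with
    | Int _ | Cell _ -> (e, s)   (* values don't modify the store *)
    | Plus (e1, e2) ->
        let (v1, s1) = eval (e1, s) in
        let (v2, s2) = eval (e2, s1) in
        (match (v1, v2) with
         | (Int n1, Int n2) -> (Int (n1 + n2), s2)
         | _ -> raise TypeMismatch)
    | Ref e1 ->
        let (v, s1) = eval (e1, s) in
        let c = S.fresh () in
        (c, S.modify (s1, c, v))
    | Get e1 ->
        (match eval (e1, s) with
         | (Cell n, s1) -> (S.lookup (s1, Cell n), s1)
         | _ -> raise TypeMismatch)
    | Set (e1, e2) ->
        (* <e1,S1> ==> <c,S2>, <e2,S2> ==> <v,S3>  gives  <v, S3{c |-> v}> *)
        (match eval (e1, s) with
         | (Cell n, s1) ->
             let (v, s2) = eval (e2, s1) in
             (v, S.modify (s2, Cell n, v))
         | _ -> raise TypeMismatch)
end

module DSEval = DSEvalFunctor (Store)
```

Because the store is functional, a Set produces a new store: afterwards the new store maps the cell to the assigned value while the store passed in still holds the old one. A Map-based store would make lookup faster than this list version.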

Imperative Interpreters

The interpreter above interprets a stateful language, but in a functional way. Sketch for an imperative implementation of a DS interpreter:
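A minimal sketch of the imperative approach, assuming a tiny expression type: the store becomes a single global mutable table (Caml's Hashtbl), so eval returns only a value and no store is threaded. The names store and next_cell are hypothetical.

```ocaml
type expr =
  | Int of int
  | Plus of expr * expr
  | Ref of expr
  | Set of expr * expr
  | Get of expr
  | Cell of int

exception TypeMismatch

(* the single global store: cell number -> value *)
let store : (int, expr) Hashtbl.t = Hashtbl.create 16
let next_cell = ref 0

let rec eval e =
  match e with
  | Int _ | Cell _ -> e
  | Plus (e1, e2) ->
      let v1 = eval e1 in
      let v2 = eval e2 in
      (match (v1, v2) with
       | (Int n1, Int n2) -> Int (n1 + n2)
       | _ -> raise TypeMismatch)
  | Ref e1 ->
      let v = eval e1 in
      let c = !next_cell in
      incr next_cell;
      Hashtbl.replace store c v;   (* allocate by mutating the global store *)
      Cell c
  | Get e1 ->
      (match eval e1 with
       | Cell c -> Hashtbl.find store c
       | _ -> raise TypeMismatch)
  | Set (e1, e2) ->
      (match eval e1 with
       | Cell c ->
           let v = eval e2 in
           Hashtbl.replace store c v;  (* overwrite in place *)
           v
       | _ -> raise TypeMismatch)
```

This avoids building a new store at every step, at the price of eval no longer being a pure function of its input.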

Side Effecting Operators

Now that we have a mutable store, code has a property besides the value it returns: it may have side effects. Syntax for sequencing (";"), while-loops and for-loops thus becomes relevant. (Question: why was it previously irrelevant?)

These syntactic concepts are easily defined as macros, so we do not add them as official syntax:

e1 ; e2 = (Function x -> e2)(e1)
While e Do e' = (Let Rec f x = If e Then f(e') Else e)(0)
...
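The macros can be mimicked by hand in Caml. In this illustrative sketch the while condition is !i < 10 and the body adds i to a running total; the first call passes () rather than 0 because Caml's type checker requires every call of f to use the same argument type. The function names are made up.

```ocaml
(* e1 ; e2  as  (fun x -> e2) e1, with x not free in e2:
   the argument r := 5 runs before the body reads r *)
let seq_demo () =
  let r = ref 0 in
  (fun _ -> !r) (r := 5)

(* While e Do e'  as  (Let Rec f x = If e Then f(e') Else e)(...) *)
let while_demo () =
  let i = ref 0 in
  let total = ref 0 in
  let rec f _ =
    if !i < 10 then f (total := !total + !i; i := !i + 1)  (* body runs as f's argument *)
    else !i < 10                                           (* Else e: the final test value *)
  in
  let final = f () in
  (final, !total)
```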

Cyclical Stores

It is also possible to make a cyclical store structure, where a cell's contents contain a pointer to the cell itself.
Let x = Ref 0 In x := x
This is the simplest store cycle, a cell that points directly to itself.
Question: for the above cell x, what does !!!!!!!!!!! x return?

Question: Can such a form of a cycle be written in Caml?

A more subtle form of cycle is when a function is placed in a cell, and the body of that function refers to the cell.

Let c = Ref 0 In c := (Function x -> If x = 0 Then 0 Else 1 + !c(x-1)); !c(10)
Here cell c contains a function which refers to the cell, and thus to the function itself.
This is another way of implementing recursion, the method used in practice in most compilers: tying the knot.
This is also often how objects are made self-aware. C++, however, explicitly passes the self, which is like the Y combinator.
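The same knot can be tied in Caml. Since Caml is statically typed, the cell cannot start out holding 0 and later hold a function, so this sketch starts it with a placeholder function; the name knot_demo is illustrative.

```ocaml
let knot_demo () =
  (* placeholder contents of the right type, overwritten below *)
  let c = ref (fun (_ : int) -> 0) in
  (* the function stored in c calls whatever c holds: itself *)
  c := (fun x -> if x = 0 then 0 else 1 + !c (x - 1));
  !c 10
```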

The "Normal" Kind of State

Languages you are more used to (C, C++, Java, Scheme, etc.) have a different form for expressing mutation. Consider the Java/C/C++ assignment statement x = x + 1: the x on the left is an l-value (it denotes the location of x), while the x on the right is an r-value (it denotes the contents of that location). Here is a very simple revised grammar which restricts l-values to be directly a variable:
type expr = 
...
Ref of expr | Set of ident * expr | Get of expr
For the variable on the left hand side of an assignment, we need the address and not the contents of the variable.
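A small sketch of the l-value/r-value distinction, assuming a hypothetical layout in which an environment maps each variable to an address and a memory array holds the contents. A variable inside an expression is read for its contents; the variable on the left of an assignment is used only for its address.

```ocaml
type expr = Int of int | Var of string | Plus of expr * expr

(* x = e, with the l-value restricted to a variable *)
type stmt = Assign of string * expr

(* environment: variable name -> address of its cell (hypothetical layout) *)
let env = [ ("x", 0); ("y", 1) ]

(* memory: address -> contents *)
let memory = Array.make 2 0

(* r-value: a variable inside an expression means the contents of its cell *)
let rec eval_rvalue e =
  match e with
  | Int n -> n
  | Var x -> memory.(List.assoc x env)
  | Plus (e1, e2) -> eval_rvalue e1 + eval_rvalue e2

(* l-value: the variable on the left means the address of its cell *)
let exec (Assign (x, e)) =
  let addr = List.assoc x env in
  memory.(addr) <- eval_rvalue e
```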

Uninitialized Variables

Another issue in the standard notion of state in C etc. is that (mutable) variables are not required to be initialized, so another run-time error, use of an uninitialized value, is possible.

Automatic Garbage Collection

Memory that is allocated may also at some point be freed. Sketch of implementation:
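One common strategy is reachability-based (mark-and-sweep) collection: keep only the cells reachable from the cells the running program still mentions. The sketch below assumes a functional store as a map from cell numbers to values, and a value type reduced to integers, cells and pairs to keep it small; it illustrates the idea and does not describe any particular implementation.

```ocaml
module IntMap = Map.Make (Int)

(* simplified values: integers, cells, and pairs *)
type value = Int of int | Cell of int | Pair of value * value

(* cells mentioned directly inside a value *)
let rec cells_of v =
  match v with
  | Int _ -> []
  | Cell c -> [ c ]
  | Pair (v1, v2) -> cells_of v1 @ cells_of v2

(* mark: every cell reachable from the roots by following cell contents *)
let reachable store roots =
  let rec visit marked c =
    if List.mem c marked then marked   (* already seen: stops on cycles *)
    else
      let marked = c :: marked in
      match IntMap.find_opt c store with
      | None -> marked
      | Some v -> List.fold_left visit marked (cells_of v)
  in
  List.fold_left visit [] roots

(* sweep: drop every cell that was not marked *)
let collect store roots =
  let live = reachable store roots in
  IntMap.filter (fun c _ -> List.mem c live) store
```

Marking keeps a visited list, so cyclical stores, such as a cell that points to itself, do not cause nontermination.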

Environment-based Interpreters

(we didn't have time to cover this topic in lecture)

We are briefly going to touch on some efficiency issues in our interpreters as we have been defining them.

Goal: get rid of explicit substitutions. A "low level" interpreter would never copy the function argument into each position in the function body. To compute

(Function x -> x x x)(whopping expr)

the substitution-based interpreter computes

(whopping expr)(whopping expr)(whopping expr),

tripling the size of the data.

So, we define a more efficient explicit environment interpreter. Rather than substituting for x, we keep track of what variables "really" are in a run-time environment, a mapping from variables to values. We write environments as { x |-> e1, y |-> e2 }, meaning variable x maps to value e1 and variable y maps to e2. For the above example, to compute

(Function x -> x x x)(whopping expr)

we compute (x x x) in the environment {x |-> whopping expr}.
There is a possibility for some anomalies with the above scheme. They arise when a function is returned as the result of another function, and that function has local variables in it. Consider the example
F = Function x -> if x = 0 then Function y -> y else Function y -> x * y
When F(3) is computed, the environment binds x to 3 while the body is computed, so the result returned is
Function y -> x * y
BUT it would be a mistake just to return this as the value because the fact that x is in fact 3 would have been lost.

Solution: when a function is returned as a value, the closure of the function is in fact returned.

A closure consists of a function and an environment. The idea is all free variables in the function are bound to values in the environment. For the above case, return the closure

(Function y -> x * y, {x |-> 3})
So, an environment-based evaluator along these lines can be defined (we won't consider the details here).
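A minimal sketch of such an evaluator for a small D fragment, under stated assumptions: the environment is an association list, functions evaluate to closures that capture the current environment, and applying a closure evaluates its body in the captured environment extended with the argument. Constructor names such as IntV and Closure are illustrative.

```ocaml
type expr =
  | Var of string
  | Int of int
  | Bool of bool
  | Times of expr * expr
  | Equal of expr * expr
  | If of expr * expr * expr
  | Function of string * expr
  | Appl of expr * expr

(* a function value is a closure: code paired with its environment *)
type value =
  | IntV of int
  | BoolV of bool
  | Closure of string * expr * env
and env = (string * value) list

exception TypeMismatch

let rec eval env e =
  match e with
  | Var x -> List.assoc x env   (* look the variable up; no substitution *)
  | Int n -> IntV n
  | Bool b -> BoolV b
  | Times (e1, e2) ->
      let v1 = eval env e1 in
      let v2 = eval env e2 in
      (match (v1, v2) with
       | (IntV n1, IntV n2) -> IntV (n1 * n2)
       | _ -> raise TypeMismatch)
  | Equal (e1, e2) ->
      let v1 = eval env e1 in
      let v2 = eval env e2 in
      (match (v1, v2) with
       | (IntV n1, IntV n2) -> BoolV (n1 = n2)
       | _ -> raise TypeMismatch)
  | If (e1, e2, e3) ->
      (match eval env e1 with
       | BoolV true -> eval env e2
       | BoolV false -> eval env e3
       | _ -> raise TypeMismatch)
  | Function (x, body) -> Closure (x, body, env)   (* capture the environment *)
  | Appl (e1, e2) ->
      (match eval env e1 with
       | Closure (x, body, cenv) ->
           let v = eval env e2 in
           eval ((x, v) :: cenv) body   (* closure's environment, extended *)
       | _ -> raise TypeMismatch)

(* F = Function x -> If x = 0 Then Function y -> y Else Function y -> x * y *)
let f =
  Function ("x", If (Equal (Var "x", Int 0),
                     Function ("y", Var "y"),
                     Function ("y", Times (Var "x", Var "y"))))
```

Evaluating F(3) yields Closure ("y", Times (Var "x", Var "y"), [("x", IntV 3)]), which is the closure (Function y -> x * y, {x |-> 3}); applying it to 4 gives IntV 12.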

Theorem. A substitution-based evaluator and an explicit environment evaluator for D are equivalent: all D programs either terminate on both evaluators or compute forever on both evaluators.

The closure view of function values is critical to be able to write a compiler: compilers can't be doing substitutions of code on the fly!

The DSR Language

We can now define the language DSR. It is a call-by-value language that includes the basic features of D, plus the extensions for records (DR) and state (DS).

We will study translations for DSR. Missing language features that we will study later (and not consider when studying translations) include objects and classes, exceptions, and types.

Here are the official DSR expr types.

type label = Lab of string

type ident = Ident of string

type expr = 
 Var of ident | Function of ident * expr | Appl of expr * expr |
 Letrec of ident * ident * expr |
 Plus of expr * expr | Minus of expr * expr | Equal of expr * expr | 
 And of expr * expr| Or of expr * expr | Not of expr |  
 If of expr * expr * expr | Int of int | Bool of bool |
 Ref of expr | Set of expr * expr | Get of expr | Cell of int |
 Record of (label * expr) list | Select of  label * expr |
 Let of ident * expr * expr

Last modified: Thu Apr 18 12:28:10 EDT 2002