# Operational Semantics

Goals:
• Understand in precise detail the meaning of programs
• Give a formal specification of the language for compiler writers, etc.
• Rigorously verify that things work as we expect
Means:
• make a model (mathematical representation) of programming language execution.
• Our two models
• operational semantics ("semantics" = "meaning") for the language.
• Caml interpreters for a language
• Operational semantics = an interpreter defined mathematically.
• An operational semantics is more precise: it isn't based on Caml program meaning;
thus, you can more reliably prove things about it.
• We will mostly write interpreters (we can then actually run them to test) but will start with operational semantics.

Other forms of language semantics we will not cover
• axiomatic semantics -- a collection of axiomatic truths about programs in a language.
• denotational semantics -- modelling programs as static mathematical objects (programs as set-theoretic functions with certain properties)
These are also interesting viewpoints but we don't have the time to cover them.

## Operational Semantics

Goal: define how programs compute/evaluate/execute.
• A particular computation is defined by constructing a tree
• Parent is the computation, children are sub-computations needed to compute parent.
• example: parent (2+3)*(3-4) has children 2+3 and 3-4 because of how * is computed.
• Mathematical relation e ==> v means a program e computed to a final result v (a value).
• In the initial development, we are going to ignore side effects such as state and input/output, so all programs can do is return a value: pure functional programming only.

Here is an example of a computation (imagine the tree):
(node A): (Function x -> x + 2) (3 + 2 + 5) ==> 12  because
(node B, child of A):   3 + 2 + 5 ==> 10, because
(node C, child of B):      3 + 2 ==> 5, and
(node B, again):        5 + 5 ==> 10; and then,
(node E, child of A):   10 + 2 ==> 12.
... In general, to compute a function application,
1. Compute the argument to a value
2. Compute the body of the function with the argument textually substituted, to a value.
So, to compute this application there are two sub-computations performed, which are subtrees.

Definition. An operational semantics for a programming language is a mathematical definition of its computation relation, e ==> v, where e is a program in the language.

• e ==> v is mathematically a 2-place relation between expressions of the language, e, and values of the language, v.
• Example values above are numbers such as 5; functions Function x -> ... are also values since they don't compute to anything.
• e and v are metavariables, meaning they denote an arbitrary expression or value, and should not be confused with the (regular) variables that are part of programs. In the beginning I will underline metavariables to underscore the difference, but eventually we will drop this convention for brevity.

Behind every language you have ever programmed is an operational semantics, but it is usually described informally, in English.
Operational semantics may be given to just about any kind of language behavior, but the rules do get more complicated.
• mutable state
• input/output
• exceptions
• objects and classes
• distributed programming

### Operational Semantics for Logic Expressions

Let's warm up with something very simple: boolean logic without variables.

Definition. The boolean logic expressions e consist of values True and False, and expressions e And e, e Or e, Not e, and e Implies e.

type boolexp = True | False | Not of boolexp | And of boolexp * boolexp
             | Or of boolexp * boolexp | Implies of boolexp * boolexp
(Note: We are going to use Capitalized keywords in all of our little language syntax to avoid potential conflicts with e.g. Caml code.)

Definition. The operational semantics for boolean logic is defined as the least relation ==> satisfying the following rules.
True rule
----------------------------
True ==> True

False rule
----------------------------
False ==> False

Not rule
e ==> v
-----------------------------
Not e ==> the negation of v

And rule
e1 ==> v1, e2 ==> v2
--------------------------------------
e1 And e2 ==> the logical and of v1 and v2

Or, Implies rules: should be clear from above.

• These rules form a proof system in analogy to logical rules
• If you are unfamiliar with proof systems:
• The horizontal bar reads "implies"; rules represent logical truths.
• Rules with nothing above the line are axioms, they always hold.
• A proof of e ==> v amounts to constructing a tree of rule applications in which the final (root) rule application concludes with e ==> v.

An example.

Not(Not(False)) And True ==> False, because by the And rule,

True ==> True, and Not(Not(False)) ==> False, the latter because

Not(False) ==> True, because

False ==> False.

This computation is a tree because there are two subcomputations necessary for each binary operator.

Question: Why in the above definition does it state that ==> is the "least" relation satisfying the rules?
Answer: "least" here means fewest pairs related. If we did not state this requirement, then a relation which related anything to anything else would also be a relation satisfying all the rules (think about it).

#### Provable Properties of Operational Semantics

The great thing about operational semantics is we can actually prove some properties about execution.

Lemma. The boolean language is deterministic: if e ==> v and e ==> v', then v = v'.

Proof. By induction on the height of the proof tree.
QED.

Lemma. The boolean language is normalizing: For all boolean expressions e, there is some value v where e ==> v.

Proof. By induction on the size of e.
QED.

Question: Suppose we left off the True rule by mistake; what nice property would fail?

### Operational Semantics and Interpreters

There is a very close relationship between an operational semantics and an actual interpreter written in Caml.

Given an operational semantics defined via relation ==>, there is a corresponding (Caml) evaluator function eval.

Note, the Caml function eval takes the program e as argument in the form of its syntax tree, for instance Plus(Int(1),Times(Int(2),Int(3))).

Definition. A (Caml) interpreter function eval faithfully implements an operational semantics e ==> v if

e ==> v if and only if eval(e) returns result v.

### Implementing an operational semantics

The above rules induce a Caml interpreter function eval as follows.
let rec eval exp =
  match exp with
    True -> True
  | False -> False
  | Not(exp0) ->
      (match eval exp0 with
         True -> False
       | False -> True)
  | And(exp0,exp1) ->
      (match (eval exp0, eval exp1) with
         (True,True) -> True
       | (_,False) -> False
       | (False,_) -> False)
  | Or(exp0,exp1) ->
      (match (eval exp0, eval exp1) with
         (False,False) -> False
       | (_,True) -> True
       | (True,_) -> True)
  | Implies(exp0,exp1) ->
      (match (eval exp0, eval exp1) with
         (False,_) -> True
       | (True,True) -> True
       | (True,False) -> False)
The only difference between the operational semantics and the evaluator is that the evaluator is a function: we start with the bottom-left expression of a rule, use the evaluator to recursively produce the value(s) above the line in the rule, and finally compute and return the value below the line.
Fact. The boolean language interpreter above faithfully implements its operational semantics: e ==> v if and only if eval(e) returns v as result.

We will go back and forth between these two forms during the course. The operational semantics form is used because it is independent of any particular programming language. The evaluator form is good because you can test your evaluator on real code.
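The faithfulness claim can be spot-checked by running the interpreter. Below is a compacted, self-contained copy of the boolean evaluator (wildcard patterns stand in for the remaining value cases; the behavior on values is the same), together with the Not(Not(False)) And True example worked out earlier:

```ocaml
type boolexp = True | False | Not of boolexp
             | And of boolexp * boolexp | Or of boolexp * boolexp
             | Implies of boolexp * boolexp

let rec eval exp =
  match exp with
  | True -> True
  | False -> False
  | Not e -> (match eval e with True -> False | _ -> True)
  | And (e1, e2) ->
      (match (eval e1, eval e2) with (True, True) -> True | _ -> False)
  | Or (e1, e2) ->
      (match (eval e1, eval e2) with (False, False) -> False | _ -> True)
  | Implies (e1, e2) ->
      (match (eval e1, eval e2) with (True, False) -> False | _ -> True)

(* Not(Not(False)) And True ==> False, matching the derivation above *)
let () =
  assert (eval (And (Not (Not False), True)) = False);
  print_string "ok"
```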

Question: Why not just use interpreters and forget about the operational semantics approach?
Answer: Then the whole exercise is circular, since we don't really know what the Caml compiler is doing. Operational semantics provides a foundation free of any particular language.

Definition. A metacircular interpreter is an interpreter for (possibly a subset of) language X that is written in language X. Metacircular interpreters give you some idea of how a language works, but suffer from the above non-foundational problems. A metacircular interpreter for Lisp is a classic programming language theory exercise.

## The D Language

We now study our first programming language, D.

D is a "Diminutive" pure functional programming language.

• All it has is numbers, booleans, and higher-order anonymous functions.
• It is untyped for now; we will type it later.
• In most ways it is much weaker than Caml: no reals/lists/types/modules/state/exceptions
• In one way it is actually more powerful than Caml: since there are no types, some programs that have no run-time errors but are not typable in Caml can be written.
• Recursive function definition is not built into the language but it is definable.
• Run-time type errors can result in D: e.g. application (5 3).

#### D and the lambda calculus

The lambda calculus is an even simpler language with only functions, which are written lambda x.e instead of Function x -> e, where lambda is the Greek lowercase character. It is the original higher-order functional language, and dates from the 1930's (!). More later on this.

#### Turing Completeness of D

D is still Turing complete: every partial recursive function on numbers can be written in D. In fact, it's even Turing-complete without the numbers or booleans (the pure lambda-calculus). No (deterministic) programming language can compute more than the partial recursive functions.

Definition. The expressions, e, of the D language are inductively defined as the least set including

1. variables x,
2. (anonymous) functions Function x -> e and function application e e,
3. Recursive functions Let Rec f x = e
4. numbers 0, 1, -1, 2, -2, ... and numerical operations +, -, and =,
5. booleans True, False and boolean operations And, Or, Not,
6. and conditional If e Then e Else e.
The value expressions of D are
• the numbers 0, 1, -1, ...,
• booleans True and False,
• functions Function x -> e
• and recursive functions Let Rec f x = e.

Note, the metavariables we are using include e meaning an arbitrary D expression, v meaning an arbitrary expression that is a value, and x meaning an expression which is a variable.

The Caml variant type for D syntax is as follows.
type ident = Ident of string

type expr =
Var of ident | Function of ident * expr | Appl of expr * expr |
Letrec of ident * ident * expr |
Plus of expr * expr | Minus of expr * expr | Equal of expr * expr |
And of expr * expr | Or of expr * expr | Not of expr |
If of expr * expr * expr | Int of int | Bool of bool
• The type expr is the type of D expressions within Caml
• The type ident is needed because function parameters must be variables; they can't be any other expression. A separate type is one way to enforce this.

Abstract and concrete syntax
• The concrete syntax of a programming language is the (text/string) representation of a program, usually defined via a grammar.
• The abstract syntax is the explicit tree representation which is used internally in its implementation.
• The concrete D syntax is how we are writing it on the blackboard: "(Function x -> x + 2) (3 + 2 + 5)"
• The abstract syntax is how it is a member of the above Caml expr type: Appl(Function(Ident "x", Plus(Var(Ident "x"), Int 2)), Plus(Int 3, Plus(Int 2, Int 5))) is the abstract syntax corresponding to the above concrete syntax.
We will glibly switch back and forth between the concrete and abstract syntax: if we are talking relative to Caml the abstract syntax will be used, and outside of Caml we will use the concrete syntax.

Higher-order functions in D The main feature of D is higher-order functions, which also introduces variables. Recall that programs are computed by rewriting them.
(Function x -> x + 2) (3 + 2 + 5) ==> 12  because
3 + 2 + 5 ==> 10, because
3 + 2 ==> 5, and
5 + 5 ==> 10; and then,
10 + 2 ==> 12.
Note how in this example, the argument is substituted for the variable in the body -- this gives us a rewriting interpreter.

### Variable Substitution

• Functions compute by substituting the actual argument for the formal parameter;
for example, (Function x -> x + 1) 2 will compute by substituting 2 for x in the function's body x + 1, i.e. by computing 2 + 1.
• We need to be careful about how variable substitution is defined: (Function x -> Function x -> x)(3) should not evaluate to (Function x -> 3), since the inner x is bound by the inner parameter.

To do this right, we need to define the concepts of an occurrence of a variable, a bound occurrence, a free occurrence, a binding, a closed expression, and substitution.

Definition

1. A variable use x occurs in e if x appears somewhere in e. Note we refer to variable uses only, not definitions.
2. Any occurrences of variable x in Function x -> e are bound; any free occurrences of x in e here are bound occurrences in Function x -> e. Similarly, occurrences of f and x are bound in Let Rec f x = e.
3. A variable x occurs free in e if it has an occurrence in e which is not a bound occurrence.
4. An expression e is closed if it contains no free variable occurrences. All programs we execute are closed (no link-time errors).
5. e[e'/x] is notation for the expression resulting from the operation of replacing all free occurrences of x in e with e'. For now, we assume that e' is a closed expression.
The notions of bound and free should be familiar to you from block-structured languages.

Examples.

• x occurs free in 3 + x, and occurs both bound and free in expression x (Function x -> x).
• Neither of the above expressions is closed, but (Function x -> x) (Function x -> x) is closed.
• Substitution x (Function x -> x)[(Function x -> x+1)/x] is (Function x -> x+1) (Function x -> x).
• (y + y)[3/y] is 3 + 3
• (Function y -> y + y)[3/y] is (Function y -> y + y) since there are no free occurrences of y to substitute for.
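The substitution operation e[e'/x] can be written directly as a Caml function. The sketch below covers only a fragment of the D syntax (enough for the examples above) and assumes, per the definition, that e' is closed, so no capture check is needed:

```ocaml
type ident = Ident of string
type expr = Var of ident | Function of ident * expr
          | Appl of expr * expr | Plus of expr * expr | Int of int

(* subst e e' x computes e[e'/x]: replace the free occurrences of x in
   e with e', stopping at any Function that rebinds x. *)
let rec subst e e' x =
  match e with
  | Var y -> if y = x then e' else e
  | Function (y, body) ->
      if y = x then e            (* inner occurrences of x are bound *)
      else Function (y, subst body e' x)
  | Appl (e1, e2) -> Appl (subst e1 e' x, subst e2 e' x)
  | Plus (e1, e2) -> Plus (subst e1 e' x, subst e2 e' x)
  | Int _ -> e

let () =
  let y = Ident "y" in
  (* (y + y)[3/y] is 3 + 3 *)
  assert (subst (Plus (Var y, Var y)) (Int 3) y = Plus (Int 3, Int 3));
  (* (Function y -> y + y)[3/y] is unchanged: the y's are bound *)
  let f = Function (y, Plus (Var y, Var y)) in
  assert (subst f (Int 3) y = f);
  print_string "ok"
```

Note how the Function case stops when the parameter rebinds x: that is exactly what makes the second example a no-op.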

### Operational Semantics for D

We are now ready to define the operational semantics for D.
Value rule
--------------------------------------
v ==> v

Boolean rules: see above boolean language

+ rule
e1 ==> v1, e2 ==> v2
--------------------------------------
e1 + e2 ==> the integer sum of v1 and v2, provided v1 and v2 are integer constants

- rule
Similar to +.

= rule
e1 ==> v1, e2 ==> v2
--------------------------------------
e1 = e2 ==> True if v1 and v2 are identical numbers, and ==> False if they are different numbers or are not numbers.

If True rule
e1 ==> True, e2 ==> v2
------------------------------------------------------
If e1 Then e2 Else e3 ==> v2

If False rule
e1 ==> False, e3 ==> v3
------------------------------------------------------
If e1 Then e2 Else e3 ==> v3

(Anonymous) Function application rule
e1 ==> Function x -> e, e2 ==> v2, e[v2/x] ==> v
------------------------------------------------------
e1 e2 ==> v

Let Rec Function application rule
(exercise)

• Function application as defined above is call-by-value: the argument is evaluated before the function is applied.
• Later we discuss another possibility, call-by-name.
• Call-by-reference or value-result parameter passing styles are irrelevant for languages, such as D, that contain no mutable store operations.
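To see how the rules read as code, here is a minimal, self-contained sketch of an evaluator for a fragment of D (Int, Plus, Function, and Appl only); this is a sketch of the approach, not the full evaluator. The subst function implements the e[v/x] operation from the previous section:

```ocaml
type ident = Ident of string
type expr = Var of ident | Function of ident * expr | Appl of expr * expr
          | Plus of expr * expr | Int of int

(* subst e v x computes e[v/x], assuming v is closed *)
let rec subst e v x =
  match e with
  | Var y -> if y = x then v else e
  | Function (y, body) -> if y = x then e else Function (y, subst body v x)
  | Appl (e1, e2) -> Appl (subst e1 v x, subst e2 v x)
  | Plus (e1, e2) -> Plus (subst e1 v x, subst e2 v x)
  | Int _ -> e

let rec eval e =
  match e with
  | Int _ | Function _ -> e                        (* Value rule *)
  | Plus (e1, e2) ->                               (* + rule *)
      (match (eval e1, eval e2) with
       | (Int n1, Int n2) -> Int (n1 + n2)
       | _ -> failwith "run-time type error")
  | Appl (e1, e2) ->                               (* application rule *)
      (match eval e1 with
       | Function (x, body) -> eval (subst body (eval e2) x)
       | _ -> failwith "run-time type error")
  | Var _ -> failwith "unbound variable"

(* (Function x -> x + 2) (3 + 2 + 5) ==> 12, as in the earlier example *)
let () =
  let prog = Appl (Function (Ident "x", Plus (Var (Ident "x"), Int 2)),
                   Plus (Int 3, Plus (Int 2, Int 5))) in
  assert (eval prog = Int 12);
  print_string "ok"
```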

#### Example Executions

If in doubt, draw out the derivation trees that show the execution precisely.
If 3 = 4 Then 5 Else 4 + 2 ==> 6 because
3 = 4 ==> False and
4 + 2 ==> 6, because
4 ==> 4 and
2 ==> 2 and 4 plus 2 is 6.

(Function x -> If 3 = x Then 5 Else x + 2) 4 ==> 6, because of
above derivation

(Function x -> x x)(Function y -> y) ==> Function y -> y, because
(Function y -> y)(Function y -> y) ==> Function y -> y

(Function f -> Function x -> f(f(x)))(Function x -> x - 1)(4) ==> 2 because
letting F abbreviate (Function x -> x - 1),
(Function x -> F(F(x)))(4) ==> 2, because
F(F(4)) ==> 2, because
F(4) ==> 3, because
4 - 1 ==> 3.  And then,
F(3) ==> 2, because
3 - 1 ==> 2.

(Function x -> Function y -> x+y)
((Function x -> If 3 = x Then 5 Else x + 2) 4)
((Function f -> Function x -> f(f(x))) (Function x -> x - 1)(4)) ==> 8 by the above two executions

(Let Rec f x = If x = 0 Then 1 Else x + f (x - 1))(1) ==> 2 because
letting F abbreviate (Let Rec f x = If x = 0 Then 1 Else x + f (x - 1)),
If 1 = 0 Then 1 Else 1 + F (1 - 1) ==> 2, because
1 = 0 ==> False, and
1 + F (1 - 1) ==> 2, because
F (1 - 1) ==> 1, because
1 - 1 ==> 0, and
If 0 = 0 Then 1 Else 0 + F (0 - 1) ==> 1, because
0 = 0 ==> True, and
1 ==> 1

#### Mathematical Properties of D programs

Lemma. D is deterministic.
Proof. By inspection of the rules, at most one rule can apply at any time. (Need the Let Rec rule to prove this precisely)

Lemma. D is not normalizing: there is some e such that there is no v with e ==> v.
Proof. (Function x -> x x)(Function x -> x x) is not normalizing. Neither is 4 3.

### A D Interpreter

As part of homework 3, you are to write a Caml eval function which takes D programs and produces D values as result, following the above operational semantics. File D-examples.ml contains the Caml type for D syntax as well as some sample executions. That file will be reviewed in class, and contains concrete D code for most of the examples in the remainder of these notes.

## Pure functional programming in D

D doesn't have many features, but it is possible to do much more than you may at first think.

• First, there are the classic logical combinators, simple functions for recombining data.
• To get a feel for the abstract syntax of D that you will use in your evaluators, we will use it here, and program within Caml.
• We will informally define entities as macros by let with Caml.
Here are the combinators as given in the D-examples.ml file.
(* First some abbreviations to save finger wear *)

let i s = Ident s            (* abbreviation for identifiers *)
let v s = Var(Ident s)       (* abbreviation for variables *)

(* super shorthand for common ident/var names *)

let ix = i"x" (* ident x *)
let vx = v"x" (* variable x *)
let iy = i"y" (* ident y *)
let vy = v"y" (* variable y *)
let iz = i"z" (* ident z *)
let vz = v"z" (* variable z *)
let il = i"l" (* ident l *)
let vl = v"l" (* variable l *)
let ir = i"r" (* ident r *)
let vr = v"r" (* variable r *)

(* The classic pure functional combinators *)

let id = Function(ix,vx)                        (* I x = x *)
let k = Function(ix,Function(iy,vx))            (* K x y = x *)
let s = Function(ix,Function(iy,Function(iz,    (* S x y z = (x z) (y z) *)
Appl(Appl(vx,vz),(Appl(vy,vz))))))
let d = Function(ix, Appl(vx,vx))               (* D x = x x *)
Macros:
• Everything let-defined in the above Caml code is a macro, an abbreviation for some D expr.
• We will also use informal D macro abbreviations on the blackboard: "let F be Function x -> ... " kind of thing.
• Just make sure not to confuse macro names (such as id above) with D variables: id above is a macro name, not a D variable.

It turns out that D can express a great deal more than it first appears. In particular, the expressive power of Caml's let, recursive definitions via let rec, and structures such as lists or other datatypes are all encodable in D, by applying a trick or two.

### Encoding n-tuples and lists

• Tuples and lists are encodable from just functions, and so are not needed as primitives.
• Of course for an efficient implementation you would want them as primitives;
• This encoding is thus an exercise to better understand the nature of functions and tuples.

We will define a 2-tuple (pairing) constructor;
from a pair you can get a 3-tuple by building it from pairs as (1, (2, 3)), ... etc. for n-tuples.

(* Pairs may be encoded as functions (not entirely adequate however) *)

let pr (l,r) =  (* make a pair with left element l and right element r *)
Appl(Appl(Function(il,Function(ir,Function(ix,Appl(Appl(vx,vl),vr)))),l),r)

let prexample = pr(Int 4,pr(Int 5,Bool true))

(* projections left and right *)

let left e =  Appl(e,Function(ix,Function(iy,vx)))
let right e = Appl(e,Function(ix,Function(iy,vy)))
Test: try pr(4,5). That is
(Function l -> Function r -> Function x -> x l r) 4 5
which computes by computing
(Function x -> x 4 5)
which is a value and we are done. Now let's try right(pr(4,5)). We have pr(4,5)'s value from above; unfolding the definition of right, we continue with
(Function x -> x 4 5) (Function x -> Function y -> y)
which computes by computing
(Function x -> Function y -> y) 4 5
which computes by computing
5
Voila!
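The same encoding can be tried metacircularly, using Caml's own functions as the pairs (this only typechecks for homogeneous pairs; the general encoding is untypable in Caml):

```ocaml
(* a pair is a function that "waits" for a selector *)
let pr l r = fun x -> x l r
let left p  = p (fun x y -> x)   (* select the left component *)
let right p = p (fun x y -> y)   (* select the right component *)

let () =
  assert (left (pr 4 5) = 4);
  assert (right (pr 4 5) = 5);
  print_string "ok"
```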

Problems with this encoding of pairs:
• left (Function x -> 0) ==> 0
but a function shouldn't be a pair! There should have been a run-time error here.
• Suppose we wrote the program right(pr(3,pr(4,5))); one might at first think it will evaluate to the value pr(4,5), but it will really return (Function x -> x 4 5). We can only guess that this is intended to be a pair.
• This encoding is thus flawed, and we will in the future want to introduce "real" n-tuples into an extension of D.

Lists can also be implemented via pairs
• list [1;2;3] is represented by pr(1, pr(2, pr(3, emptylist)))
• emptylist is some agreed-on empty list, 0 for us.
• (This encoding of lists in terms of pairs won't work for Caml because it won't typecheck)
Here are the implementations.

(* Lists may be encoded as a pair consisting of the head and tail *)

let tail = right
let emptylist = (Int 0)  (* something for empty list *)
let cons = pr
let length = Letrec(i"Length", ix,
  If(Equal(vx, emptylist), Int 0,
     Plus(Appl(v"Length", tail vx), Int 1)))

let aList = cons(Plus(Int 1,Int 1),cons(Plus(Int 1,Int 1),cons(Int 3,emptylist)))

• This encoding approach is very powerful, and also gives us a way to understand complex languages from more simple ones.
• Even numbers, booleans, and if-then-else are encodable, it turns out (we will skip this topic).
• Thus, all that is needed is Function and application to make a Turing-complete programming language. This language is known as the pure lambda calculus, and functions are usually written as lambda x.e instead of Function x -> e.
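As a glimpse of the skipped encoding, here are the Church numerals sketched in Caml: a number n is represented as the function that applies its first argument n times. The names zero, succ, plus, and to_int here are our own choices for this sketch:

```ocaml
let zero = fun f x -> x              (* apply f zero times *)
let succ n = fun f x -> f (n f x)    (* one more application of f *)
let plus m n = fun f x -> m f (n f x)
let to_int n = n (fun k -> k + 1) 0  (* decode back to a Caml int *)

let () =
  assert (to_int zero = 0);
  assert (to_int (succ (succ zero)) = 2);
  assert (to_int (plus (succ zero) (succ (succ zero))) = 3);
  print_string "ok"
```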

#### Other Examples of Expressiveness Within D

Functions of multiple arguments: use currying, just as is common in Caml.

Let is definable:

Let x = e in e' is defined as (Function x -> e') e
An example: Let x = 3 + 2 in x + x is (Function x -> x + x)(3 + 2), which evaluates to 10.
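Checked metacircularly in Caml, the encoding gives the expected answer:

```ocaml
(* Let x = 3 + 2 in x + x, encoded as an application of an
   anonymous function to the bound expression *)
let () =
  assert ((fun x -> x + x) (3 + 2) = 10);
  print_string "ok"
```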

Sequencing. Notice there is no sequencing (;) operation. Why not? Answer: if e;e' is what you want to sequence, you might as well just write e', as e will never get used. This changes if Print or mutable state is added (operators with side effects). Sequencing is definable, nonetheless:

e ; e' is defined as (Function newvar -> e') e, where newvar is chosen so as not to be
free in  e'
This will first execute e, throw away the value, and then execute e', returning its result as the final result of e;e'.

Freezing and thawing We can stop and re-start computation at will by freezing and thawing.

• Freeze e is defined as Function newvar -> e
Thaw e is defined as e(0)
(newvar should be a fresh variable, so it's not free in e; the 0 above could be any value)
• Freeze(e) freezes e, keeping it from being computed. Thaw(e) starts up a frozen computation.
• Example.
Let x = Freeze(2+3) in Thaw(x) + Thaw(x)
-- this has the same value as without Freeze/Thaw, but 2+3 is evaluated twice.
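Freeze/Thaw can be modeled in Caml with unit thunks (Caml's usual idiom for suspending evaluation). The sketch below uses a reference cell to observe that the frozen expression really is evaluated twice; note the thunk must be written literally, since call-by-value would otherwise evaluate 2 + 3 before it could be frozen:

```ocaml
let thaw f = f ()                         (* Thaw: apply to a dummy value *)
let count = ref 0
let frozen = fun () -> (incr count; 2 + 3) (* Freeze: a literal thunk *)

let () =
  let x = frozen in
  assert (thaw x + thaw x = 10);
  assert (!count = 2);                    (* 2+3 was evaluated twice *)
  print_string "ok"
```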

### Encoding Recursion in D

D has a built-in Letrec to write recursive functions, but it's actually not needed to write recursive functions! Some special trickery is needed. Wax those surfboards, a wave is coming.

Q: How can programs compute forever in D without Let Rec?

A: Easy: (Function x -> x x)(Function x -> x x). Corollary: D is not normalizing.

• This computation is in some sense odd: x x is a function being applied to itself!
• There is a logical paradox at the heart of this non-normalizing computation, Russell's paradox.

In Frege's set theory (circa 1900), sets were written as predicates P(x), which we can view as functions.
• Function x -> x < 2 is the set of all numbers less than 2.
• Set membership is via application: e member-of S iff S(e) is true.
• Since (Function x -> x < 2)(1) is true, 1 is in this "set".

Now consider P = "the set of all sets that do not contain themselves as members":

P = Function x -> Not(x x)
(Note, it may make sense to have a set with itself as member: the set {{{{...}}}}, infinitely receding, has itself as a member; this only happens in so-called non-well-founded set theory).

Now, is P P? Namely is P a member of itself? This is written:

(Function x -> Not(x x)) (Function x -> Not(x x))
--if this were viewed as a D program, it would loop forever: it suffices to compute
Not((Function x -> Not(x x))(Function x -> Not(x x)))
Now, notice we have P is a member of itself if and only if it isn't, a contradiction!

• This is Russell's Paradox in computational form (this isn't how Russell viewed it, but it has the same structure at the core).
• The computational realization of the paradox is that the predicate doesn't compute to true or false, so it's not a sensible logical statement.
• Russell's discovery of this paradox in Frege's set theory shook the foundations of mathematics.
• To solve this problem, Russell developed his ramified theory of types, which is the ancestor of types in programming languages.
• The program (function x -> not(x x)) (function x -> not(x x)) is not typeable in Caml for the same reason the predicate is not typeable in Russell's ramified theory of types.
# function x -> not (x x);;
^
This expression has type 'a -> 'b but is here used with type 'a

#### Encoding recursion by passing self

• In the logical view, passing a function to itself as argument is a bad thing;
• From a programming view, it can be a good, powerful thing: it allows recursive functions to be defined, without Let Rec.
• -- Surf the paradoxical wave, but be careful, it's easy to get crunched!
Here is the idea:
• Invariant: for every recursive function, always keep around two identical copies of the function: one to use, and one to copy again.
• When you do a recursive call, pass along a copy of the function
• In the recursive call, make two more copies. Use one of the copies to compute with, and save the other to pass on for a future recursive call.
• etc: each recursive call uses one copy but duplicates another so there is always a spare.

Here is how a summation function, which sums the numbers 0..n for argument n, can be defined around these ideas. First define
summate0 = Function this -> Function arg ->
    If arg = 0 Then 0 Else arg + this(this)(arg-1)
Then we can write a function call as
summate0(summate0)(7) (* summates numbers 0 .. 7 *)
• summate0 always expects its first argument this to be itself
• it can then use one copy for the recursive call (the first this) and pass another copy on for future duplication.
• summate0(summate0) primes the pump by giving it an initial extra copy of itself.

In general, we can write the whole thing in D as
let summate =
Let summ = (Function this -> Function arg ->
If arg = 0 Then 0 Else arg + this(this)(arg-1))
In
Function arg -> summ(summ)(arg)
and invoke as
summate 7 (* summates numbers 0..7 *)
so we don't have to let the world see the self-passing business.
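Self-passing can also be tried in Caml, with one wrinkle: x x alone fails the occurs check (the very typing issue discussed above), so this sketch wraps the self-argument in a one-constructor type to tie the knot; the D version needs no wrapper since it is untyped:

```ocaml
(* a function that takes "itself" (wrapped) plus an int argument *)
type self = Wrap of (self -> int -> int)

let summate0 = Wrap (fun (Wrap this as w) arg ->
  if arg = 0 then 0 else arg + this w (arg - 1))

let summate n =
  let (Wrap f) = summate0 in
  f summate0 n                   (* prime the pump with an extra copy *)

let () =
  assert (summate 7 = 28);       (* 0 + 1 + ... + 7 *)
  print_string "ok"
```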

The Y-Combinator. The Y-combinator is a further abstraction on this: the body of summ can be abstracted out and passed in as a higher-order function argument.
almosty = Function body ->
Let fun = (Function this -> Function arg ->
body(this)(arg))
In
Function arg -> (fun fun)(arg)
-- the body of summ above mentions this and arg, so the abstract body body gets those passed to it. almosty can be used by defining summate as
summate = almosty (Function this -> Function arg ->
    If arg = 0 Then 0 Else arg + this(this)(arg-1))
The Y-combinator actually goes one more step and passes this(this) as argument, not just this, simplifying what we pass to Y:
y = Function body ->
Let fun = (Function this -> Function arg ->
body(this this)(arg))
In
Function arg -> (fun fun)(arg)
This combinator can then be used to define summate as
summate = y (Function thisthis -> Function arg ->
    If arg = 0 Then 0 Else arg + thisthis(arg-1))
-- the parameter thisthis is used directly for the recursive call.

The above is almost the Y combinator given in the D-examples.ml file; the major difference is that version has fun inlined (repeated twice) instead of being defined via Let.
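For comparison, here is a sketch of the Y-combinator idea in Caml, again using a wrapper type since this this is not directly typable; body receives the already-self-applied function as an ordinary argument, just as in the D version:

```ocaml
type ('a, 'b) self = Wrap of (('a, 'b) self -> 'a -> 'b)

(* y body builds the recursive function: body's first argument is
   "itself", already self-applied *)
let y body =
  let f = Wrap (fun (Wrap this as w) arg -> body (this w) arg) in
  let (Wrap g) = f in
  fun arg -> g f arg

let summate = y (fun thisthis arg ->
  if arg = 0 then 0 else arg + thisthis (arg - 1))

let () =
  assert (summate 7 = 28);
  print_string "ok"
```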

### Call-by-name Parameter Passing

• In call by name parameter passing, the argument to the function is not evaluated at function call time; it is only evaluated if it is used.
• This style of parameter passing is largely of historical interest now: Algol 60 used it, but almost no modern languages do.
• It is much harder to write efficient compilers if call-by-name parameter passing is used.
Definition Define a call-by-name evaluation relation ==> for D by replacing the Function application rule with the following rule.
call-by-name Function application rule
e1 ==> Function x -> e, e[e2/x] ==> v
------------------------------------------------------
e1 e2 ==> v

And, similarly a new rule for Let Rec is needed.

Freezing and Thawing, defined above, is a way to get call-by-name behavior in a call-by-value language.
Consider then the computation of
(Function x -> Thaw(x) + Thaw(x)) (Freeze(3-2))
-- 3-2 is not evaluated until we are inside the body of the function where it is thawed, and it is then evaluated two separate times. This is precisely the behavior of call-by-name parameter passing, so Freeze and Thaw can encode it by this means. The fact that 3-2 is executed twice shows the main weakness of call by name: repeated evaluation of the function argument.

Lazy or call-by-need evaluation is a version of call-by-name that caches evaluated function arguments the first time they are evaluated so it doesn't have to re-evaluate them in subsequent uses. Haskell is a pure functional language with lazy evaluation.
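Caml's built-in Lazy module provides exactly this call-by-need behavior; the counter below confirms the suspended argument is evaluated only once, then cached:

```ocaml
(* call-by-need via Caml's Lazy module *)
let count = ref 0
let arg = lazy (incr count; 3 - 2)       (* the suspended 3 - 2 *)
let f x = Lazy.force x + Lazy.force x    (* uses the argument twice *)

let () =
  assert (f arg = 2);
  assert (!count = 1);   (* evaluated once, unlike the call-by-name trace *)
  print_string "ok"
```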

### The (pure) lambda-calculus

A classic simple language with only functions: take D and remove the numbers, booleans, and conditional.

It is called the lambda-calculus because functions are written lambda x.e (using the Greek lambda character) instead of Function x -> e.

Fact: Numbers, booleans, and conditional can be encoded in the pure lambda-calculus.

Execution in the pure lambda calculus

• Even programs with free variables can execute (reduce in lambda-calculus terminology)
• Execution can happen anywhere, e.g. inside a function body that hasn't been called yet
• (Function x -> e) e' ==> e[e'/x] is the (only) execution rule, called beta reduction
This form of computation is interesting conceptually but is more distant from how actual computer languages execute.

## Operational Equivalence

In this course we are taking a mathematical view of programs. What is a primary relation defined over a space of mathematical objects? Equivalence!

• We can imagine an equivalence =~ (written on the board as = with a ~ above it) defined for all D programs.
• The initial idea is that two programs are equivalent if they always lead to the same results when used.
Examples.
• Eta conversion is one example of an interesting equivalence:
(eta-conversion) (Function x -> e)  =~  (Function z -> (Function x -> e) z) for z not free in e.

This equivalence is similar to the proxy pattern in object-oriented programming.
• A closely related law for our Freeze/Thaw syntax is
Thaw(Freeze(e)) =~ e

One of these programs may be replaced by the other without ill effects (besides perhaps changing execution time), so they are equivalent.

Equivalence is important!
• Equivalence is an important concept because it allows programs to be transformed by replacing bits with equivalent bits; the programmer need not even be told, since the observed behavior will be the same.
• Thus, they are transformations that can be performed by a compiler.
• Operational equivalence provides a rigorous foundation for compiler optimization.

### Defining Operational Equivalence

We define equivalence in a manner dating all the way back to Leibniz:
Two programs are equivalent if and only if one can be replaced with the other at any place, and no external change in behavior will be noticed.
• We wish to study equivalence for possibly open programs, because there are good equivalences such as x + 1 - 1 =~ x.
• We define "at any place" by the notion of a program context, a D program with some "holes" * punched in it.
• Then, to test if e1 =~ e2,
1. First place e1 in the * position and run the program;
2. Then do the same thing for e2.
3. If the observable results are the same, they are equivalent; otherwise they are not.

#### A more precise definition of equivalence

We define the notion of contexts C as follows.
• Take a D program with some "holes" * punched in it: replace some subterm(s) of any expression with *.
• Then, hole filling, C[e], means place e in the holes * in C
• Hole filling is like substitution, BUT there is no bound/free variable issue: direct replacement, no conditions
Examples of contexts and hole filling

Contexts:

(Function z -> (Function x -> *) z)
(Function q -> e)(*)
Hole filling:
(Function z -> (Function x -> *) z)[x+2]
Means "put x+2 in the hole(s) in the (Function z .. )term"; the result is
(Function z -> (Function x -> x+2) z)
• Note that e may have free variables which become bound when e is placed in the hole; this is known as capture.
• The variable x in x+2 is captured in the above example.
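These notions can be sketched in OCaml: a hypothetical expr type of ours with a Hole constructor plays the role of contexts, and fill does direct replacement with no renaming, so capture occurs exactly as in the example above:

```ocaml
(* Contexts as expressions with holes; fill is hole filling C[e]. *)
type expr =
  | Var of string
  | Int of int
  | Plus of expr * expr
  | Func of string * expr          (* Function x -> e *)
  | Appl of expr * expr
  | Hole                           (* the "*" in a context *)

let rec fill c e =                 (* C[e]: put e in every hole of C *)
  match c with
  | Hole -> e
  | Var _ | Int _ -> c
  | Plus (a, b) -> Plus (fill a e, fill b e)
  | Func (x, b) -> Func (x, fill b e)   (* no renaming: capture is allowed *)
  | Appl (a, b) -> Appl (fill a e, fill b e)

(* (Function z -> (Function x -> *) z)[x+2] *)
let c = Func ("z", Appl (Func ("x", Hole), Var "z"))
let filled = fill c (Plus (Var "x", Int 2))
(* filled is Function z -> (Function x -> x + 2) z: the x is captured. *)
let () =
  assert (filled =
    Func ("z", Appl (Func ("x", Plus (Var "x", Int 2)), Var "z")))
```

The Func case deliberately does not rename its bound variable, which is what distinguishes hole filling from substitution.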

Operational equivalence is defined simply as follows:

Definition e =~ e' if and only if for all contexts C, C[e] ==> v for some v if and only if C[e'] ==> v' for some v'.

• So, two expressions are equivalent if in any possible context C, the one terminates if the other does.
• This equivalence is known as operational equivalence because its definition is based on the interpreter for the language.
• Note that nothing is said about the values v and v', they could in theory be different.
v and v' can be anything, because a bigger context could always test them some more: the context
(Function x -> C'[x]) (C[e])
would first compute to C'[v], and then v is tested by the context C'. So v and v' above are going to have to be quite similar, and in fact it is easy to show that they must be identical if they are not functions.

• The only problem with this definition of equivalence is its "incestuous" nature--there is no absolute standard of equivalence removed from the language.
• Domain Theory is a mathematical discipline which defines an algebra of programs in terms of existing mathematical objects (complete and continuous partial orders).
• We are not going to study domain theory here: it doesn't generalize well to programming languages with side effects (exceptions, state, input/output), and we don't have the time to cover it.

### Example Equivalences

Some general equivalence laws for D programs follow.

• reflexivity: e =~ e; symmetry: e =~ e' if e' =~ e; transitivity: e =~ e'' if e =~ e' and e' =~ e''
• C[e] =~ C[e'] if e =~ e' (congruence)
• (Function x -> e)(v) =~ e{v/x} (this is beta-equivalence; e{v/x} is capture-avoiding substitution, defined below)
• (Function x -> e) =~ (Function z -> (Function x -> e) z) (eta)
• (Function x -> e) =~ (Function y -> e{y/x}) for y not free in e (alpha)
• n + n' =~ the sum of numbers n and n', and similar laws for -, And, Or, Not, =;
• If True Then e else e' =~ e, and similar for If False...
• If e ==> v then e =~ v (evaluation)
Equivalence transformations on programs can be used to justify results of computations instead of directly computing with the evaluator; it is often easier.
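As a sanity check (not a proof), we can confirm particular instances of these laws agree in OCaml:

```ocaml
(* Spot-checking instances of the equivalence laws: each assertion
   confirms both sides compute the same value in one context. *)
let () =
  (* beta: (Function x -> e)(v) =~ e{v/x} *)
  assert ((fun x -> x + 1 - 1) 5 = 5 + 1 - 1);
  (* eta: a function and its proxy agree on any argument *)
  let e = fun x -> x * 2 in
  let proxy = fun z -> e z in    (* z not free in e *)
  assert (e 7 = proxy 7);
  (* If True Then 1 Else 2 =~ 1 *)
  assert ((if true then 1 else 2) = 1)
```

Such checks only test one context each; the definition of =~ quantifies over all contexts, which is why proving equivalences is hard.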

An important equation relating Y:

Y f x =~ f (Freeze(Y f)) x
An important component of compiler optimization is applying transformations such as the above that preserve equivalence.

Technical Issue: capture-avoiding substitution
• The annoying variable capture problem has raised its ugly head in the beta rule above.
• We use renaming substitution e{e'/x} to deal with capture.
• Renaming substitution e{e'/x} is a generalized form of substitution that differs from our previously defined substitution operation e[e'/x] in that e' does not have to be closed.
• In such a case, we want to replace x with e' while preventing capture; this is implemented by renaming any capturing variable bindings in e.
For example,
(Function z -> (Function x -> y + x) z){x + 2/y} = (Function z -> (Function x1 -> x + 2 + x1) z)
• x + 2 would be captured if we just stuffed x + 2 in for y, a bad thing.
• It's bad because, by congruence, we should be able to replace one =~ thing with the other anywhere; but in
Function x -> (Function z -> (Function x -> y + x) z) (x + 2)
if we ignored capture, the beta rule would give us
Function x -> (Function z -> (Function x -> (x + 2) + x) z)
which is clearly not equivalent to the original program.
• To avoid this problem, the capture-avoiding substitution operation renames x to a fresh variable not occurring in e or e', x1 in this case.
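A minimal OCaml sketch of this renaming substitution on a tiny syntax of our own (free, fresh, and subst are hypothetical names) reproduces the example above:

```ocaml
type expr =
  | Var of string
  | Int of int
  | Plus of expr * expr
  | Func of string * expr          (* Function x -> e *)
  | Appl of expr * expr

let rec free e =                   (* free variables of e *)
  match e with
  | Var x -> [x]
  | Int _ -> []
  | Plus (a, b) | Appl (a, b) -> free a @ free b
  | Func (x, b) -> List.filter (fun y -> y <> x) (free b)

(* try x1, x2, ... until the candidate avoids the given names *)
let fresh x avoid =
  let rec go i =
    let cand = x ^ string_of_int i in
    if List.mem cand avoid then go (i + 1) else cand
  in
  go 1

let rec subst e x e' =             (* e{e'/x}, capture-avoiding *)
  match e with
  | Var y -> if y = x then e' else e
  | Int _ -> e
  | Plus (a, b) -> Plus (subst a x e', subst b x e')
  | Appl (a, b) -> Appl (subst a x e', subst b x e')
  | Func (y, b) ->
      if y = x then e              (* x is shadowed: stop *)
      else if List.mem y (free e') then
        (* y would capture a free variable of e': rename the binding *)
        let y' = fresh y (free b @ free e') in
        Func (y', subst (subst b y (Var y')) x e')
      else Func (y, subst b x e')

(* (Function z -> (Function x -> y + x) z){x + 2/y} *)
let before = Func ("z", Appl (Func ("x", Plus (Var "y", Var "x")), Var "z"))
let after = subst before "y" (Plus (Var "x", Int 2))
(* after is Function z -> (Function x1 -> (x + 2) + x1) z *)
let () =
  assert (after =
    Func ("z",
      Appl (Func ("x1", Plus (Plus (Var "x", Int 2), Var "x1")), Var "z")))
```

The Func case renames the binding to the fresh x1 before substituting, which is exactly the renaming step the example illustrates.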

#### Proving Equivalences Hold

• It is surprisingly difficult to actually prove any of these equivalences hold!
• Even 1 + 1 =~ 2 is hard to prove.
• If we had more time we would take a closer look at this topic.