Types

We now study types.
Advantages of typed languages:
The main advantage of an untyped language is greater expressiveness:
Different dimensions of types you know:
New dimensions of program invariants (types) that are currently active research topics:
Statically typed vs Dynamically typed vs untyped

We will use the "T" prefix to indicate a typed version of an untyped language previously studied. Thus, we have TD, TDS, TDR, TDSR, TDOB, TDX, TDSRX, ...
There are too many languages to look at, so we consider only TD for a warm-up, and then do full-blown TDSRX.


Design Issues

Before we begin the investigation of typing issues for particular languages, there are some general design issues to address.

How much explicit type information? How much type information must the program text be decorated with, and how much is inferred by the compiler? A spectrum of possibilities exists.

The Caml type system is quite flexible in that the whole spectrum is possible:
# function x -> x;;
- : 'a -> 'a = <fun>

# ((function (x : int) -> (x : int)) : int -> int);;
- : int -> int = <fun>
We will concentrate on the C/Pascal view.


Type checking and Type inference algorithms

For a typed language, the compiler should typecheck the program before generating code.


A Typed D Language: TD

By analogy with our development of operational semantics/interpreters, we will define

Type Systems

Type Assertions

The type assertion
Gamma |- e : tau
reads "in type environment Gamma, e is of type tau". A type environment gives the types of free variables in e, and is a list
Gamma = x_1 : tau_1, ..., x_n : tau_n
If x is listed more than once in Gamma, the rightmost (innermost) binding gives its type. We write Gamma(x) = tau to indicate that tau is the innermost type for x in Gamma.

The TD Types

The TD types tau in concrete syntax are
tau ::= Int | Bool | tau -> tau
In abstract syntax within Caml, they are
type dtype = Int | Bool | Arrow of dtype * dtype
The expressions of TD are almost identical to those of D, but functions differ slightly in that they come with type decoration:
Function x : tau -> e : tau'
which in the abstract syntax is
Function of ide * dtype * expr * dtype

The TD Type Rules

Here are the rules for generating valid typing assertions.

------------------------ (Hyp)
Gamma  |- x : tau                 for Gamma(x) = tau 


----------------------------- (Int)
Gamma  |- n : Int                 for n an integer


----------------------------- (Bool)
Gamma  |- b : Bool                for b either True or False


Gamma  |- e : Int , Gamma  |- e' : Int                 
---------------------------------------- (+)
Gamma  |- e + e' : Int                


Gamma  |- e : Int , Gamma  |- e' : Int                 
---------------------------------------- (-)
Gamma  |- e - e' : Int                


Gamma  |- e : Int,     Gamma  |- e' : Int            
-------------------------------------------- (=)
Gamma  |- e = e' : Bool                

(Note: equality only typechecks for integers, not booleans.)

(The And, Or, and Not rules should be obvious.)

Gamma  |- e : Bool,  Gamma  |- e' : tau,  Gamma  |- e'' : tau
------------------------------------------------------------------ (If)
Gamma  |- If e Then e' Else e'' : tau


Gamma, x : tau |- e : tau'
------------------------------------------------    (Function)
Gamma  |- (Function x : tau  -> e : tau') : tau -> tau'


Gamma  |- e : tau -> tau', Gamma  |- e' : tau 
-------------------------------------------------- (Appl)
Gamma  |- e e' : tau'


Just as in operational semantics, a derivation of Gamma |- e : tau is a tree of rule applications where the leaves are axioms (Hyp, Int or Bool rules) and the root is Gamma |- e : tau.


Example Derivations

|- (Function x : Int -> (Function y : Bool -> (If y Then x Else x+1) :
Int) : Bool -> Int) : Int -> Bool -> Int
  Because by the function rule, it suffices to prove
  x:Int |- (Function y: Bool -> (If y Then x Else x+1): Int) : Bool->Int
    Because by the function rule again, it suffices to prove
    x:Int, y: Bool |- If y Then x Else x+1 : Int  
      Because by the If rule, it suffices to prove
      x:Int, y: Bool |- y : Bool
      x:Int, y: Bool |- x : Int
      x:Int, y: Bool |- x+1 : Int
      The first two follow by the hypothesis rule, and the third by the + rule together with the hypothesis and Int rules.
Given the above and letting
f = (Function x : Int -> (Function y: Bool -> (If y Then x Else x+1): Int) : Bool->Int)
we then have
|- f 5 True : Int
  Because by the application rule,
  |- f : Int -> Bool -> Int
    (which we derived above)
  |- 5 : Int by the Int rule
  And thus
  |- f 5 :  Bool -> Int by the application rule.
  Given this and
  |- True : Bool by the Bool rule
  we can get
  |- f 5 True :  Int by the application rule.
 

Recursion and TD

An interpreter for TD

It's easy to write an interpreter for TD: it is nearly a D interpreter, since the type information is ignored at run-time.

Type Soundness

The following theorem asserts that our type system indeed prevents run-time errors from occurring.

Theorem (Type Soundness) If |- e : tau then in the process of evaluating e, a stuck state is never reached.


From Type Systems to Type Checkers

We can now play the same game as we did when defining interpreters: given the language-independent type rules, define a type checking algorithm in a particular language, namely Caml.

Sketch of a TD Typechecker

Here is a first pass at a TD type checker, typecheck : envt -> expr -> dtype.

gamma : envt can be implemented as a (ide * dtype) list, with the most recent item at the front of the list.

let rec typecheck gamma e = match e with
   Var x -> lookup gamma x (* look up first mapping of x in list gamma *)
 | Function(Ide x,t,e,t') ->
                  (* check the body with x : t added at the front of gamma *)
                  if typecheck ((Ide x,t) :: gamma) e = t'
                  then Arrow(t,t') else raise TypeError
 | Appl(e1,e2) -> (match typecheck gamma e1 with
                     Arrow(t1,t2) ->
                       if typecheck gamma e2 = t1
                       then t2 else raise TypeError
                   | _ -> raise TypeError) (* e1 is not a function *)
 | Plus(e1,e2) -> if typecheck gamma e1 = Int
                     && typecheck gamma e2 = Int
                  then Int else raise TypeError
    ...
The typechecker should faithfully implement the TD type system:

Lemma (faithfulness of typechecker):

This Lemma implies the typecheck function is a sound implementation of the type system for TD.
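
To make the sketch above concrete, here is one possible self-contained version covering only a fragment of TD (variables, constants, +, If, Function, application). The constructor names TInt and TBool, the use of strings for identifiers, and the use of List.assoc for lookup are our own assumptions for illustration, not part of the TD definition. It is checked against the example derivations given earlier.

```ocaml
(* A sketch of a TD checker for a fragment of the language.
   All names here are illustrative assumptions. *)
type dtype = TInt | TBool | Arrow of dtype * dtype

type expr =
  | Var of string
  | Int of int
  | Bool of bool
  | Plus of expr * expr
  | If of expr * expr * expr
  | Function of string * dtype * expr * dtype
  | Appl of expr * expr

exception TypeError

(* gamma is an association list; the innermost binding is at the front. *)
let rec typecheck gamma e =
  match e with
  | Var x -> (try List.assoc x gamma with Not_found -> raise TypeError)
  | Int _ -> TInt
  | Bool _ -> TBool
  | Plus (e1, e2) ->
      if typecheck gamma e1 = TInt && typecheck gamma e2 = TInt then TInt
      else raise TypeError
  | If (e1, e2, e3) ->
      let t2 = typecheck gamma e2 in
      if typecheck gamma e1 = TBool && typecheck gamma e3 = t2 then t2
      else raise TypeError
  | Function (x, t, body, t') ->
      if typecheck ((x, t) :: gamma) body = t' then Arrow (t, t')
      else raise TypeError
  | Appl (e1, e2) -> (
      match typecheck gamma e1 with
      | Arrow (t1, t2) when typecheck gamma e2 = t1 -> t2
      | _ -> raise TypeError)

(* f = Function x : Int ->
         (Function y : Bool -> (If y Then x Else x+1) : Int) : Bool -> Int *)
let f =
  Function ("x", TInt,
    Function ("y", TBool, If (Var "y", Var "x", Plus (Var "x", Int 1)), TInt),
    Arrow (TBool, TInt))

let ty_f = typecheck [] f
let ty_app = typecheck [] (Appl (Appl (f, Int 5), Bool true))
```

Here ty_f corresponds to the derivation of |- f : Int -> Bool -> Int and ty_app to |- f 5 True : Int. Each match case mirrors one type rule, which is why the checker can be read off the rules almost directly.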

A Type System for TDSRX

Nothing very deep is going on in these type systems, so let us next consider the TDSRX language.

We include just about every piece of syntax we have used up to now, except the DOB classes and objects. Here is the abstract syntax defined in terms of a Caml type.

type expr = 
 Var of ident | Function of ident * dtype * expr * dtype | Appl of expr * expr |
 Letrec of ident * ident * dtype * expr * dtype |
 Plus of expr * expr | Minus of expr * expr | Equal of expr * expr | 
 And of expr * expr| Or of expr * expr | Not of expr |  
 If of expr * expr * expr | Int of int | Bool of bool |
 Ref of expr | Set of expr * expr | Get of expr | Cell of int |
 Record of (label * expr) list | Select of  label * expr |
 LetExn of ident * dtype * expr |
 Raise of expr * expr | TryWith of expr * expr * ident * dtype * expr | Exn of int

and

dtype = Int | Bool | Arrow of dtype * dtype |
           Rec of (label * dtype) list | Rf of dtype | Ex of dtype
Now we will proceed to define the rules (whew!) for TDSRX.

The Type Rules for TDSRX

(( insert all of the TD rules here ))

Gamma, f : tau -> tau', x : tau |- e : tau' 
---------------------------------------------------------- (LetRec)
Gamma  |- (Let Rec f x : tau  = e : tau')  : tau -> tau'



Gamma  |- e1 : tau1, ..., Gamma  |- en : taun
--------------------------------------------------------------------- (Record)
Gamma  |- { l1 = e1, ..., ln = en  } : { l1 : tau1, ..., ln : taun}


Gamma  |- e : { l1 : tau1, ..., ln : taun}
-------------------------------------------------------- (Projection)
Gamma  |- e.li : taui,  for i a number between 1 and n


Gamma  |- e : tau
-------------------------------------------- (Ref)
Gamma  |- Ref e : tau Ref


Gamma  |- e : tau Ref,   Gamma  |- e' : tau
-------------------------------------------- (Set)
Gamma  |- e := e' : tau 


Gamma  |- e : tau Ref
-------------------------------------------- (Get)
Gamma  |- !e : tau


Gamma, xn : tau Exn |- e : tau'
-----------------------------------------------    (LetExn)
Gamma  |- (LetExn xn : tau Exn  In e )  : tau'


Gamma  |- xn : tau Exn,  Gamma  |- e' : tau
-------------------------------------------- (Raise)
Gamma  |- Raise (xn(e')) : tau' (any type OK)


Gamma  |- xn : tau' Exn,  Gamma  |- e : tau,    Gamma, x : tau' |- e' : tau
--------------------------------------------------------------------------  (Try)
Gamma  |- Try e with xn(x : tau') -> e' :  tau
(Question: why is there no typechecking rule for Cells? Another question: why is there actually no need for Let Rec f x syntax in TDSRX to write recursive functions?)

Exercise: attempt to type some of the untyped programs we have studied up to now, e.g. the Y combinator, Let and sequencing abbreviations, the factorial example, and the encoding of lists.


Advanced Type Systems: Subtyping

Consider, for example, the function
Function x -> x.l + 1
This function takes as argument a record with field l of type Int. So, we could write it in a typed form as
Function x : {l : Int} -> (x.l + 1) : Int
In the untyped DR language the record passed into the function could also include other fields besides l, and the call
(Function x -> x.l + 1) {l = 4; l' = 6}
would generate no run-time errors. However, this would not type-check by our TDSRX rules: the function argument type is different from the type of the values passed.

Solution: Let us re-consider record types such as { m : Int; n : Int } to mean a record with at least m and n fields of type Int, but possibly other fields as well, of unknown type. Think about the previous record operations and their types: under this interpretation of record typing, the (Record) and (Projection) rules both still make sense.


Now, the old rules are sound but we need a new rule to reflect this new understanding:
Gamma  |- e : { l1 : tau1; ...; ln : taun}
-------------------------------------------------------- (Sub-Rec)
Gamma  |- e : { l1 : tau1; ...; lm : taum} for m less than n
This rule is not as good as we could do. Consider the following example.
F = Function f -> f ({ x = 5; y = 6; z = 3}) +  f({x = 6; y = 4})
Here the function f informally should take a record with at least x and y fields, but also other fields could be present. Let us try to type the function F.
F : ({x : Int; y : Int} -> Int) -> Int
Consider the application F G for
G = Function r -> r.x + r.x
The natural typing of G, however, is
G : {x : Int} -> Int
This does not exactly match the type of F's argument, {x : Int; y : Int} -> Int, and so the type-check fails.

In fact we could have given G the type {x : Int; y : Int} -> Int by the TDSRX rules, but it's too late to know that was the type we should have used.

The (Sub-Rec) rule is of no help here. What we need is a rule that says a function of record type argument may have fields added to its record argument type, as those fields will be ignored:

Gamma  |- e : { l1 : tau1; ...; ln : taun} -> tau
------------------------------------------------------------------ (Sub-Function)
Gamma  |- e : { l1 : tau1; ...; ln : taun; ...; lm : taum} -> tau 
Using this rule, F G will indeed type-check.

We need still other rules, though. Consider records inside of records:

{pt = {x = 4; y = 5}; clr = 0} : { pt : {x : Int}; clr : Int }
should be a valid typing since the y field will be ignored. However, there is no rule to allow this typing, either!


The STD type system: TD with Subtyping

The solution is to have a separate set of subtyping rules just to determine when one type can be used in the place of another. tau <: tau' is read "tau is a subtype of tau'", and means that an object of type tau may also be considered an object of type tau'. The rule added to the TD type system is
Gamma  |- e : tau, |- tau <: tau'
------------------------------------------------------------------ (Sub)
Gamma  |- e : tau'
(in place of the above subsumption rules) The STD subtyping rules used to determine if tau <: tau' are
----------------------------------------(Sub Refl)
|- tau <: tau


|- tau <: tau',     |- tau' <: tau''
----------------------------------------(Sub Trans)
|- tau <: tau''


|- tau0' <: tau0,     |- tau1 <: tau1'
----------------------------------------(Sub Function)
|- tau0 -> tau1 <: tau0' -> tau1'

|- tau1 <: tau1'   ...  |- taun <: taun'
-------------------------------------------------------------------(Sub Record)
|- { l1 : tau1; ...; ln : taun; ... ; lm : taum} <: { l1 : tau1'; ...; ln : taun'}
For all of the examples discussed up to now, it should be clear that this set of more general rules will work.

Implementing an STD type-checker

(rest of this topic skipped in lecture)

We briefly sketch how the typecheck function for STD may be written. The TD type-checker requires certain types to be identical, e.g. the function domain type must be identical to the type of the function argument in an application e e'.

Gamma  |- e : tau -> tau', Gamma  |- e' : tau 
-------------------------------------------------- (Appl)
Gamma  |- e e' : tau'
In STD, at this point we will instead see if subtyping is possible: typecheck(e') returns tau'' for some type tau'', and then tau'' <: tau is checked via a function areSubtypes(tau'',tau). This produces a valid proof because
                           Gamma  |- e' : tau'',  |- tau'' <: tau
                           --------------------------------------- (Sub)
Gamma  |- e : tau -> tau', Gamma  |- e' : tau 
-------------------------------------------------- (Appl)
Gamma  |- e e' : tau'
is a valid typing derivation. Other rules where the TD rules require a type match similarly are generalized to allow the (Sub) rule to be used.

Writing the areSubtypes function: exercise.
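
One possible sketch of such a function follows, for readers who want to compare against their own attempt. It is only an illustration under stated assumptions: the names are_subtypes, TInt, TBool and Rec are ours, records are association lists, and field order is ignored (the Sub Record rule as written lists the shared fields first). Reflexivity and transitivity are not separate cases; they fall out of the structural recursion.

```ocaml
(* A sketch of a decision procedure for tau <: tau'.
   Type and function names are illustrative assumptions. *)
type dtype =
  | TInt
  | TBool
  | Arrow of dtype * dtype
  | Rec of (string * dtype) list

let rec are_subtypes t t' =
  match t, t' with
  | TInt, TInt | TBool, TBool -> true
  | Arrow (a, b), Arrow (a', b') ->
      (* contravariant in the domain, covariant in the range *)
      are_subtypes a' a && are_subtypes b b'
  | Rec fields, Rec fields' ->
      (* every field required on the right must exist on the left,
         at a subtype; extra fields on the left are ignored *)
      List.for_all
        (fun (l, ty') ->
          match List.assoc_opt l fields with
          | Some ty -> are_subtypes ty ty'
          | None -> false)
        fields'
  | _ -> false

(* {pt : {x : Int; y : Int}; clr : Int} <: {pt : {x : Int}; clr : Int} *)
let nested_ok =
  are_subtypes
    (Rec [ ("pt", Rec [ ("x", TInt); ("y", TInt) ]); ("clr", TInt) ])
    (Rec [ ("pt", Rec [ ("x", TInt) ]); ("clr", TInt) ])

(* G's type {x : Int} -> Int <: {x : Int; y : Int} -> Int, as F G needs *)
let g_ok =
  are_subtypes
    (Arrow (Rec [ ("x", TInt) ], TInt))
    (Arrow (Rec [ ("x", TInt); ("y", TInt) ], TInt))
```

Both examples from the subtyping discussion come out as subtypes, while the reverse directions do not.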

One may ask whether typecheck(e) sometimes raises a typeError exception when the program is in fact typable, since it only uses the subsumption rule in certain spots. This is not the case: it suffices to use the subsumption rule in these spots only.


Subtyping and Java/C++ types

Subtyping in other languages

Hindley-Milner Type Inference and Polymorphism

Let's look under the hood and see what Caml is doing to infer types.

Type inference and polymorphism

Type inference goes hand-in-hand with parametric polymorphism. Consider Function x -> x

Principal Types

Definition: A principal type tau for expression e (where |- e : tau) has the following property:
for any other type tau' such that |- e : tau', for any context C for which
 |- C[e : tau'] : tau''
(for any tau''), then
|- C[e : tau] : tau'''
as well, for some tau'''.


An Equational Type System: ED

We are going to present type inference in a nonstandard way. We will now define ED, a simple equationally typed version of the D language.
Equational types will be used to aid inference: the overall approach is to
  1. Infer equational types for the whole program;
  2. If the equations are inconsistent, pronounce that there is a type error;
  3. If the equations are not inconsistent, simplify them to give an inferred type as given by Caml.
  4. (Implicit fact: if there are no inconsistencies in the equations, they can always be simplified to give an equation-free type)

Definition: An equational type is a type of the form
tau \ { tau1 = tau1' , ..., taun = taun' }
Each tau = tau' is an equation on types, meaning tau and tau' have the same meaning as types. We will let E mean some arbitrary set of type equations. For instance,
Int -> 'a \ { 'a = Int -> 'a1, 'a1 = Bool }
is an equational type. If you think about it, this is really the same as the type
Int -> Int -> Bool
when each type variable is replaced by the type it is equated with. This is a step we are going to want to perform, so-called equational simplification. It is also possible to write "senseless" types like
Int -> 'a \ { 'a = Int -> 'a1, 'a = Bool }
which cannot be types since they imply functions and booleans are the same type! Such equation sets are deemed inconsistent, and will be equated with failure of type inference. There are also possibilities for circular (self-referential) types that don't quite look inconsistent:
Int -> 'a \ { 'a = Int -> 'a }

The ED Type System

The ED system is the following set of rules.
(Note that Gamma is as in the TD rules, asserting e.g. x : tau for variable x; the type tau is a simple (non-equational) type.)
Gamma(x) = tau 
------------------------------- (Hyp)
Gamma  |- x : tau \ emptyset


----------------------------- (Int)
Gamma  |- n : Int \ emptyset            for n an integer


----------------------------- (Bool)
Gamma  |- b : Bool \ emptyset           for b either True or False


Gamma  |- e : tau \ E , Gamma  |- e' : tau' \ E'                
---------------------------------------------------------------------- (+)
Gamma  |- e + e' : Int \ E union E' union {tau = Int, tau' = Int }


Gamma  |- e : tau \ E  , Gamma  |- e' : tau'  \ E'                 
-------------------------------------------------------------------- (-)
Gamma  |- e - e' : Int  \ E union E' union {tau = Int, tau' = Int }               


Gamma  |- e : tau \ E ,     Gamma  |- e' : tau'  \ E'           
-------------------------------------------- (=)
Gamma  |- e = e' : Bool   \ E  union E' union {tau = Int, tau' = Int }             

(And, Or, Not rules should be obvious)

Gamma  |- e : tau \ E ,  Gamma  |- e' : tau' \ E' ,  Gamma  |- e'' : tau'' \ E'' 
------------------------------------------------------------------------------------------------- (If)
Gamma  |- If e Then e' Else e'' : 'd \ E union E' union E'' union {tau = Bool, tau' = tau'' = 'd}


Gamma, x : 'a |- e : tau \ E 
---------------------------------------------------------    (Function)
Gamma  |- Function x  -> e : 'a -> tau \ E 


Gamma  |- e : tau \ E , Gamma  |- e' : tau'  \ E'
--------------------------------------------------------------- (Appl)
Gamma  |- e e' : 'a \ E union E' union { tau = tau' -> 'a }
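
The ED rules can be read as a recursive procedure that returns a type together with a list of equations. The following is a minimal sketch for a fragment of the language; the names infer, fresh, TVar and the generated variable names a1, a2, ... are our own assumptions, and the (Hyp) case is taken to produce no equations.

```ocaml
(* A sketch of equation generation following the ED rules.
   All names are illustrative assumptions. *)
type ty = TInt | TBool | TVar of string | Arrow of ty * ty

type expr =
  | Var of string
  | Int of int
  | Bool of bool
  | Plus of expr * expr
  | If of expr * expr * expr
  | Function of string * expr
  | Appl of expr * expr

exception TypeError

let counter = ref 0

(* A type variable not used anywhere else so far. *)
let fresh () =
  incr counter;
  TVar ("a" ^ string_of_int !counter)

(* infer gamma e = (tau, eqs) stands for  Gamma |- e : tau \ eqs. *)
let rec infer gamma e =
  match e with
  | Var x -> (try (List.assoc x gamma, []) with Not_found -> raise TypeError)
  | Int _ -> (TInt, [])
  | Bool _ -> (TBool, [])
  | Plus (e1, e2) ->
      let t1, c1 = infer gamma e1 in
      let t2, c2 = infer gamma e2 in
      (TInt, c1 @ c2 @ [ (t1, TInt); (t2, TInt) ])
  | If (e1, e2, e3) ->
      let t1, c1 = infer gamma e1 in
      let t2, c2 = infer gamma e2 in
      let t3, c3 = infer gamma e3 in
      let d = fresh () in
      (d, c1 @ c2 @ c3 @ [ (t1, TBool); (t2, d); (t3, d) ])
  | Function (x, body) ->
      let a = fresh () in
      let t, c = infer ((x, a) :: gamma) body in
      (Arrow (a, t), c)
  | Appl (e1, e2) ->
      let t1, c1 = infer gamma e1 in
      let t2, c2 = infer gamma e2 in
      let a = fresh () in
      (a, c1 @ c2 @ [ (t1, Arrow (t2, a)) ])
```

For instance, infer [] (Function ("x", Plus (Var "x", Int 1))) yields an arrow type from a fresh variable to Int, together with equations forcing that variable to be Int.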


Solving the equations

To solve the equations, we
  1. Compute the closure of the equations, producing new equations that hold by transitivity, etc;
  2. Check for any inconsistent equations like Int = Bool which denote type errors;
Computing Closure(E), the Equational closure of set E
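Repeat the following:
  1. For each equation tau0 -> tau0' = tau1 -> tau1' in E, add tau0 = tau1 and tau0' = tau1' to E;
  2. For each pair of equations tau0 = tau1 and tau1 = tau2 in E, add tau0 = tau2 to E (transitivity);
Until no more equations can be added to E (note, we implicitly will use the symmetry property on these equations).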

The closure serves to uncover inconsistencies. For instance

Closure({ 'a = Int -> 'b , 'a = Int -> Bool, 'b = Int}) =
        { 'a = Int -> 'b , 'a = Int -> Bool, 'b = Int,
          Int -> 'b = Int -> Bool, Int = Int, 'b = Bool, Int = Bool }
directly uncovering the inconsistency Int = Bool.
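
A closure computation of this kind can be sketched as a fixed-point loop. The sketch below assumes the closure steps are exactly the ones the example exhibits: symmetry, transitivity, and decomposition of an equation between two arrow types into equations on their domains and ranges. The names step, closure and inconsistent are hypothetical, equations are plain pairs in a list, and List.concat_map needs OCaml 4.10 or later. No attempt is made at efficiency.

```ocaml
(* A naive sketch of Closure(E) on a list of type equations.
   Names and representation are illustrative assumptions. *)
type ty = TInt | TBool | TVar of string | Arrow of ty * ty

(* One round: add symmetric, decomposed and transitive consequences. *)
let step eqs =
  let sym = List.concat_map (fun (a, b) -> [ (a, b); (b, a) ]) eqs in
  let decomposed =
    List.concat_map
      (function
        | Arrow (a, b), Arrow (a', b') -> [ (a, a'); (b, b') ]
        | _ -> [])
      sym
  in
  let transitive =
    List.concat_map
      (fun (a, b) ->
        List.filter_map
          (fun (b', c) -> if b = b' then Some (a, c) else None)
          sym)
      sym
  in
  List.sort_uniq compare (sym @ decomposed @ transitive)

(* Iterate until nothing new is added. *)
let rec closure eqs =
  let eqs = List.sort_uniq compare eqs in
  let next = step eqs in
  if next = eqs then eqs else closure next

(* Immediate inconsistencies: Int = Bool, or a base type = an arrow. *)
let inconsistent eqs =
  List.exists
    (function
      | TInt, TBool | TBool, TInt -> true
      | (TInt | TBool), Arrow _ | Arrow _, (TInt | TBool) -> true
      | _ -> false)
    eqs

(* { 'a = Int -> 'b, 'a = Int -> Bool, 'b = Int } from the example *)
let example =
  [ (TVar "a", Arrow (TInt, TVar "b"));
    (TVar "a", Arrow (TInt, TBool));
    (TVar "b", TInt) ]

let example_is_bad = inconsistent (closure example)
```

The loop terminates because every equation it adds relates two subterms of types already present, and there are only finitely many such pairs.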

Fact: the closure of E can always be quickly computed.

After computing the closure, the constraints are consistent if

  1. No immediate inconsistency was uncovered (such as Int = Bool, Bool = tau -> tau', or Int = tau -> tau'),
  2. No self-referential equations exist. (We will deal with this issue below, ignore for now).

Solving the Equations

If inference produces consistent equations, we then solve the equational constraints by substituting type variables with types as follows.

Definition: Equation solution algorithm
Given tau \ E,

Repeat
replace some type variable 'a in tau with tau', provided 'a = tau' or tau' = 'a occurs in E and either
  1. tau' is not a type variable, or
  2. tau' is a type variable 'b which is lexicographically after 'a.
Until no more such replacements are possible.
This resulting type is the type inferred by Caml.
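
The replacement loop can be sketched as a recursive substitution. This is an illustration only: the names replacement and solve are hypothetical, equations are pairs in a list, and the equation set is assumed to be already closed, consistent, and free of circular equations (otherwise the recursion may not terminate).

```ocaml
(* A sketch of the equation solution algorithm.
   Assumes eqs is closed, consistent and has no circular equations. *)
type ty = TInt | TBool | TVar of string | Arrow of ty * ty

(* Find a type to substitute for variable a, using either side of an
   equation. A variable is only replaced by a variable that comes
   later in lexicographic order. *)
let replacement eqs a =
  let pick v other =
    if v <> TVar a then None
    else
      match other with
      | TVar b when b <= a -> None
      | t -> Some t
  in
  List.find_map
    (fun (l, r) -> match pick l r with Some t -> Some t | None -> pick r l)
    eqs

(* Replace variables in t until no replacement applies. *)
let rec solve eqs t =
  match t with
  | TInt | TBool -> t
  | Arrow (t1, t2) -> Arrow (solve eqs t1, solve eqs t2)
  | TVar a -> (
      match replacement eqs a with
      | Some t' -> solve eqs t'
      | None -> t)

(* Int -> 'a \ { 'a = Int -> 'a1, 'a1 = Bool } *)
let simplified =
  solve
    [ (TVar "a", Arrow (TInt, TVar "a1")); (TVar "a1", TBool) ]
    (Arrow (TInt, TVar "a"))
```

On the earlier example Int -> 'a \ { 'a = Int -> 'a1, 'a1 = Bool } this produces Int -> Int -> Bool.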

This algorithm is flawed, however: these replacements may continue forever. This is the case when there is a circular type in E. Recall the above example

Int -> 'a \ { 'a = Int -> 'a }
--this produces the nonterminating chain
Int -> Int -> 'a  \ { 'a = Int -> 'a }
Int -> Int -> Int -> 'a  \ { 'a = Int -> 'a }
Int -> Int -> Int -> Int -> 'a  \ { 'a = Int -> 'a }
...
Solution: Check for cycles in the equations before solving them as above.
Cycle Detection Algorithm:
  1. Define a directed graph G based on E.
  2. Nodes in G are type variables in E
  3. There is an edge from the 'a node to the 'b node if there is an equation 'a = tau' in E, and 'b occurs in tau'.
  4. Raise typeError if there is a cycle in G for which there is at least one edge representing a constraint that isn't just between type variables ('a = 'b).
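
The four steps above can be sketched as follows. The representation is our own assumption: each edge carries a flag saying whether it comes from an equation whose right side is more than a bare variable, and a forbidden cycle is detected by checking, for each such flagged edge from 'a to 'b, whether 'a is reachable again from 'b. The names vars, edges, reachable and has_cycle are hypothetical.

```ocaml
(* A sketch of the cycle detection algorithm on type equations.
   Names and representation are illustrative assumptions. *)
type ty = TInt | TBool | TVar of string | Arrow of ty * ty

let rec vars = function
  | TInt | TBool -> []
  | TVar a -> [ a ]
  | Arrow (t1, t2) -> vars t1 @ vars t2

(* Edges (a, b, strict): 'a = tau is in E and 'b occurs in tau.
   strict is false when tau is just the variable 'b. *)
let edges eqs =
  List.concat_map
    (fun (l, r) ->
      let from v other =
        match v, other with
        | TVar a, TVar b -> [ (a, b, false) ]
        | TVar a, t -> List.map (fun b -> (a, b, true)) (vars t)
        | _ -> []
      in
      from l r @ from r l)
    eqs

(* Is dst reachable from src by following edges? *)
let reachable es src dst =
  let rec go seen = function
    | [] -> false
    | n :: rest ->
        if n = dst then true
        else if List.mem n seen then go seen rest
        else
          let succs =
            List.filter_map (fun (a, b, _) -> if a = n then Some b else None) es
          in
          go (n :: seen) (succs @ rest)
  in
  go [] [ src ]

(* A cycle through at least one strict edge means a circular type. *)
let has_cycle eqs =
  let es = edges eqs in
  List.exists (fun (a, b, strict) -> strict && reachable es b a) es

(* { 'a = Int -> 'a } is circular; { 'a = 'b } alone is not. *)
let circular = has_cycle [ (TVar "a", Arrow (TInt, TVar "a")) ]
let harmless = has_cycle [ (TVar "a", TVar "b") ]
```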
The complete ED type inference algorithm
Given e,
  1. Produce a proof |- e : tau \ E (such a proof always exists)
  2. Extend E by closing: E := Closure(E).
  3. Check if E is immediately inconsistent; if so, raise typeError
  4. Check for cycles in E using the above algorithm; raise typeError if there is a cycle.
  5. Solve E by the above equation solution algorithm. This algorithm will always terminate if there are no cycles in E.
  6. Output: the solution type tau' for e produced by the solution algorithm.
Fact: The typings produced by the above algorithm are principal.


PED: ED with Let-polymorphism

We still don't have polymorphism; all we have is type variables. Consider
Let x = Function y -> y In x True; x 0
in Caml, this program would type-check: different uses of Function y -> y can have different types. However, consider ED's behavior: expanding the definition of Let, we get
(Function x -> x True; x 0) (Function y -> y)  
|- (Function x -> x True; x 0) : 'a -> 'c \ {'a = Bool -> 'b, 'a = Int -> 'c }
But by the closure we get Bool -> 'b = Int -> 'c and hence Bool = Int: BAD!!

The problem in this case:

So we define PED, which is ED with Let and let-polymorphism

The PED Let Typing Rule

We will add Let syntax to PED, and include a special typing rule for Let.
Gamma  |- e : tau  \ E,    Gamma, x : forall 'a1...'an. tau' |- e' : tau'' \ E'
----------------------------------------------------------------------------------- (Let)
Gamma  |- Let x = e in e' : tau'' \ E' 
   Where tau' is a solution of |- e : tau \ E using the above algorithm,
   and tau' has free type variables 'a1, .. 'an that do not occur in Gamma.

Type Schema

We also need to add the rule
------------------------------------------------------------ (Let-Inst)
Gamma, x : forall 'a1...'an. tau' |- x : R(tau') \ emptyset

where R(tau') is a renaming of the variables 'a1..'an to fresh names.
Since these names are fresh each time x is used, the different uses won't conflict like above.
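
The renaming R can be sketched as follows. The names scheme, Forall, instantiate and the generated variable names r1, r2, ... are our own assumptions for illustration.

```ocaml
(* A sketch of (Let-Inst): rename the quantified variables of a type
   schema to fresh ones. Names are illustrative assumptions. *)
type ty = TInt | TBool | TVar of string | Arrow of ty * ty

(* forall 'a1 ... 'an. tau *)
type scheme = Forall of string list * ty

let counter = ref 0

let fresh () =
  incr counter;
  TVar ("r" ^ string_of_int !counter)

let instantiate (Forall (bound, t)) =
  (* pick one fresh variable per quantified variable *)
  let renaming = List.map (fun a -> (a, fresh ())) bound in
  let rec go = function
    | TInt -> TInt
    | TBool -> TBool
    | TVar a -> (
        match List.assoc_opt a renaming with
        | Some v -> v
        | None -> TVar a (* free variables are left alone *))
    | Arrow (t1, t2) -> Arrow (go t1, go t2)
  in
  go t

(* forall 'a. 'a -> 'a, the schema for Function y -> y *)
let id_scheme = Forall ([ "a" ], Arrow (TVar "a", TVar "a"))
```

Two successive calls of instantiate id_scheme return arrow types over two different fresh variables, which is what keeps the two uses of x from constraining each other.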

Example: let's type the Let version of the program

Let x = Function y -> y In (x True; x 0)
from above.
 |- Function y -> y : 'a -> 'a \ emptyset
This constraint set trivially has the solution type 'a -> 'a. Thus, we then typecheck the Let body under the assumption that x has type forall 'a. 'a -> 'a.
x : forall 'a. 'a -> 'a |- x : 'b -> 'b \ emptyset
by (Let-Inst) and then
x : forall 'a. 'a -> 'a |- x True : 'c \ { 'b -> 'b = Bool -> 'c }
Similarly,
x : forall 'a. 'a -> 'a |- x 0 : 'e \ { 'd -> 'd = Int -> 'e }
The key point is that this second use of x gets a different type variable, 'd, by the (Let-Inst) rule. Putting the two together, the type is something like
x : forall 'a. 'a -> 'a |- x True; x 0 : 'e \ { 'b -> 'b = Bool -> 'c, 'd -> 'd = Int -> 'e }
which by the (Let) rule then produces
|- Let x = Function y -> y In (x True; x 0)
    : 'e \ { 'b -> 'b = Bool -> 'c, 'd -> 'd = Int -> 'e }
Since 'b and 'd are different variables, we don't get the conflict we got previously.


Constrained Type Inference

There was a reason why we presented Hindley-Milner type inference in the form above: if we replace equality constraints by subtyping constraints <:, we can perform constrained type inference. To understand why it is useful to perform this generalization, it is easiest to just look at the rules.

D is not the best system to show off the power of replacing equality with subtyping: since the language does not have records, there is not any interesting subtyping that could happen! To show the usefulness of subtyping, we thus define the constraints in an environment where we have records, DRec. DRec plus constraints is CDRec. We can contrast CDRec with the EDRec language, which we did not study but which you can imagine. Instead of types tau \ E for a set of equations E, CDRec has types

tau \ { tau1 <: tau1', .., taun <: taun' }
We will use the letter C to refer to a set of subtyping constraints.

CDRec has the following set of type rules. These are direct generalizations of the ED rules, replacing = by <:. The <: is always in the direction of information flow.

Gamma(x) = tau 
------------------------------- (HYP)
Gamma  |- x : tau \ emptyset


----------------------------- (Int)
Gamma  |- n : Int \ emptyset            for n an integer


----------------------------- (Bool)
Gamma  |- b : Bool \ emptyset           for b either True or False


Gamma  |- e : tau \ C , Gamma  |- e' : tau' \ C'                
------------------------------------------------------- (+)
Gamma  |- e + e' : Int \ C union C' union {tau <: Int, tau' <: Int }


Gamma  |- e : tau \ C  , Gamma  |- e' : tau'  \ C'                 
--------------------------------------- (-)
Gamma  |- e - e' : Int  \ C union C' union {tau <: Int, tau' <: Int }


Gamma  |- e : tau \ C ,     Gamma  |- e' : tau'  \ C'           
-------------------------------------------- (=)
Gamma  |- e = e' : Bool   \ C  union C' union {tau <: Int, tau' <: Int }             

(And, Or, Not rules should be obvious)

Gamma  |- e : tau \ C ,  Gamma  |- e' : tau' \ C' ,  Gamma  |- e'' : tau'' \ C'' 
------------------------------------------------------------------------ (If)
Gamma  |- If e Then e' Else e'' : 'd \ C union C' union C'' union {tau <: Bool, tau' <: 'd,  tau'' <: 'd}


Gamma  |- e1 : tau1 \ C1, ..., Gamma  |- en : taun \ Cn
--------------------------------------------------------------------- (Record)
Gamma  |- { l1 = e1, ..., ln = en  } : { l1 : tau1, ..., ln : taun} \ C1 union ... union Cn


Gamma  |- e : tau \ C
-------------------------------------------------------- (Projection)
Gamma  |- e.l : 'a \ C union { tau <: { l : 'a } }


Gamma, x : 'a |- e : tau \ C 
---------------------------------------------------------    (Function)
Gamma  |- Function x  -> e : 'a -> tau \ C 


Gamma, f : 'a -> 'b ,  x : 'a |- e : tau \ C
------------------------------------------------------------    (Rec)
Gamma  |- Rec f = Function x  -> e : 'a -> tau \ C union { tau <: 'b }


Gamma  |- e : tau \ C , Gamma  |- e' : tau'  \ C'
---------------------------------------------------------- (Appl)
Gamma  |- e e' : 'a \ C union C' union { tau <: tau' -> 'a }

These rules almost directly define the type inference procedure: the proof can pretty much be built from the bottom (leaves) on up.


Consider the simple example program
(Function r -> r.l + 1) {l = 5, m = True}
This program will need subtyping because intuitively the function will only need a record with l field, but a record with l,m fields is supplied. The function types as
|- Function r -> r.l + 1 :  'a -> Int \ {'a <: {l: 'b}, 'b <: Int}
and the application then has type
|- (Function r -> r.l + 1) {l = 5, m = True} : 'c \
  {'a <: {l: 'b}, 'b <: Int, 'a -> Int <: {l : Int, m : Bool} -> 'c, Int <: 'c}
From the closure of this constraint set we get the constraints
{l : Int, m : Bool} <: 'a <: {l: 'b}
and so we get
{l : Int, m : Bool} <: {l: 'b}
which is fine since we can always ignore record fields.

The type inference algorithm for constraints is similar to the equational algorithm, but no solution is found for the constraints, for reasons we will see below.


Complete type inference algorithm
Given e,
  1. Produce a proof |- e : tau \ C (recall, such a proof always exists for any e)
  2. Extend C by closing: C := Closure(C).
  3. Check if C is immediately inconsistent; if so, raise typeError
  4. Check for cycles in C as defined below; if cycles found, raise typeError
  5. The inferred type is e : tau \ C
Here are the closure and cycle detection algorithms, the obvious generalizations of the equational ones.
Computing Closure(C), the closure of constraint set C
Repeat
Until no more constraints can be added to C.
A constraint set is immediately inconsistent if it contains tau <: tau' where tau and tau' are different kinds of type (function and record, Int and function, etc.), or where both are records and the right record has a field the left record does not.
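
The immediate inconsistency test just described can be sketched as a check on a single constraint tau <: tau'. The representation is our own assumption (records as association lists, constructor names TInt, TBool, TVar, Rec), and a constraint with a type variable on either side is never counted as immediately inconsistent, since closure may still relate it to other types.

```ocaml
(* A sketch of the "immediately inconsistent" test for one constraint.
   Names and representation are illustrative assumptions. *)
type ty =
  | TInt
  | TBool
  | TVar of string
  | Arrow of ty * ty
  | Rec of (string * ty) list

let immediately_inconsistent (t, t') =
  match t, t' with
  | TVar _, _ | _, TVar _ -> false
  | TInt, TInt | TBool, TBool | Arrow _, Arrow _ -> false
  | Rec fs, Rec fs' ->
      (* the right record demands a field the left one lacks *)
      List.exists (fun (l, _) -> not (List.mem_assoc l fs)) fs'
  | _ -> true (* different kinds of type *)

(* {l : Int, m : Bool} <: {l : 'b} is fine: extra fields are ignored *)
let fine =
  immediately_inconsistent
    (Rec [ ("l", TInt); ("m", TBool) ], Rec [ ("l", TVar "b") ])

(* {l : Int} <: {l : 'b, m : 'c} is not: m is missing on the left *)
let bad =
  immediately_inconsistent
    (Rec [ ("l", TInt) ], Rec [ ("l", TVar "b"); ("m", TVar "c") ])
```

A whole constraint set is then immediately inconsistent when List.exists immediately_inconsistent holds of it.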


Cycle detection in C: Define a directed graph G where nodes are type variables in C. Run an edge from the 'a node to the 'b node if there is a constraint 'a <: tau' in C, and 'b occurs in tau'. Run an edge from 'b to 'a if tau' <: 'a occurs in C and 'b occurs in tau'. C has a cycle just when G has a cycle.

Q: Why didn't we solve the constraints??
A: Any substitution proceeds with possible loss of generality. Consider e.g. the constraint 'a <: tau, and the possibility of substituting tau for 'a. This precludes the possibility that the 'a position be a proper subtype of tau, as the substitution in effect asserts the equality of 'a and tau.

Weakness of constrained types: need to keep constraints around so types are hard to read.


Constrained Polymorphic Types

We have the same shortcomings as in the equational case at this point: there is as of yet no polymorphism. The solution used in the equational case won't work here, as it required the constraints to be solved.

Solution: use constrained polymorphic types forall 'a1,...,'an. tau \ C in the assumptions Gamma, in place of the polymorphic types (type schemes) we had in the equational version. We skip the details, which are involved.

Constrained polymorphic types are "very good" object types. Polymorphism is needed to type inheritance.


Modules

(goal of covering this topic in the future)

Skipped Topics

We will very briefly cover the polymorphic references issue. In short, it is not possible to have a polymorphic value inside a reference cell. Why not? Consider the program
let val x = ref (Function x -> x) in (x := (Function x: int -> x + 1)); !x true end
Recall the way polymorphic functions were typechecked in PED: x will have the polymorphic type
forall 'a. ('a -> 'a) ref
which means each use of x can pick a different type for 'a. In the above example, the first use of x can pick int for 'a, and the second can pick bool for 'a, and the program will typecheck. OOPS! The program will produce a run-time error, since it applies the successor function to true. What Caml does in this case is refuse to let the type of x be polymorphic: x has type ('a -> 'a) ref for some PARTICULAR, FIXED 'a, and the program thus does not typecheck.
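
This behavior can be observed in a present-day OCaml, which we assume here in place of the Caml of these notes; the names r and four are ours. The sketch shows only the part of the program that is accepted, with the rejected line left as a comment.

```ocaml
(* r is not generalized: it holds functions of one fixed, not yet
   known type, rather than having type forall 'a. ('a -> 'a) ref. *)
let r = ref (fun x -> x)

(* This assignment fixes that type: r is now an (int -> int) ref. *)
let () = r := fun x -> x + 1

(* Using the stored function at type int is fine. *)
let four = !r 3

(* The following line is rejected at compile time, because true is a
   bool where an int is expected:

   let oops = !r true
*)
```

The type error on the commented line is exactly the run-time error of the example above, caught statically.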

SML has a special kind of type variable, the imperative type variables of the form '1a. These type variables indicate their values will be placed in cells, so they may only be instantiated with monotypes (type variable-free types). For example,

- Function x -> ref x;
val it = Function : '1a -> '1a ref
--since x is placed in a cell, it gets a type '1a. It is polymorphic but in a restricted sense. Now let's try
- it (Function x -> x);
std_in:17.1-17.14 Error: nongeneric weak type variable
  it : ('0Z -> '0Z) ref

-- this attempted to instantiate '1a to be 'a -> 'a, an illegal move since 'a -> 'a contains the type variable 'a. However,
- it 4;
val it = ref 4 : int ref
is perfectly legal since 4 is of type int which contains no type variables.

This issue shows how polymorphism would be very difficult in the C/Scheme language framework, as there all variables may behave as cells. This is perhaps the major reason why polymorphism is a relatively recent phenomenon in programming language design.