Implementation Issues

Here we briefly survey some key issues in programming language implementation.

Runtime memory models

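To review, programming language implementations have three tiers of memory: static memory for global data, the run-time stack for function-call frames, and the heap for dynamically allocated data. There are also two primary forms of the view given to programmers: explicit memory, where the programmer allocates and frees heap space by hand (as with C's malloc and free), and implicit memory, where unused space is reclaimed automatically by a garbage collector (as in Java or OCaml).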

More on implicit memory

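Although the programmer can be unaware of it, at runtime the layout is usually different for objects and for primitive data: objects are typically boxed (allocated in the heap and reached through a pointer), while primitive values such as ints are stored unboxed, directly in the variable, field, or stack slot that holds them. The Java Integer performs much more slowly than the primitive type int for this reason.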
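A minimal Java sketch of the difference, with hypothetical class and method names: both methods compute the same sum, but the first keeps unboxed ints directly in an int[] array, while the second stores each value as a reference to an Integer object on the heap, so every read follows a pointer and unboxes the value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: the same sum over unboxed and boxed values.
class BoxingDemo {
    // Unboxed: each element is a raw int stored directly in the array.
    static long sumPrimitive(int n) {
        int[] xs = new int[n];
        for (int k = 0; k < n; k++) {
            xs[k] = k;
        }
        long total = 0;
        for (int x : xs) {
            total += x;
        }
        return total;
    }

    // Boxed: each element is a reference to an Integer object on the heap.
    static long sumBoxed(int n) {
        List<Integer> xs = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            xs.add(k);            // autoboxing: int -> Integer
        }
        long total = 0;
        for (Integer x : xs) {
            total += x;           // auto-unboxing: follow the reference, read the int
        }
        return total;
    }
}
```

Timing the two methods for a large n is one way to see the cost of boxing; the actual ratio depends on the JVM and its optimizations.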

The call stack

Call stack issues relative to Fb and functional programming:

Garbage Collection

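Definition. Garbage: memory that no definitely live bit of memory points to.
Garbage Collection: automatically find and free this memory.
Advantages: the programmer never frees memory by hand, so there are no dangling pointers or double frees.
Disadvantages: collection costs running time, and many collectors pause the program while they run.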

Reference Counting Collection

Points about reference counting
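The basic idea of reference counting can be sketched as follows: every object carries a count of incoming references, updated on each pointer store and drop, and it is freed the moment its count reaches zero. This is a minimal simulation; the names RcObject, incRef, decRef, and addField are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// A simulated heap object (hypothetical names throughout).
class RcObject {
    final String name;
    int refCount = 0;                                  // number of references to this object
    final List<RcObject> fields = new ArrayList<>();   // outgoing pointers
    boolean freed = false;

    RcObject(String name) { this.name = name; }
}

class RefCounting {
    // Called whenever a new reference to obj is created.
    static void incRef(RcObject obj) {
        obj.refCount++;
    }

    // Called whenever a reference to obj is destroyed. At zero, obj is freed
    // and its own outgoing references are dropped, which may cascade.
    static void decRef(RcObject obj) {
        obj.refCount--;
        if (obj.refCount == 0) {
            obj.freed = true;
            for (RcObject child : obj.fields) {
                decRef(child);
            }
            obj.fields.clear();
        }
    }

    // Store a pointer from src to dst, keeping the count of dst up to date.
    static void addField(RcObject src, RcObject dst) {
        incRef(dst);
        src.fields.add(dst);
    }
}
```

Dropping the last reference to an object frees it immediately and cascades to its children. Two objects that point at each other, however, keep each other's counts above zero forever even when nothing else reaches them, which is the classic weakness of reference counting.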

Mark and sweep collection

Mark and sweep is the simplest form of "100% correct" garbage collection.

  1. Keep free space in a heap as above: space for new objects is allocated from it, and the space of dead objects is put back on it.
  2. The root set consists of the objects that are directly accessible and so must be live. The standard roots are:
    • any object referenced from the run-time stack of any thread
    • objects referenced by static fields
  3. Each object has a special mark bit; all bits are initially set to "U" (unmarked) by a sweep through all of memory.
  4. To do GC when space runs low, stop all processing (and expect to wait a while; this is the main drawback).
  5. Mark ("M") the root set.
  6. Traverse all pointers from the root set, marking ("M") all objects encountered.
  7. If an object is already marked, stop: there is no need to traverse it again.
  8. When all reachable objects have been traversed, sweep through all of memory: any object not marked is inaccessible, so put its space on the free heap.
The main problem is the big wait. Memory also becomes very fragmented (eventually there are many very small chunks of free space).
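The mark-and-sweep steps can be sketched over a simulated heap, assuming hypothetical names (MsObject, MarkSweep). This sketch marks with an explicit work stack rather than recursion, so a deep object graph cannot overflow the Java call stack, and it resets the survivors' mark bits during the sweep instead of in a separate pass.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// A simulated heap object; marked == false plays the role of "U", true of "M".
class MsObject {
    boolean marked = false;
    final List<MsObject> fields = new ArrayList<>();   // outgoing pointers
}

class MarkSweep {
    final List<MsObject> allocated = new ArrayList<>(); // every object in memory
    final List<MsObject> freeList = new ArrayList<>();  // reclaimed space (the free heap)
    final List<MsObject> roots = new ArrayList<>();     // stack slots and static fields

    MsObject allocate() {
        MsObject o = new MsObject();
        allocated.add(o);
        return o;
    }

    // Mark phase: traverse everything reachable from the roots.
    void mark() {
        Deque<MsObject> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            MsObject o = work.pop();
            if (o.marked) continue;                     // already marked: don't traverse again
            o.marked = true;
            for (MsObject child : o.fields) work.push(child);
        }
    }

    // Sweep phase: unmarked objects are unreachable, so move them to the free list.
    int sweep() {
        int reclaimed = 0;
        List<MsObject> survivors = new ArrayList<>();
        for (MsObject o : allocated) {
            if (o.marked) {
                o.marked = false;                       // back to "U" for the next GC
                survivors.add(o);
            } else {
                freeList.add(o);
                reclaimed++;
            }
        }
        allocated.clear();
        allocated.addAll(survivors);
        return reclaimed;
    }

    int collect() {
        mark();
        return sweep();
    }
}
```

Because marking starts only from the roots, a cycle of objects that no root reaches is never marked and is reclaimed by the sweep.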

Copying Collection

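Memory is divided into two equal semi-spaces, only one of which is active (used for allocation) at any time. The garbage collection algorithm: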
  1. GC is triggered when the active semi-space becomes too full.
  2. All processing must stop during GC (that bad point again). In particular, an object allocation (a new operation) may trigger GC, and that operation will not return until the GC has completed.
  3. Terminology: newspace is the empty semi-space that objects will move to; oldspace is the full one.
  4. Like the mark-and-sweep collector, start at the roots in oldspace and recursively follow pointers
  5. But when a (live) object is reached, don't mark it; instead:
    1. copy it over to newspace (putting objects in one after the other, so no fragments)
    2. leave behind in its place in oldspace a tombstone, a forwarding pointer to the new location of the object
  6. If, during traversal, a pointer to a tombstone is found, correct the pointer to point to the new location of the object.
  7. Invariant: there will never be any newspace objects with pointers to oldspace objects (think about it).
  8. After all live objects have been reached, the GC is finished: newspace becomes the active semi-space and processing resumes.
  9. Fact: at the end there will be no pointers to tombstones, and so no pointers into oldspace, so it's 100% junk.
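The copying algorithm can be sketched with two Java lists standing in for the semi-spaces. All names are hypothetical; the copy is recursive, as in the steps above, and the forward field plays the role of the tombstone. Each copied object's fields are filled in only after its children have been evacuated, so no newspace object ever points into oldspace.

```java
import java.util.ArrayList;
import java.util.List;

// A simulated heap object with a slot for the tombstone (forwarding pointer).
class CObject {
    final String label;
    final List<CObject> fields = new ArrayList<>();   // outgoing pointers
    CObject forward = null;                           // set once this oldspace object is moved

    CObject(String label) { this.label = label; }
}

class CopyingCollector {
    List<CObject> active = new ArrayList<>();         // the semi-space currently in use
    final List<CObject> roots = new ArrayList<>();

    CObject allocate(String label) {
        CObject o = new CObject(label);
        active.add(o);
        return o;
    }

    // Move one oldspace object into newspace (or follow its tombstone); return the new location.
    private CObject evacuate(CObject old, List<CObject> newspace) {
        if (old.forward != null) return old.forward;  // already moved: follow the tombstone
        CObject copy = new CObject(old.label);
        newspace.add(copy);                           // objects packed one after another
        old.forward = copy;                           // leave the tombstone behind first, so cycles terminate
        for (CObject child : old.fields) {
            copy.fields.add(evacuate(child, newspace)); // copy children, storing corrected pointers
        }
        return copy;
    }

    void collect() {
        List<CObject> newspace = new ArrayList<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, evacuate(roots.get(i), newspace));
        }
        active = newspace;                            // switch semi-spaces; oldspace is now junk
    }
}
```

Production collectors often use Cheney's breadth-first scan of newspace instead of recursion, which needs no extra stack space.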

Disadvantages

  1. The big pause
  2. Wastes half of memory
Still, it's much better than mark-and-sweep: no fragmentation, no heap to maintain, no mark bits needed.

Scavenging Collectors

Scavenging is incremental collection: scavenge up a little free space, move around a few objects, let the program run again. Why? Avoid the big bad pause.

Scavenging with a copying collector:

  1. Execution doesn't stop when the collection starts
  2. Critical invariant to preserve: no pointers from newspace to oldspace (without this invariant, at the end of GC there could be pointers to oldspace)
  3. To preserve the invariant, if a newspace object gets mutated to point to an oldspace object, immediately pull that oldspace object (and all objects it points to) over to newspace. This requires overhead at each mutation (assignment to an instance variable). Also, the worst case is bad (but fortunately it almost never happens; memory is random enough).
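A sketch of the write barrier in step 3, with hypothetical names. As a simplifying assumption, "moving" an object is modeled by flipping a space tag instead of copying it to a new address, so no tombstones are needed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplification (assumption): an object's location is modeled by a flag,
// so "moving" it to newspace keeps its identity and needs no tombstone.
class SObject {
    boolean inNewspace = false;
    final List<SObject> fields = new ArrayList<>();   // outgoing pointers (may hold null)
}

class Scavenger {
    final List<SObject> newspace = new ArrayList<>();

    // Pull obj and everything reachable from it that is still in oldspace over to newspace.
    void pull(SObject obj) {
        Deque<SObject> work = new ArrayDeque<>();
        work.push(obj);
        while (!work.isEmpty()) {
            SObject o = work.pop();
            if (o.inNewspace) continue;
            o.inNewspace = true;
            newspace.add(o);
            for (SObject child : o.fields) {
                if (child != null) work.push(child);
            }
        }
    }

    // Every pointer store goes through this barrier: the per-mutation overhead.
    void writeField(SObject target, int index, SObject value) {
        if (value != null && target.inNewspace && !value.inNewspace) {
            pull(value);   // preserve the invariant: no newspace -> oldspace pointers
        }
        target.fields.set(index, value);
    }
}
```

The check in writeField runs on every pointer store, which is where the per-mutation overhead comes from; when the stored object heads a large oldspace structure, the pull is the expensive worst case.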

Sun's HotSpot VM Collector

Sun's HotSpot collector (HotSpot has been the default JVM since Java 1.3) is a high-quality collector.

Young space

Young space has three subspaces:
  1. Eden
  2. Survivor Space One
  3. Survivor Space Two
The two survivor spaces are identical to the two semi-spaces of a copying collector: only one is in use at any time. But they are small, so not much memory is wasted.

Copying collection is used on Young space
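A sketch of a young-space collection, under stated assumptions: which objects are live is passed in as a set instead of being found by tracing from the roots, and TENURE_AGE is a made-up value. New objects go into Eden; a collection copies the live objects of Eden and the in-use survivor space into the empty survivor space, and the two survivor spaces then swap roles. HotSpot also promotes (tenures) objects that have survived enough young collections into old space, which the sketch models with an age counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class YObject {
    final String name;
    int age = 0;                          // number of young collections survived
    YObject(String name) { this.name = name; }
}

class YoungSpace {
    static final int TENURE_AGE = 2;      // hypothetical promotion threshold

    final List<YObject> eden = new ArrayList<>();
    List<YObject> fromSurvivor = new ArrayList<>();  // the survivor space in use
    List<YObject> toSurvivor = new ArrayList<>();    // the empty one
    final List<YObject> oldSpace = new ArrayList<>();

    YObject allocate(String name) {       // all new objects start in Eden
        YObject o = new YObject(name);
        eden.add(o);
        return o;
    }

    // Copy live objects from Eden and the in-use survivor space into the empty
    // survivor space, promoting objects that are old enough to old space.
    void collect(Set<YObject> live) {
        List<YObject> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (YObject o : candidates) {
            if (!live.contains(o)) continue;   // dead objects are simply not copied
            o.age++;
            if (o.age >= TENURE_AGE) {
                oldSpace.add(o);
            } else {
                toSurvivor.add(o);
            }
        }
        eden.clear();                          // Eden is empty after every young collection
        fromSurvivor.clear();
        List<YObject> tmp = fromSurvivor;      // swap the roles of the survivor spaces
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }
}
```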

Old Space

The train garbage collection algorithm is used there:

Efficient Implementation of OCaml

(to appear)

Virtual Machines

JVM and Java


JVM Stack Frames

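What's in a frame? An array of local variables (in an instance method, slot 0 holds this and the parameters follow), an operand stack that bytecodes push values onto and pop them from, and a reference to the run-time constant pool of the current method's class.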

JVM Bytecodes

Most of the examples below are taken from The JVM Specification. Note: you can view the bytecodes of any .class file by typing javap -c MyClass in a directory where MyClass.class exists.

Simple control and variables

    void whileInt() {
        int i = 0;
        while (i < 100) {
            i++;
        }
    }
is compiled to
Method void whileInt()
 0   iconst_0                // push constant 0
 1   istore_1                // store into frame location 1, which is i
 2   goto 8                  // jump to byte offset 8
 5   iinc 1 1                // increment frame location 1 by 1
 8   iload_1                 // push frame location 1 (i)
 9   bipush 100              // push constant 100
 11  if_icmplt 5             // integer less-than compare; jump to 5 if true
 14  return
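To see the frame at work, here is a toy interpreter for just the instructions in whileInt(): a local-variable array and an operand stack are all it needs. This is a sketch, not how a real JVM reads .class files: instructions are written as strings keyed by their byte offsets, and the class name TinyJvm is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

class TinyJvm {
    // whileInt() from the listing above, keyed by byte offset.
    static final Map<Integer, String[]> WHILE_INT = Map.of(
        0, new String[]{"iconst_0"},
        1, new String[]{"istore_1"},
        2, new String[]{"goto", "8"},
        5, new String[]{"iinc", "1", "1"},
        8, new String[]{"iload_1"},
        9, new String[]{"bipush", "100"},
        11, new String[]{"if_icmplt", "5"},
        14, new String[]{"return"});

    // Execute the code in a fresh frame and return its local variables.
    static int[] run(Map<Integer, String[]> code, int maxLocals) {
        int[] locals = new int[maxLocals];           // the frame's local-variable array
        Deque<Integer> stack = new ArrayDeque<>();   // the frame's operand stack
        int pc = 0;                                  // byte offset of the next instruction
        while (true) {
            String[] insn = code.get(pc);
            switch (insn[0]) {
                case "iconst_0": stack.push(0); pc += 1; break;
                case "istore_1": locals[1] = stack.pop(); pc += 1; break;
                case "iload_1":  stack.push(locals[1]); pc += 1; break;
                case "bipush":   stack.push(Integer.parseInt(insn[1])); pc += 2; break;
                case "iinc":
                    locals[Integer.parseInt(insn[1])] += Integer.parseInt(insn[2]);
                    pc += 3;
                    break;
                case "goto":     pc = Integer.parseInt(insn[1]); break;
                case "if_icmplt": {
                    int right = stack.pop(), left = stack.pop();   // second operand is on top
                    pc = (left < right) ? Integer.parseInt(insn[1]) : pc + 3;
                    break;
                }
                case "return":   return locals;
                default: throw new IllegalStateException("unsupported: " + insn[0]);
            }
        }
    }
}
```

Running TinyJvm.run(TinyJvm.WHILE_INT, 2) leaves 100 in local 1, the final value of i.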

Message Send

    int add12and13() {
        return addTwo(12, 13);
    }
compiles to
Method int add12and13()
 0  aload_0                   // Push local 0 (this) onto stack
 1  bipush 12                 // Push int constant 12 onto stack
 3  bipush 13                 // Push int constant 13 onto stack
 5  invokevirtual #4          // Method Example.addTwo(II)I
 8  ireturn                   // Return int on top of stack; it is
                              // the int result of addTwo()
The #4 above refers to entry number 4 in the constant pool. Each class (and thus each .class file) has its own constant pool, which holds the constants and symbolic references the class uses: names of classes, methods, and fields, plus literal values such as strings and numbers. So all the classes, methods, fields, etc. that the class refers to get listed in the constant pool, numbered sequentially. The #4 above is listed in the constant pool as method addTwo of class Example.

Message Receipt with arguments

    int addTwo(int i, int j) {
        return i + j;
    }
compiles to
Method int addTwo(int,int)
 0  iload_1        // Push value of local 1 (i)
 1  iload_2        // Push value of local 2 (j)
 2  iadd           // Add; leave int result on operand stack
 3  ireturn        // Return int result
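How the arguments of add12and13() end up in locals 1 and 2 of addTwo() can be sketched as follows: on a call, the receiver and arguments are popped off the caller's operand stack and become locals 0..n of the callee's new frame. The names ToyFrame and CallDemo are hypothetical, and values are modeled as plain Java objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy frame: local-variable array plus operand stack.
class ToyFrame {
    final Object[] locals;
    final Deque<Object> stack = new ArrayDeque<>();
    ToyFrame(int maxLocals) { locals = new Object[maxLocals]; }
}

class CallDemo {
    // Build the callee's frame from the top argCount + 1 slots of the caller's stack.
    static ToyFrame invoke(ToyFrame caller, int argCount, int calleeMaxLocals) {
        ToyFrame callee = new ToyFrame(calleeMaxLocals);
        for (int k = argCount; k >= 0; k--) {
            callee.locals[k] = caller.stack.pop();   // the last argument is on top
        }
        return callee;
    }

    // The body of addTwo: iload_1, iload_2, iadd, ireturn.
    static int addTwoBody(ToyFrame f) {
        f.stack.push(f.locals[1]);                   // iload_1
        f.stack.push(f.locals[2]);                   // iload_2
        int b = (Integer) f.stack.pop(), a = (Integer) f.stack.pop();
        f.stack.push(a + b);                         // iadd
        return (Integer) f.stack.pop();              // ireturn
    }
}
```

In the caller, aload_0, bipush 12, and bipush 13 leave this, 12, and 13 on the stack; invoke then moves them into locals 0, 1, and 2 of the callee, which is exactly where addTwo's iload_1 and iload_2 expect them.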

Object Creation

    int i;                                      // An instance variable

    MyObj example() {
        MyObj o = new MyObj();
        return silly(o);
    }

    MyObj silly(MyObj o) {
        if (o != null) {
            return o;
        } else {
            return o;
        }
    }
becomes
Method MyObj example()
 0    new #2                 // Class MyObj object created; no init yet
 3    dup                    // copy top of stack: <init> consumes one copy, the other is kept
 4    invokespecial #5       // initialize via Method MyObj.<init>()V
 7    astore_1               // store into frame location 1, i.e. o
 8    aload_0                // load this
 9    aload_1                // load o
 10   invokevirtual #4       // invoke Method Example.silly(LMyObj;)LMyObj;
 13   areturn                // return the reference on top of stack

Method MyObj silly(MyObj)
 0   aload_1                 // push o
 1   ifnull 6                // if o is null, jump to 6
 4   aload_1                 // push o
 5   areturn                 // return it
 6   aload_1                 // push o
 7   areturn                 // return it

Other Instructions

The bytecodes deal explicitly with constants, local variables, the operand stack, arithmetic and type conversion, comparisons and jumps, method calls and returns, fields, object and array creation, exceptions, type checks, and synchronization. Here is a selection of the instructions:
   6 (0x06)  iconst_3          // the int constant 3; iconst_m1 through iconst_5 are built in
  12 (0x0c)  fconst_1          // floating-point constant 1
  16 (0x10)  bipush            // push a 1-byte integer constant (two-byte instruction)
  17 (0x11)  sipush            // push a 2-byte integer constant (three-byte instruction)
  29 (0x1d)  iload_3           // push local variable 3; iload_0 through iload_3 are built in
  62 (0x3e)  istore_3          // store top of stack into local variable 3
  87 (0x57)  pop               // good old pop
  89 (0x59)  dup               // duplicate top of stack
  95 (0x5f)  swap              // swap top two stack elements
  96 (0x60)  iadd              // integer add; many other int, etc. ops
 126 (0x7e)  iand              // bitwise integer and; also ior, ixor, and other logical ops
 134 (0x86)  i2f               // many conversion bytecodes, int to float here
 159 (0x9f)  if_icmpeq         // comparisons
 167 (0xa7)  goto (number)     // jump to byte offset number
 170 (0xaa)  tableswitch       // for switch statements; variable length, followed by jump table
 172 (0xac)  ireturn           // return an integer value
 177 (0xb1)  return            // return void (areturn returns an object reference)
 178 (0xb2)  getstatic #5      // get the static field that is entry 5 in the constant pool
 180 (0xb4)  getfield #5       // get the instance field that is entry 5 in the constant pool
 181 (0xb5)  putfield #5       // put the instance field that is entry 5 in the constant pool
 182 (0xb6)  invokevirtual #5  // invoke a method
 183 (0xb7)  invokespecial #5  // invoke a constructor, private method, or superclass method
 187 (0xbb)  new #5            // create an object
 188 (0xbc)  newarray          // create an array of primitives
 191 (0xbf)  athrow            // throw an exception
 192 (0xc0)  checkcast         // dynamic cast checking
 193 (0xc1)  instanceof        // instance of checking
 194 (0xc2)  monitorenter      // entering synchronized block
 195 (0xc3)  monitorexit       // exiting synchronized block
The whole list is worth eyeballing: see The JVM Spec.
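A toy disassembler makes the opcode table concrete: it decodes the raw bytes of whileInt(), assembled by hand, back into the listing shown earlier. The opcodes for iconst_0 (0x03), istore_1 (0x3c), iload_1 (0x1b), iinc (0x84), and if_icmplt (0xa1) are not in the selection above but come from the same table in the JVM Spec. Branch operands are signed 16-bit offsets relative to the branch instruction; like javap, the sketch prints them as absolute targets. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TinyDisassembler {
    // The bytes of the whileInt() listing, assembled by hand from the opcode table.
    static final byte[] WHILE_INT = {
        0x03, 0x3c, (byte) 0xa7, 0x00, 0x06, (byte) 0x84, 0x01, 0x01,
        0x1b, 0x10, 0x64, (byte) 0xa1, (byte) 0xff, (byte) 0xfa, (byte) 0xb1
    };

    // Decode only the opcodes whileInt() uses; instructions have different lengths.
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc] & 0xff;                 // opcodes are unsigned bytes
            switch (op) {
                case 0x03: out.add(pc + " iconst_0"); pc += 1; break;
                case 0x3c: out.add(pc + " istore_1"); pc += 1; break;
                case 0x1b: out.add(pc + " iload_1"); pc += 1; break;
                case 0x10: out.add(pc + " bipush " + code[pc + 1]); pc += 2; break;
                case 0x84:                            // iinc index (unsigned), const (signed)
                    out.add(pc + " iinc " + (code[pc + 1] & 0xff) + " " + code[pc + 2]);
                    pc += 3;
                    break;
                case 0xa7:                            // goto
                case 0xa1: {                          // if_icmplt
                    int offset = (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
                    String name = (op == 0xa7) ? "goto" : "if_icmplt";
                    out.add(pc + " " + name + " " + (pc + offset));
                    pc += 3;
                    break;
                }
                case 0xb1: out.add(pc + " return"); pc += 1; break;
                default: throw new IllegalArgumentException("unsupported opcode " + op);
            }
        }
        return out;
    }
}
```

Disassembling WHILE_INT yields 0 iconst_0, 1 istore_1, 2 goto 8, 5 iinc 1 1, 8 iload_1, 9 bipush 100, 11 if_icmplt 5, 14 return. The goto operand is stored as +6 and the if_icmplt operand as -6, relative to their own offsets.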

High-level features compiled out

The following features of Java are not present in the bytecodes because they can already be expressed with the bytecodes above:
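The enhanced for loop is one example of a feature compiled out this way: the Java Language Specification (section 14.14.2) defines it in terms of an explicit Iterator, so it needs no bytecode of its own. The two methods below (hypothetical names) should compile to essentially the same instructions, interface calls plus conditional jumps; javap -c on the compiled class is a way to compare them.

```java
import java.util.Iterator;
import java.util.List;

class ForEachDemo {
    // Source form: the enhanced for loop.
    static int sumForEach(List<Integer> xs) {
        int total = 0;
        for (int x : xs) {
            total += x;
        }
        return total;
    }

    // Hand-desugared form: what the enhanced for loop means.
    static int sumIterator(List<Integer> xs) {
        int total = 0;
        for (Iterator<Integer> it = xs.iterator(); it.hasNext(); ) {
            int x = it.next();     // unboxing the Integer is also just a method call (intValue)
            total += x;
        }
        return total;
    }
}
```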

The JVM Sandbox

The "sandbox" is a set of techniques that keep data from leaking. Java enforces the sandbox through two mechanisms. These mechanisms are mutually supporting:

Bytecode Verification

Bytecode verification is a complex process. Here are some specific checks performed on bytecode (note that the compiler can't do this, since .class files could be modified after compilation):
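Checks of this kind can be sketched as abstract interpretation: run through the code tracking only the types on the operand stack, and reject it on underflow, on overflow past a declared maximum, or when an instruction finds an operand of the wrong type. The instruction set here is a hypothetical toy over straight-line code; a real verifier also handles branches, local variables, exception handlers, and class types.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class Verifier {
    enum Type { INT, REF }

    // Each toy instruction lists the types it pops (top of stack first) and pushes.
    enum Insn {
        ICONST(List.of(), List.of(Type.INT)),
        ACONST_NULL(List.of(), List.of(Type.REF)),
        IADD(List.of(Type.INT, Type.INT), List.of(Type.INT));

        final List<Type> pops;
        final List<Type> pushes;

        Insn(List<Type> pops, List<Type> pushes) {
            this.pops = pops;
            this.pushes = pushes;
        }
    }

    // Returns null if the code verifies, otherwise a description of the first error.
    static String verify(List<Insn> code, int maxStack) {
        Deque<Type> stack = new ArrayDeque<>();   // types only, never actual values
        for (int pc = 0; pc < code.size(); pc++) {
            Insn insn = code.get(pc);
            for (Type expected : insn.pops) {
                if (stack.isEmpty()) return "stack underflow at " + pc;
                Type actual = stack.pop();
                if (actual != expected) {
                    return "type error at " + pc + ": expected " + expected + ", found " + actual;
                }
            }
            for (Type t : insn.pushes) {
                stack.push(t);
                if (stack.size() > maxStack) return "stack overflow at " + pc;
            }
        }
        return null;
    }
}
```

Because the check looks only at types, it runs once over the code before execution instead of on every run, which is what lets the interpreter skip these checks at run time.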