Implementation Issues
Here we briefly survey some key issues in programming language implementation.
Runtime memory models
To review, programming language implementations have three tiers of memory:
- Globals (globals of C/C++, static fields and class methods of class-based languages)
- Stack (the runtime function/method call stack where local function variables are allocated)
- Dynamic store (aka the "heap")
-- and the heap can be either explicitly managed via manual alloc/free or implicitly managed by garbage collection.
- Registers are a fourth tier which only C/C++ can partially access via the register qualifier on variable declarations.
- The cache is largely invisible even at the machine code level, but it has a huge impact on performance since access to memory in the cache is an order of magnitude faster than off-cache access.
- Implicit memory: the programmer view is a high-level view where every variable contains a reference to an object, a high-level name-to-value mapping
-- in other words it is how our FbS interpreter viewed things: the memory was just a dictionary.
- Explicit memory: references are in fact memory addresses that the programmer can control: address-of an entity is a possible operation, and arrays are just pointers to a block of space.
More on implicit memory
Although the programmer can be unaware of it, at runtime the layout is usually different for objects and for primitive data:
- Primitive data (int, float, etc) is unboxed: it can be directly on the call stack or in a global location
- Objects are boxed, on the stack there is just a reference to an object.
In Java, for example, the boxed type Integer performs much more slowly than the primitive type int for this reason.
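A minimal Java sketch of the boxed/unboxed difference. The class and method names are made up for illustration. Both methods compute the same sum, but the boxed version works through references to Long and Integer objects, unboxing and re-boxing on every step.

```java
// Illustrative sketch only: BoxingSketch and its methods are hypothetical names.
class BoxingSketch {
    // Unboxed: 'total' and 'i' are plain values held directly in the frame.
    static long sumPrimitive(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Boxed: 'total' and 'i' are references to objects; each update
    // unboxes the operands, computes, and boxes the result again.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (Integer i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive(1000));
        System.out.println(sumBoxed(1000));
    }
}
```

How much slower the boxed version runs depends on the JVM and on how much of the boxing the JIT can remove, so no timing is claimed here.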
The call stack
- All function/method arguments and local variables are allocated memory on the call stack
- In the implicit memory model, local variables holding objects are just references to the actual object, which is on the heap.
- Each function call in particular pushes a new frame on that stack which also stores the return point and maybe saves some register values.
- Call stacks allow recursive calls of a function to have a different copy of the local variables; in languages without recursion there is no strong need for a call stack.
- When a function returns its frame is popped.
- In the explicit memory model it is possible to still have a pointer to a record that was previously popped; this cannot happen in the implicit memory model.
- Variables on the call stack can be quickly accessed via one memory reference since the stack pointer itself is stored in a register (for unboxed values; boxed values will need one more indirection).
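The point that each call gets its own frame, and so its own copy of the locals, can be observed with a small recursive method. This is only a sketch; FrameSketch, visit and trace are hypothetical names.

```java
// Illustrative sketch only: names are made up for this example.
class FrameSketch {
    static void visit(int n, StringBuilder log) {
        int local = n * 10;                 // stored in this call's frame
        log.append("in:").append(local).append(' ');
        if (n > 0) {
            visit(n - 1, log);              // pushes a new frame with a fresh 'local'
        }
        // The inner calls have returned and their frames were popped;
        // this frame's 'local' still has its own value.
        log.append("out:").append(local).append(' ');
    }

    static String trace(int n) {
        StringBuilder log = new StringBuilder();
        visit(n, log);
        return log.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(trace(2));   // in:20 in:10 in:0 out:0 out:10 out:20
    }
}
```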
Call stack issues relative to Fb and functional programming:
- In our Fb universe, the function locals were immutable so we in fact substituted instead of putting them on a stack, but this would be very inefficient in practice.
- We substituted because nonlocals, variables in a function that are not arguments (and are in fact in frames further up the call stack), are difficult to deal with if there is no stack.
-- a closure is needed for every function definition which remembers the values of all nonlocal variables at the point the function was defined.
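Java lambdas give a small illustration of a closure remembering a nonlocal. In this sketch (ClosureSketch and makeAdder are made-up names) the lambda uses k, which is not one of its own arguments, and it still works after the frame of makeAdder has been popped.

```java
import java.util.function.IntUnaryOperator;

// Illustrative sketch only: names are made up for this example.
class ClosureSketch {
    static IntUnaryOperator makeAdder(int k) {
        // 'k' is a nonlocal of the lambda; its value is captured
        // at the point where the lambda is created.
        return x -> x + k;
    }

    public static void main(String[] args) {
        IntUnaryOperator add5 = makeAdder(5);
        // makeAdder has returned, yet the captured k = 5 is still available.
        System.out.println(add5.applyAsInt(3));   // 8
    }
}
```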
Garbage Collection
- Garbage collection: implicit memory management, avoids the explicit need to free (and the subsequent problem of mistakenly freeing something that is in fact still in use).
- Found in: Java, Smalltalk, Lisp, Scheme, ML, ...
- Garbage collection is a key component of a modern "high-level language".
Garbage collection: automatically find and free memory that the program can no longer reach.
- Many different methods have been proposed.
- All methods require a way to tell where objects start and end, and the difference between a pointer and immediate data (char/smallinteger/boolean/nil).
Advantages
- No need to explicitly free
- (biggest reason really) no errors where you accidentally freed something that was still in use by someone else
Disadvantages
- Slows down computation.
- Can cause a pause when you don't want one -- a huge issue for real-time systems programming, and a potentially big issue for any interactive app.
Reference Counting Collection
- Heap-based storage allocation: have a heap containing all the free segments of memory. New objects are allocated space from the heap.
- Each memory item maintains a reference count (an extra say 3 bits of data, to store the number of objects immediately pointing to it)
- Reference counts are updated every time a pointer moves
- Any time a reference count is decremented to zero, the space is "free" and added to the heap.
- Reference counting is rarely used as the only collector because of its one major incurable flaw: cycles in memory will not get collected.
- Also, it's slow in practice due to the need to maintain the counts.
- RMI objects are reference counted
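The cycle problem can be seen in a toy model. This is a sketch, not a real allocator: Obj, addRef and dropRef are hypothetical names, counts are plain ints, and "freed" is just a flag.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all names here are made up for this sketch.
class RefCountSketch {
    static final class Obj {
        int refCount = 0;
        boolean freed = false;
        final List<Obj> fields = new ArrayList<>();
    }

    // Create a pointer from 'from' to 'to'; from == null models a root
    // such as a stack variable.
    static void addRef(Obj from, Obj to) {
        to.refCount++;
        if (from != null) from.fields.add(to);
    }

    // Destroy a pointer; when the count reaches zero the target is freed.
    static void dropRef(Obj from, Obj to) {
        if (from != null) from.fields.remove(to);
        to.refCount--;
        if (to.refCount == 0) {
            to.freed = true;
            // A freed object no longer points at anything: release its fields too.
            for (Obj child : new ArrayList<>(to.fields)) dropRef(to, child);
        }
    }

    // root -> a -> b, then the root pointer is dropped: both are freed.
    static boolean chainIsFreed() {
        Obj a = new Obj(), b = new Obj();
        addRef(null, a);
        addRef(a, b);
        dropRef(null, a);
        return a.freed && b.freed;
    }

    // root -> c, c -> d, d -> c, then the root pointer is dropped:
    // c and d are unreachable, but each still has count 1, so neither is freed.
    static boolean cycleIsFreed() {
        Obj c = new Obj(), d = new Obj();
        addRef(null, c);
        addRef(c, d);
        addRef(d, c);
        dropRef(null, c);
        return c.freed || d.freed;
    }

    public static void main(String[] args) {
        System.out.println("chain freed: " + chainIsFreed());   // true
        System.out.println("cycle freed: " + cycleIsFreed());   // false
    }
}
```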
Mark and sweep collection
Mark and sweep is the simplest form of "100% correct" garbage collection.
- Use a heap as above for keeping free space: new objects are allocated space from it, and the space of dead objects is put back on it.
- The root set is the set of objects that are directly accessible and so have to be live. Standard notion of roots:
- any object on run-time stack of any thread
- static fields
- Each object has a special mark bit; all bits initially set to "U" (unmarked) by a sweep through all of memory.
- To do GC, when space tightens: stop all processing (and, expect to wait awhile - this is the main drawback)
- Mark ("M") the root set.
- Traverse all pointers from the root set, marking ("M") all objects encountered.
- Stop if an object is already marked, no need to traverse again.
- When all objects have been traversed, sweep through all of memory: all objects not marked were inaccessible -- put their space on the free heap.
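The steps above can be sketched on a toy object graph. The assumptions are loud ones: the "heap" is a Java list of hypothetical Obj records, the root set is an explicit list standing in for stacks and static fields, and "freeing" means dropping the object from the list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all names here are made up for this sketch.
class MarkSweepSketch {
    static final class Obj {
        final String name;
        boolean marked = false;
        final List<Obj> fields = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();    // every allocated object
    final List<Obj> roots = new ArrayList<>();   // stands in for stacks and statics

    Obj alloc(String name) {
        Obj o = new Obj(name);
        heap.add(o);
        return o;
    }

    private void mark(Obj o) {
        if (o.marked) return;          // already marked: no need to traverse again
        o.marked = true;
        for (Obj child : o.fields) mark(child);
    }

    // Runs one full collection and returns the names of the objects swept away.
    List<String> collect() {
        for (Obj o : heap) o.marked = false;     // reset all mark bits
        for (Obj r : roots) mark(r);             // mark phase
        List<String> freed = new ArrayList<>();
        List<Obj> live = new ArrayList<>();
        for (Obj o : heap) {                     // sweep phase
            if (o.marked) live.add(o); else freed.add(o.name);
        }
        heap.clear();
        heap.addAll(live);
        return freed;
    }

    static List<String> demo() {
        MarkSweepSketch gc = new MarkSweepSketch();
        Obj a = gc.alloc("a"), b = gc.alloc("b"), c = gc.alloc("c"), d = gc.alloc("d");
        gc.roots.add(a);
        a.fields.add(b);
        c.fields.add(d);
        d.fields.add(c);                         // c and d: an unreachable cycle
        return gc.collect();
    }

    public static void main(String[] args) {
        System.out.println(demo());              // [c, d]
    }
}
```

Unlike reference counting, the unreachable cycle is collected, because liveness is decided by reachability from the roots and not by counts. The recursive mark is the simplest form; a real collector would avoid unbounded recursion.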
Copying Collection
- Allocate memory in two-partition scheme (the two semi-spaces).
- Only one semi-space occupied at any time, other free (wasteful)
- New objects allocated in the occupied partition
- Collector will flip spaces when it garbage collects, making the other space the active one.
- GC is triggered when the active semi-space becomes too full.
- All processing must stop to GC (again that bad point). In particular, a new operation will trigger GC and that new operation will wait until the GC is completed for it to return.
- Terminology: newspace is the space that was empty that objects will move to; oldspace is the full space
- Like the mark-and-sweep collector, start at the roots in oldspace and recursively follow pointers
- But, when a (live) object is reached, don't mark it; instead
- copy it over to newspace (putting objects in one after the other, so no fragments)
- leave behind in its place in oldspace a tombstone, a forwarding pointer to the new location of the object
- If during traversal, a tombstone is pointed to, correct the pointer to point to the new location of the object.
- invariant: there will never be any newspace objects with pointers to oldspace objects (think about it).
- ... after all objects have been reached, the GC is finished, and the active semi-space is switched to newspace and processing resumes.
- Fact: at the end, there will be no pointers to tombstones, and so no pointers into oldspace, so it's 100% junk.
Disadvantages
- The big pause
- Wastes half of memory
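A toy version of the copy step with tombstones, under stated assumptions: objects are hypothetical Obj records, newspace is a Java list that objects are appended to one after the other, and the tombstone is a forward field on the oldspace object. It follows the recursive description above; it is not a production algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all names here are made up for this sketch.
class CopyingSketch {
    static final class Obj {
        final String name;
        final List<Obj> fields = new ArrayList<>();
        Obj forward = null;      // tombstone: set once this object has been copied
        Obj(String name) { this.name = name; }
    }

    // Copies o and everything reachable from it; returns o's new location.
    static Obj copy(Obj o, List<Obj> newspace) {
        if (o.forward != null) return o.forward;   // tombstone: use the new location
        Obj moved = new Obj(o.name);
        newspace.add(moved);                       // packed in order, no fragments
        o.forward = moved;                         // leave the tombstone behind
        for (Obj child : o.fields) {
            moved.fields.add(copy(child, newspace));   // pointers now aim at newspace
        }
        return moved;
    }

    // Copies everything reachable from the roots and updates the roots in place.
    static List<Obj> collect(List<Obj> roots) {
        List<Obj> newspace = new ArrayList<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), newspace));
        }
        return newspace;
    }

    // a and b form a live cycle reachable from the root; c is garbage.
    static List<Obj> demoNewspace() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.fields.add(b);
        b.fields.add(a);
        c.fields.add(a);
        List<Obj> roots = new ArrayList<>();
        roots.add(a);
        return collect(roots);
    }

    static String names(List<Obj> space) {
        StringBuilder sb = new StringBuilder();
        for (Obj o : space) sb.append(o.name);
        return sb.toString();
    }

    // The invariant: no newspace object points outside newspace.
    static boolean pointsOnlyInto(List<Obj> space) {
        for (Obj o : space)
            for (Obj f : o.fields)
                if (!space.contains(f)) return false;
        return true;
    }

    public static void main(String[] args) {
        List<Obj> newspace = demoNewspace();
        System.out.println(names(newspace));            // ab
        System.out.println(pointsOnlyInto(newspace));   // true
    }
}
```

Setting the tombstone before following the fields is what makes cycles terminate: the second visit to an object finds the forwarding pointer and stops.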
Scavenging Collectors
Scavenging is incremental collection: scavenge up a little free space, move around a few objects, let the program run again. Why? Avoid the big bad pause.
Scavenging with a copying collector:
- Execution doesn't stop when the collection starts
- Critical invariant to preserve: no pointers from newspace to oldspace (without this invariant, at the end of GC there could be pointers to oldspace)
- To preserve invariant, if a newspace object gets mutated to point to an oldspace object, immediately pull that oldspace object (and all objects it points to) over to newspace. Requires overhead at each mutation (assignment to instance variable). Also, the worst case is bad (but fortunately almost never happens---memory is random enough)
Sun's HotSpot VM Collector
Sun's HotSpot collector (HotSpot has been Sun's default JVM since Java 1.3) is a high-quality collector.
- It performs both mark and sweep, and copying collection.
- Collection is based on how long an object has been alive
- Young objects are in the young generation space
- Older objects are in the old generation space
- Why divide like this? Most objects don't live long, so frequently collect young space (minor collection) and infrequently collect old space (major collection).
Young space
Young space has three subspaces:
- Eden
- Survivor Space One
- Survivor Space Two
Copying collection is used on Young space
- All objects are created in Eden
- When Eden fills, a minor collection is triggered
- (Statistically, most of the objects in Eden are garbage at this point)
- A copying collection ensues which copies the live objects in Eden and the active survivor space to the other survivor space.
- If an object has been copied between the survivor spaces e.g. three times by three minor collections, it is older and is tenured, copied to old space instead.
- Fact: young spaces are small enough that minor collections produce a pause that is not noticeable.
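The aging and tenuring policy can be sketched as a toy simulation. Loud assumptions: this is not HotSpot code, all names are made up, liveness is a flag set by hand instead of being computed by tracing from roots, and an object is tenured at the first minor collection after it has already been copied TENURE_AGE times.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation only: all names and the exact tenuring rule are assumptions.
class YoungGenSketch {
    static final int TENURE_AGE = 3;   // "e.g. three times" in the notes

    static final class Obj {
        boolean live = true;
        int age = 0;                   // number of minor collections survived
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> activeSurvivor = new ArrayList<>();
    List<Obj> otherSurvivor = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    Obj alloc() {
        Obj o = new Obj();
        eden.add(o);                   // all objects are created in Eden
        return o;
    }

    void minorCollection() {
        List<Obj> from = new ArrayList<>(eden);
        from.addAll(activeSurvivor);
        for (Obj o : from) {
            if (!o.live) continue;                 // garbage is simply not copied
            if (o.age >= TENURE_AGE) {
                old.add(o);                        // tenured into old space
            } else {
                o.age++;
                otherSurvivor.add(o);              // copied to the other survivor space
            }
        }
        eden.clear();
        activeSurvivor.clear();
        List<Obj> tmp = activeSurvivor;            // the survivor spaces swap roles
        activeSurvivor = otherSurvivor;
        otherSurvivor = tmp;
    }

    // Returns {survivors after 3 collections, old after 3,
    //          survivors after 4 collections, old after 4}.
    static int[] demo() {
        YoungGenSketch gc = new YoungGenSketch();
        gc.alloc();                    // stays live
        gc.alloc().live = false;       // dies young, like most objects
        for (int i = 0; i < 3; i++) gc.minorCollection();
        int s3 = gc.activeSurvivor.size(), o3 = gc.old.size();
        gc.minorCollection();
        return new int[] { s3, o3, gc.activeSurvivor.size(), gc.old.size() };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));   // [1, 0, 0, 1]
    }
}
```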
Old Space
- (There is one part of old space called Perm space which holds system objects that will never be collected.)
- Old space collections are major collections; they generate a noticeable pause.
- Old space by default uses a mark and sweep collection.
- It also periodically compacts to minimize fragmentation.
- If you run java with the -Xincgc option (1.4.1 and later), you will get a fancier incremental train collector on oldspace.
- It's a fancy scavenging, compacting mark and sweep collector.
- Objects are clustered by the pointer structure: objects are close in memory to objects they directly refer to. (This also helps with virtual memory: objects working with each other are more likely to be in the same page)
- In particular, there are trains consisting of a series of cars of clustered objects.
- Trains are generational: they are timestamped upon creation with their number.
- Scavenging: trains are collected one at a time, in particular the oldest remaining train is collected; the time to collect a given train should be small
- In collecting a train, that train goes away; garbage objects are gone, and non-garbage objects are moved to the (other) train that directly pointed to them -- this compacts memory and clusters adjacent objects.
- There are a huge number of difficult issues to solve which we are going to skip over.
Efficient Implementation of OCaml
(to appear)
Virtual Machines
- Nearly all of the implicit memory model languages today compile to a stack-based virtual machine
- Why? One compiler compiling to the VM and you are done, only the VM interpreter needs to be ported to other platforms
- Why not? It slows down performance greatly.
- Modern solution to performance gap: just-in-time compilation of bytecodes to machine code.
- Lecture here: some details on the Java VM (other VMs are similar).
JVM and Java
- The JVM bytecode language isn't Java
- Why Bytecodes? Portability!
- You could write bytecodes by hand and forget Java ever existed
- But, the JVM does share a lot of structure with Java since it was designed for Java
- Built-in JVM concepts include: classes, objects, message send, primitive types, threads, native stacks, exceptions, etc.
JVM Stack Frames
- The JVM uses a stack of frames just like C/C++.
- There is a stack for each thread
- method invocation creates a new frame for that method and pushes it on the stack
- Similarly, method return pops a frame.
- Each frame is fixed-size: it doesn't grow or shrink (the bytecode tells the JVM how big to make the frame)
- All local method variables have locations in the frame
- There is an operand stack which is a mini-stack for the JVM instructions which are stack based ("push 3; push 5; add"). More on this below.
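The "push 3; push 5; add" idea can be tried with a few lines of Java. This sketch is not JVM code: the textual instruction names push, add and mul are invented here, and the operand stack is an ordinary ArrayDeque.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: the instruction names are made up, not JVM bytecodes.
class OperandStackSketch {
    static int run(String... program) {
        Deque<Integer> stack = new ArrayDeque<>();   // the operand stack
        for (String instr : program) {
            String[] parts = instr.trim().split("\\s+");
            switch (parts[0]) {
                case "push":
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "add": {
                    int b = stack.pop(), a = stack.pop();   // operands come off the stack
                    stack.push(a + b);                      // result goes back on
                    break;
                }
                case "mul": {
                    int b = stack.pop(), a = stack.pop();
                    stack.push(a * b);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown instruction: " + instr);
            }
        }
        return stack.pop();   // the result is whatever is left on top
    }

    public static void main(String[] args) {
        System.out.println(run("push 3", "push 5", "add"));                      // 8
        System.out.println(run("push 2", "push 3", "push 4", "mul", "add"));     // 14
    }
}
```

Note that the second program needs no temporaries or parentheses: the expression 2 + 3 * 4 is encoded purely by the order of pushes and operations.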
JVM Bytecodes
- The JVM is a stack machine -- temporaries are not put in registers, they are pushed/popped from the operand stack
- The bytecodes are simple 1-byte instructions 0x00 to 0xFF.
- The instructions generally work either on the operand stack or with the local variables in the stack frame
To see the bytecodes of a compiled class, run javap -c MyClass in a directory where MyClass.class exists.
Simple control and variables
void whileInt() {
    int i = 0;
    while (i < 100) {
        i++;
    }
}
is compiled to
Method void whileInt()
0 iconst_0          // push constant 0
1 istore_1          // store into frame location 1 which is i
2 goto 8            // jump to 8th byte
5 iinc 1 1          // increment frame location 1 by 1
8 iload_1           // push (load) frame location 1
9 bipush 100        // push constant 100
11 if_icmplt 5      // integer less-than compare, going to 5 if true
14 return
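To check one's reading of the listing, it can be hand-translated into a tiny Java interpreter loop. This is a sketch, not how the JVM is implemented: pc holds the byte offset from the listing, locals is the frame's local variable array, and the method returns i at the end only so the result can be observed (the real method returns void).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only: a hand translation of the whileInt listing.
class WhileIntSketch {
    static int run() {
        int[] locals = new int[2];                   // location 0 is this, location 1 is i
        Deque<Integer> stack = new ArrayDeque<>();   // operand stack
        int pc = 0;                                  // byte offset of the next instruction
        while (true) {
            switch (pc) {
                case 0:  stack.push(0);              pc = 1;  break;   // iconst_0
                case 1:  locals[1] = stack.pop();    pc = 2;  break;   // istore_1
                case 2:                              pc = 8;  break;   // goto 8
                case 5:  locals[1] += 1;             pc = 8;  break;   // iinc 1 1
                case 8:  stack.push(locals[1]);      pc = 9;  break;   // iload_1
                case 9:  stack.push(100);            pc = 11; break;   // bipush 100
                case 11: {                                              // if_icmplt 5
                    int v2 = stack.pop(), v1 = stack.pop();
                    pc = (v1 < v2) ? 5 : 14;
                    break;
                }
                case 14: return locals[1];                              // return
                default: throw new IllegalStateException("bad pc " + pc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run());   // 100
    }
}
```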
Message Send
int add12and13() {
    return addTwo(12, 13);
}
compiles to
Method int add12and13()
0 aload_0 // Push local 0 (this) onto stack
1 bipush 12 // Push int constant 12 onto stack
3 bipush 13 // Push int constant 13 onto stack
5 invokevirtual #4 // Method Example.addTwo(II)I
8 ireturn // Return int on top of stack; it is
// the int result of addTwo()
The #4 in the above refers to the fourth constant in the
constant pool. Each class (and thus each .class file) has
its constant pool. The constant pool holds
- String constants
- Array constants
- Int/float/long/double constants
- Class name reference
- Method of a class reference
- Field of a class reference
#4 above is listed in the constant pool as method
addTwo of class Example.
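The (II)I after the method name is the method's descriptor: parameter types in parentheses, then the return type, with I standing for int. As a small check, the standard java.lang.invoke.MethodType class can print the descriptor for a given signature. The class name DescriptorSketch is made up for this sketch.

```java
import java.lang.invoke.MethodType;

// Illustrative sketch only: prints descriptors using the standard MethodType API.
class DescriptorSketch {
    // Descriptor of a method shaped like int addTwo(int, int).
    static String addTwoDescriptor() {
        // methodType takes the return type first, then the parameter types.
        return MethodType.methodType(int.class, int.class, int.class)
                         .toMethodDescriptorString();
    }

    // Descriptor of a method shaped like String f(Object): object types
    // are written as L<class name with slashes>;
    static String objectDescriptor() {
        return MethodType.methodType(String.class, Object.class)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(addTwoDescriptor());   // (II)I
        System.out.println(objectDescriptor());   // (Ljava/lang/Object;)Ljava/lang/String;
    }
}
```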
Message Receipt with arguments
int addTwo(int i, int j) {
    return i + j;
}
compiles to
Method int addTwo(int,int)
0 iload_1 // Push value of local 1 (i)
1 iload_2 // Push value of local 2 (j)
2 iadd // Add; leave int result on val stack
3 ireturn // Return int result
Object Creation
int i; // An instance variable
MyObj example() {
    MyObj o = new MyObj();
    return silly(o);
}
MyObj silly(MyObj o) {
    if (o != null) {
        return o;
    } else {
        return o;
    }
}
becomes
Method MyObj example()
0 new #2 // Class MyObj object created; no init yet
3 dup // copy top of stack
4 invokespecial #5 // initialize via Method MyObj.<init>()V
7 astore_1 // store into frame location 1, i.e. o
8 aload_0 // load myself
9 aload_1 // load o
10 invokevirtual #4 // invoke Method Example.silly(LMyObj;)LMyObj;
13 areturn
Method MyObj silly(MyObj)
0 aload_1
1 ifnull 6
4 aload_1
5 areturn
6 aload_1
7 areturn
Other Instructions
The following things are explicitly dealt with in bytecodes:
- casting
- switch statements
- array creation
- typecasting
- exception throw and catch
- synchronized methods: monitor entry and exit
- interfaces: they have a run-time presence just like classes do.
  6 (0x06) iconst_3         // the int constant 3; have 0-5 built-in
 12 (0x0c) fconst_1         // floating point constant 1
 16 (0x10) bipush           // push a 1-byte integer constant (two-byte instruction)
 17 (0x11) sipush
 29 (0x1d) iload_3          // push local variable 3; have 0-3
 62 (0x3e) istore_3         // store tos into local variable 3
 87 (0x57) pop              // good old pop
 89 (0x59) dup              // duplicate tos
 95 (0x5f) swap             // swap top two stack elements
 96 (0x60) iadd             // integer add; many other int, etc ops
126 (0x7e) iand             // integer and; and other boolean ops
134 (0x86) i2f              // many conversion bytecodes, int to float here
159 (0x9f) if_icmpeq        // comparisons
167 (0xa7) goto (number)    // jump to number-th byte
170 (0xaa) tableswitch      // for switch statements; variable length, followed by jump table
172 (0xac) ireturn          // return an integer value
177 (0xb1) return           // return from a void method
178 (0xb2) getstatic #5     // get the static field that's entry 5 in the constant pool
180 (0xb4) getfield #5      // get the field that's entry 5 in the constant pool
181 (0xb5) putfield #5      // put the field that's entry 5 in the constant pool
182 (0xb6) invokevirtual #5 // invoke a method
183 (0xb7) invokespecial #5 // invoke a private method or constructor
187 (0xbb) new #5           // create an object
188 (0xbc) newarray         // create an array
191 (0xbf) athrow           // throw an exception
192 (0xc0) checkcast        // dynamic cast checking
193 (0xc1) instanceof       // instance of checking
194 (0xc2) monitorenter     // entering synchronized block
195 (0xc3) monitorexit      // exiting synchronized block
The whole list is worth eyeballing: see The JVM Spec.
High-level features compiled out
The following features of Java are not present in the bytecodes because they can be expressed with the bytecodes above already:
- Overloading -- really have different method names that include the types as part of the "long name"
- Constructors -- explicitly invoke them; bytecode verifier makes sure it happened.
- Expression syntax -- use stack instead
- block-structured commands such as if, while, etc -- use goto and conditional jump
- public-private-protected -- not in bytecodes and bytecode verification makes sure it was not introduced
The JVM Sandbox
The "sandbox" is a set of techniques which keep data from leaking.
- The goal of a sandbox is to keep foreign code away from data it shouldn't be getting its hands on.
- A sandbox isn't a single thing; it's a set of techniques that enforce a no-leak property.
- C has no sandbox: for p a pointer to the stack, *(p-2029) inspects the 2029-th previous location on the stack (and in general, any memory location can be read and written)
- bytecode verification: a series of checks performed on .class files when they are loaded into the JVM to make sure they aren't going to do nasty things, e.g. integers can't be used like they were object pointers.
- run-time checks: certain operations, e.g. downcasts and array reference bounds, are checked for legitimacy at run-time
- the first one pretty much rules out all the "bad things", provided the program is not downcasting or accessing arrays out of bounds.
- The second one makes sure those things are not happening by explicitly checking for them.
- It is too hard to check downcasts at compile time (in fact, it is impossible in many cases e.g. when downcasting Object via reflection).
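Both kinds of run-time check are easy to trigger from Java. In this sketch (the class and method names are made up) each method attempts an illegal operation and reports whether the JVM stopped it with the corresponding standard exception.

```java
// Illustrative sketch only: names are made up for this example.
class RuntimeCheckSketch {
    static boolean badDowncastIsCaught() {
        Object o = "just a string";
        try {
            Integer i = (Integer) o;   // compiles, but is checked when it runs
            return false;              // not reached: the cast above throws
        } catch (ClassCastException e) {
            return true;
        }
    }

    static boolean badIndexIsCaught() {
        int[] a = new int[3];
        int index = 5;                 // outside 0..2
        try {
            a[index] = 1;              // bounds are checked on every array access
            return false;              // not reached: the store above throws
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(badDowncastIsCaught());   // true
        System.out.println(badIndexIsCaught());      // true
    }
}
```

The equivalent C code would compile and run, silently reinterpreting or overwriting memory.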
Bytecode Verification
Bytecode Verification is a complex process
- Bytecode verification is performed during .class loading
- It is a check to make sure there are no "type errors" in the bytecodes
- Example error: push a primitive integer, send a message to it
- Such errors are reported at run time (as a java.lang.VerifyError when the class is loaded and verified); since the javac compiler doesn't generate bad bytecodes you have probably never seen this error, but it's there.
- Without bytecode verification, programmers could access data that isn't theirs (e.g. an applet could grab something important)
- variables must be initialized before they are used.
- a method name invoked should exist in the type of object that's on the top of stack.
- Private/protected object data is not improperly accessed
- Objects are not operated on with primitive operations, and primitive typed entities are not sent messages.
- Parameter types are legit
- Local operand stack never underflows or overflows.