Last modified: Jan, 2001
"C Plus J" Software Architecture (excerpted)
- Author : Lu, Fei
A little history
to start with
The motivation for this software architecture research came out after some deep exploration inside the Java mechanism. Through the investigation, I found that many inborn dynamic overheads in Java cannot be simply reduced by JIT. These overheads can be significant. One example is Java's ubiquitous heap allocation. Because the underlying libraries are all in pure Java, dynamic heap allocations exist everywhere. Each Java object creation will cause two heap allocations: one for reference, the other for data. Actually more space is wasted than allocated if the a Java object is smaller than 40 bytes (as is shown in Figure 1; You can read the source of malloc to understand why). In a more general case, fragments must be considered because of the different size and life cycle of objects. That's why Java programs commonly use 2 - 4 times more memory than they need. In other programming languages, however, such overuse of dynamic heap allocation is rare. Most objects can share their life cycle with functions and with their own member objects; thus they can be created in stack and no additional allocation is needed for member objects - by default they are created together with their owner. Dynamic heap allocation is also slower than stack allocation by two significant orders, for it requires some algorithm to find free blocks. Overhead of other Java dynamic features are a bit harder to simulate and quantify. But they exist during the whole execution.
For efficiency purpose, I started to investigate a new software architecture. My first idea is to support C++, an already
widely used language. The research is quite large, but in an incremental way. I started to develop C+J portable library and then adopted SFI (Software Fault Isolation)
 to guarantee runtime security. In addition to the
previous research on this topic, I presented a new approach to guarantee
Control Flow Isolation. The technique does not rely on code-segment
boundary sandboxing and is crucial for efficient implementation of SFI
on CISC machines.
This C++ portable library is a counterpart of Java packages. C+J classes are divided into several catalogs: cpp::lang, cpp::awt, cpp::io, cpp::net, cpp::util, etc, using namespace. This is called "C+J" Library for its similarity with Java classes in both interface and usage. The library itself is designed to be portable. 80%-90% source code are shared by all platforms. Only some native classes need re-implementation for different OSes. This greatly reduced my work to release both Linux and Windows C+J versions.
C+J Software Architecture for Local Applications
Before we come to the discussion on some advanced topics of C+J Applet, let's first do some experiments
in local applications. The C+J portable library is powerful enough to give us some new experience. C+J applications can now be portable on machine instruction level.
No re-compilation or change is needed to run on a different operating system. This will greatly simplify software distribution and maintenance. For example,
a user can install
several operating systems on a hard-disk without reinstalling applications.
The difference in executable file format can be easily handled using a program launcher. A launcher
is to examine the file format, load and relocate executable modules into its own space and finally passes its control to these modules. This also gives much flexibility
to load and unload modules dynamically. All these operations can be done without the intervention of OS. And programmers have the choice to combine or
during compile-time for their special interests or reasons. My work currently supports ELF relocatable files, since gcc is popular on most platforms. A program launcher
is used to run ELF-format C+J application on Windows.
Distribution Format for C+J Applet
There are many choices. One approach is to deliver directly in machine binaries, like ActiveX. Some machine-independent approaches can also be adopted such as Omniware's bytecode  or tree code like Slim Binaries . Omniware uses RISC-like instruction set and can be efficiently converted to both RISC and CISC native codes. Slim Binaries use syntax-tree based encoding. Programs distributed in Slim Binaries are impressively small in size and can be well optimized into native code. Both Omniware's bytecode and Slim Binaries are used for intermediate purpose. Translation is done before execution, no interpretation or Just-in-time compiling is needed.
For detailed information on the approaches above, see related papers listed at the end of this page.
Secured programming is an essential problem for mobile code and OS extensions. This involves many research areas such as programming language, new compiling techniques, code verification and proof, etc. There have been several approaches proposed in previous researches: type-safe language , interpreted systems , proof-carrying code  and Software Fault Isolation(SFI) [1, 2, 4]. C+J software architecture is mainly based on software fault isolation, because it is language-independent and efficient. But, there was still some limitations in SFI. One important part of my undergraduate project is to extend this technique to a more practical and efficient way.
Allowing distrusted code to run directly on raw machines is obviously unsafe. The danger comes from three categories:
1) Privileged Instructions
2) Memory Access
3) Control Flow
Software fault isolation is to check and insert a few instructions into the distrusted code to block unsafe operations. The code transformed in this way can then safely run on raw machines to achieve maximum performance. Privileged instructions can be checked and rejected statically. Memory access and control flow are protected and restricted within fault isolation domains (ie. data segment and code segment). The approach is based on an assumption that domains are 1-dimensional with upper and lower bounds. The assertion is true for both data segment and code segment in RISC architectures, where SFI was originally designed. However in CISC systems, code segment is actually 2-dimensional. Since their instructions vary in size, starting from a different offset, we will see a totally different instruction sequence in the code segment. If a program uses indirect JMP or RET to reach a wormhole (the middle of some instruction), the control flow can be easily tampered. Once the hidden execution sequence started, it is no longer restrained by bound checking and can directly jump out of the isolated code segment. (see Figure 3)
One possible solution to this problem is presented in MiSFIT  - a software fault isolation tool for x86. An additional function call is inserted before any indirect JMP/CALL(s). The additional call will search the jump-address in a hash table in order to see if it is a valid function entry. MiSFIT also replaces each CALL instruction with a call to a support routine that saves the return address in a separate stack outside the domain. Each RET instruction is replaced with a jump to a second support routine that loads the saved return address and jumps to it. Through this way, programs have no chance to violate the control flow isolation and they are forced to use designated interface to communicate with the outside.
However, compared with the implementation for RISC, the approach used for CISC seems
to have much overhead. And it is also not very convenient to have additional stacks in multi-threaded environment. In C+J Research Project, which is to develop an efficient
and widely acceptable software architecture for executable content such as mobile programs, OS extensions, etc, I proposed a new approach for control flow isolation. In
order to eliminate the dynamic overhead of save-restore and search operations in the hash table, the 2-dimensional code segment should be projected onto a linear space.
For this problem, it is easy for us to know all the safe-entries during compile-time, and only these entries need to be projected. Each safe-entry is assigned with an ID
number and we sandbox IDs instead of real address.
We say an entry is safe if and only if it is a valid return address after a CALL instruction or it is a function entry. Control flow directed toward any safe-entry is guaranteed to fit in physical boundaries and obviously can not jump into the middle of an instruction. Each function is assigned with a Function_ID and each CALL instruction with a Ret_ID. In order to assist checkings for indirect calls, the additional trick I used is to logically combine the 2-dimensional and projected result together. We can insert the corresponding Function_ID in front of each function body and use the following instructions to sandbox indirect CALLs. Through this way, direct CALLs and RETs can also be more efficiently implemented as well and need no additional stacks.
In systems where hardware segment protection is available, memory access sandboxing can be eliminated and control flow isolation can be implemented even faster, thus, the overall overhead is too small to be noticed. In case no hardware acceleration is available, the software-based protection will cost 10%-20% runtime overhead in total, which is still more efficient than other approaches. My additional work on an efficient implementation of Control Flow Isolation has guaranteed the similar performance overhead on all systems.
Global Data and Data Islands:
Libraries are trusted codes, but they must also be responsible for security. As Small, C. and Seltzer, M. pointed out in their paper, libraries should provide a safe interface to distrusted applications . The standard C runtime library is not safely designed in interface. Standard C runtime functions won't do any bound-checking for you and they have no high level security policy controls. Thus, distrusted code can use some certain arguments to violate the security. In addition, high level security policy must be supported to grant privileges such as file or network access to applications. For these reasons, C+J library is also an indispensable security part of C+J software architecture. It provides the same security as Java on class library level, while SFI provides the same security on machine level.
 Michael Franz and Thomas Kistler. "Slim Binaries". In Communications of the ACM, 40(12), pp 87-94. December 1997. Also published as Technical Report No. 96-24, Department of Information and Computer Science, University of California, Irvine, June 1996. [ps or pdf].
 Small, C., Seltzer, M., "MiSFIT: Constructing Safe Extensible Systems".
IEEE Concurrency, 6, 3, July-September 1998, 33-41 (ps)
 Bershad, B.N.; Savage, S.; Pardyak, P.; Sirer, E.G.; Fiuczynski, M.E; Becker, D.; Chambers, C.; Eggers, S.; "Extensibility, safety and performance in the SPIN operating system", ACM Operating Systems Review, vol.29, no.5, p.267-84, Dec, 1995
M. Franz "Code-Generation On-the-Fly: A Key to Portable Software" Ph. thesis, Swiss Federal
Institute of Technology Zurich, 1994. Diss. ETH No. 10497 (ps.gz)
Anurag Acharya, Joel Saltz, "Dynamic Linking for Mobile Programs", In Jan Vitek and Christian
Tschudin, editors, Mobile Object Systems: Towards the Programmable Internet, volume 1222 of Lecture Notes in Computer Science, pages 245-262. Springer-Verlag, 1997. (ps)
Drew Dean, Edward W. Felten, Dan S. Wallach, "Java
Security: From HotJava to Netscape and Beyond", 1996 IEEE Symposium on Security and Privacy (Oakland, California), May 1996. (ps.gz
Sophia Drossopoulou, Susan Eisenbach, "Java is Type Safe - Probably", the 11th European Conference on Object Oriented Programming, June 1997
Selected reading & links:
Linkers and Loaders by John Levine
Low Level Security in Java by Frank Yellin
Java is not type-safe by Vijay Saraswat, AT&T Research
Java's security architecture
(An overview of the JVM's security model and a look at its built-in safety features) - by Bill Venners
Trail: Security in Java 2 SDK 1.2 by Mary Dageforde
Under the Hood: Java's garbage-collected
heap by Bill Venners