Fall 2000

August 4, 2000

A metacomputer is a set of machines networked together for increased computational performance. To build an efficient metacomputer, one must assign jobs to the various networked machines intelligently. A poor job assignment strategy can result in heavily unbalanced loads and thrashing machines. This cripples the cluster’s computational power. A strong job assignment strategy helps a metacomputer complete all of its jobs swiftly.

Resource heterogeneity makes job assignment more complex. Placing a job on one machine might risk depleting its small memory. Another machine might have more free memory but a heavily burdened CPU. Bin packing on memory protects the system against thrashing. Load balancing protects the system against high CPU loads. Combining the two approaches, however, gives an ad hoc heuristic algorithm with no clear theoretical merit.

The Cost-Benefit Framework, developed in this work, offers a new approach to job assignment on metacomputers. It smoothly handles heterogeneous resources by converting them into a unitless cost. We assign (and possibly reassign) jobs to greedily minimize this cost.
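To make the idea concrete, here is a minimal sketch of such a greedy, cost-driven assignment rule. The exponential cost function and the choice of CPU and memory as the only resources are assumptions made for illustration; the talk develops the actual framework and its guarantees.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch only: the exponential cost function and the resource
    // set {CPU, memory} are assumptions for exposition, not necessarily the
    // exact formulation used by the Cost-Benefit Framework.
    class Machine {
        double cpuLoad;   // fraction of CPU capacity in use, 0.0 .. 1.0
        double memUsed;   // fraction of memory in use, 0.0 .. 1.0

        // Fold heterogeneous resources into one unitless cost; with n machines,
        // n^utilization makes a nearly exhausted resource far more expensive
        // than a lightly used one.
        double cost(int n) {
            return Math.pow(n, cpuLoad) + Math.pow(n, memUsed);
        }
    }

    class OpportunityCostScheduler {
        // Greedy rule: place the job where the total cost rises the least.
        static Machine assign(List<Machine> cluster, double jobCpu, double jobMem) {
            int n = cluster.size();
            Machine best = null;
            double bestIncrease = Double.POSITIVE_INFINITY;
            for (Machine m : cluster) {
                double increase = Math.pow(n, m.cpuLoad + jobCpu)
                                + Math.pow(n, m.memUsed + jobMem)
                                - m.cost(n);
                if (increase < bestIncrease) {
                    bestIncrease = increase;
                    best = m;
                }
            }
            best.cpuLoad += jobCpu;
            best.memUsed += jobMem;
            return best;
        }

        public static void main(String[] args) {
            List<Machine> cluster = new ArrayList<>();
            cluster.add(new Machine());
            cluster.add(new Machine());
            assign(cluster, 0.25, 0.10);   // lands on whichever machine is cheaper
        }
    }

Because every resource is folded into the same unitless cost, a machine that is nearly out of memory looks expensive even when its CPU is idle, capturing in one rule what bin packing and load balancing each capture only half of.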

This approach gives us an online strategy that is provably competitive with the optimal offline algorithm with respect to the maximum usage of each resource. Its competitive ratio is weak, logarithmic in the number of machines in the cluster, but even this weak ratio is unprecedented in the literature. No other known method offers any competitive guarantee on more than one resource.

We present experimental evidence that this strategy performs extremely well in practice, comparing it to two important benchmarks: the default round-robin strategy of the popular PVM metacomputing system, and the powerful adaptive strategy of the Mosix system.

Metacomputing environments are not homogeneous. In some environments, the scheduler has a great deal of information about submitted jobs. In other cases, it has very little. In some cases, jobs have real-time demands. In some cases, they do not. Some systems can migrate jobs without interrupting their execution. Others cannot. We develop variants of the basic “opportunity cost” strategy of the Cost-Benefit Framework for a number of different metacomputing environments, and prove all of them highly efficient.

Finally, we provide two metacomputing systems – a prototype and a complete system – based on these ideas. The Java Market prototype is a metacomputer built atop Java and web technologies, able to integrate any consenting Internet-connected machine. The Frugal System transforms any Jini network into a metacomputer.

October 19, 2000

Programs should consist of off-the-shelf, interchangeable, black-box components that are produced by a network of independent software companies. These components should not only come with type signatures but also with contracts that describe other aspects of their behavior.

One way to express contracts is to state pre- and post-conditions for externally visible functions. These pre- and post-conditions should then be validated during evaluation or possibly even during compilation. If a function call fails to satisfy its contract, the run-time system should blame the faulty program component.
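As a small, self-contained illustration of pre- and post-conditions with blame (the ContractViolation class and the square-root example below are invented for exposition; they are not the mechanism presented in the talk):

    // Hypothetical example: explicit checks stand in for whatever a real
    // contract system would generate automatically.
    class ContractViolation extends RuntimeException {
        ContractViolation(String blamed, String detail) {
            super("contract violated, blame " + blamed + ": " + detail);
        }
    }

    class MathComponent {
        // Contract: requires x >= 0; ensures |result*result - x| is tiny.
        static double checkedSqrt(double x) {
            if (x < 0) {
                // Precondition failure: the calling component supplied a bad argument.
                throw new ContractViolation("the caller", "sqrt requires x >= 0, got " + x);
            }
            double r = Math.sqrt(x);
            if (Math.abs(r * r - x) > 1e-9 * (1.0 + x)) {
                // Postcondition failure: this component produced a bad result.
                throw new ContractViolation("MathComponent", "sqrt returned " + r + " for " + x);
            }
            return r;
        }

        public static void main(String[] args) {
            System.out.println(checkedSqrt(2.0));   // satisfies the contract
            System.out.println(checkedSqrt(-1.0));  // run-time system blames the caller
        }
    }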

Behavioral contracts in the form of assertions are well understood in the world of procedural languages. Their addition to class and interface hierarchies in object-oriented programming languages, however, raises many new and interesting questions. The most complicating factor is that objects can pass between components and trigger callbacks. Another problem is that object-oriented languages allow objects to satisfy several interfaces at once.
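The callback difficulty can be seen in a few lines. In the hypothetical sketch below (all names invented for illustration), component A builds an object and hands it to component B; when B calls back into that object with an argument violating the object's precondition, a naive checker raises the error inside A's code even though the faulty component is B:

    interface Validator {
        // Informal contract: requires code >= 0.
        boolean accepts(int code);
    }

    class ComponentA {
        static Validator makeValidator() {
            return new Validator() {
                public boolean accepts(int code) {
                    // A naive precondition check lives here, inside A's code...
                    assert code >= 0 : "accepts requires code >= 0";
                    return code % 2 == 0;
                }
            };
        }
    }

    class ComponentB {
        static void scan(Validator v) {
            // ...but the violating call is made here, by B. Assigning blame
            // correctly requires tracking which component sits on each side of
            // every boundary crossing, including callbacks like this one.
            v.accepts(-1);
        }

        public static void main(String[] args) {
            scan(ComponentA.makeValidator());   // run with -ea to see the failure
        }
    }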

In this talk, I present an analysis of existing approaches to adding contracts to class-based languages and show how they fail to give contracts and contractual violations a rigorous semantics. In particular, they blame the wrong component in certain situations for breach of contract. I then present a conservative extension of Java that allows programmers to specify method contracts in interfaces. The extension is a compromise between a consistent enforcement of contracts and language design concerns.

November 29, 2000

DNA microarrays provide the means to simultaneously measure the expression level of thousands of genes. The immense volume of data resulting from microarray experiments, accompanied by an increase in the amount of related literature, presents a major data analysis challenge. Current analysis methods typically focus on clustering genes based on temporal expression pattern, under the hypothesis that similarly expressed genes share a common function. However, just what this function is remains to be explained through further experiments, human expertise and the published literature. An ultimate goal is to complement existing analysis techniques with an automated system for extracting relevant information from the literature.

We present a new approach for utilizing the literature to establish functional relationships among genes on a genome-wide scale. The first part of the talk will introduce a new Expectation-Maximization algorithm, which produces sets of PubMed documents with a unifying theme, along with a list of terms characterizing each theme. The second part presents a method based on this algorithm, which finds content-based relationships among PubMed abstracts, and translates them into functional relationships among genes. Preliminary results, applying this method to a database of documents discussing yeast genes, demonstrate the effectiveness of the approach.
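The abstract does not spell out the underlying probabilistic model. Under one plausible reading, in which each theme k is a multinomial distribution \theta_k over terms and each document is drawn from a mixture of themes with weights \pi_k, the EM updates would take the familiar form

    \[
    \text{E-step:}\qquad
    \gamma_{dk} = \frac{\pi_k \prod_{w} \theta_{kw}^{\,c_{dw}}}
                       {\sum_{j} \pi_j \prod_{w} \theta_{jw}^{\,c_{dw}}}
    \]
    \[
    \text{M-step:}\qquad
    \pi_k = \frac{1}{D}\sum_{d} \gamma_{dk},
    \qquad
    \theta_{kw} = \frac{\sum_{d} \gamma_{dk}\, c_{dw}}
                       {\sum_{w'} \sum_{d} \gamma_{dk}\, c_{dw'}}
    \]

where c_{dw} is the count of term w in document d and D is the number of documents. A theme's document set would then consist of the abstracts with large \gamma_{dk}, and its characterizing terms would be those with the largest \theta_{kw}; the actual model and estimator are the subject of the talk.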

November 30, 2000

We shall describe spaces of images as Grenander deformable templates: orbits of prototypes under groups of transformations. Both the matrix groups of computer vision and infinite-dimensional diffeomorphisms are examined for biological shape. Metrics are introduced on the space of images for describing likeness and similarity. The metrics correspond to shortest-length paths (geodesics) in the space of mappings. Applications to computational anatomy will be described.
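The abstract gives no formulas; one common way to write such a geodesic metric (an assumption here, not necessarily the exact form used in the talk) measures the distance between images I_0 and I_1 by the shortest flow of transformations carrying one onto the other:

    \[
    d(I_0, I_1)^2 = \inf\Big\{ \int_0^1 \lVert v_t \rVert_V^2 \, dt \;:\;
    \dot{\phi}_t = v_t(\phi_t),\ \phi_0 = \mathrm{id},\ I_0 \circ \phi_1^{-1} = I_1 \Big\}
    \]

where the mappings \phi_t are generated by time-dependent velocity fields v_t and the norm \lVert \cdot \rVert_V prices the severity of a deformation. For finite-dimensional matrix groups the same construction reduces to geodesic distance on the group; for diffeomorphisms it yields the metrics used in computational anatomy.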