A Cost-Benefit Approach to Resource Allocation in Scalable Metacomputers

R. Sean Borgtrom, Johns Hopkins University

A metacomputer is a set of machines networked together for increased computational performance. To build an efficient metacomputer, one must assign jobs to the various networked machines intelligently. A poor job assignment strategy can result in heavily unbalanced loads and thrashing machines. This cripples the cluster’s computational power. A strong job assignment strategy helps a metacomputer complete all of its jobs swiftly.

Resource heterogeneity makes job assignment more complex. Placing a job on one machine might risk depleting its small memory. Another machine might have more free memory but a heavily burdened CPU. Bin packing on memory protects the system against thrashing. Load balancing protects the system against high CPU loads. Combining the two approaches, however, gives an ad hoc heuristic algorithm with no clear theoretical merit.

The Cost-Benefit Framework, developed in this work, offers a new approach to job assignment on metacomputers. It smoothly handles heterogeneous resources by converting them into a unitless cost. We assign (and possibly reassign) jobs to greedily minimize this cost.

This approach gives us an online strategy provably competitive with the optimal offline algorithm in the maximum usage of each resource. It has a weak competitive ratio - logarithmic in the number of machines in the cluster - but even this weak ratio is unprecedented in the literature. No other known method offers any competitive guarantee on more than one resource.

We present experimental evidence that this strategy performs extremely well in practice, comparing it to two important benchmarks: the default round robin strategy of the popular PVM metacomputing system, and the powerful adaptive strategy of the Mosix system.

Metacomputing environments are not homogeneous. In some environments, the scheduler has a great deal of information about submitted jobs. In other cases, it has very little. In some cases, jobs have real-time demands. In some cases, they do not. Some systems can migrate jobs without interrupting their execution. Others cannot. We develop variants of the basic “opportunity cost” strategy of the Cost-Benefit Framework for a number of different metacomputing environments, and prove all of them highly efficient.

Finally, we provide two metacomputing systems - a prototype and a complete system - based on these ideas. The Java Market prototype is a metacomputer built atop Java and web technologies, able to integrate any consenting Internet-connected machine. The Frugal System transforms any Jini network into a metacomputer.