Spring 1999

March 11, 1999

Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, the datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space. The processing is usually highly stylized, with the basic processing steps consisting of (1) retrieval of a subset of all available data in the input dataset via a range query, (2) projection of each input data item to one or more output data items, and (3) some form of aggregation of all the input data items that project to each output data item. We have developed an infrastructure, called the OR-ELSE (Object Relational Extremely Large ScalE) Database Extender, that integrates storage, retrieval and processing of multi-dimensional datasets on scalable architectures. We will address query planning and execution strategies for range queries with user-defined processing. We will also describe operating system and algorithm support, derived from our work on Active Disks architectures, that will allow us to efficiently implement a broad class of decision support databases on inexpensive, highly scalable architectures.
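The three processing steps named in this abstract can be illustrated with a minimal Python sketch. It is not the OR-ELSE implementation: the function names, the grid-cell projection, and the choice of summation as the aggregate are all assumptions made for illustration, and each input item here projects to exactly one output cell, whereas the abstract allows one or more.

```python
from collections import defaultdict

def range_query(items, lo, hi):
    """Step 1: keep items whose point lies inside the box [lo, hi] (inclusive)."""
    return [(pt, val) for pt, val in items
            if all(l <= c <= h for c, l, h in zip(pt, lo, hi))]

def project(point, cell_size):
    """Step 2: map an input point to an output grid cell (hypothetical mapping)."""
    return tuple(int(c // cell_size) for c in point)

def aggregate(items, cell_size):
    """Step 3: sum the values of all inputs that project to the same output cell."""
    out = defaultdict(float)
    for pt, val in items:
        out[project(pt, cell_size)] += val
    return dict(out)

def process(items, lo, hi, cell_size):
    """Run retrieval, projection, and aggregation in sequence."""
    return aggregate(range_query(items, lo, hi), cell_size)

# Each item is (point in a 2-D attribute space, value); the data is made up.
data = [((0.5, 0.5), 1.0), ((1.5, 0.2), 2.0), ((0.9, 0.1), 3.0), ((5.0, 5.0), 4.0)]
result = process(data, lo=(0.0, 0.0), hi=(2.0, 2.0), cell_size=1.0)
print(result)  # {(0, 0): 4.0, (1, 0): 2.0}
```

The item at (5.0, 5.0) falls outside the query box and never reaches the aggregation step, which is the point of running the range query first.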

March 25, 1999

We consider several common questions in the design of computer systems: What is a good policy for assigning jobs to hosts in a distributed server? In what order should HTTP requests be scheduled within a Web server? What should the migration policy be in a Network of Workstations? For each problem, we show that the answer depends on the job size distribution, and that this dependence is strong, sometimes changing the results by orders of magnitude. We present measurements showing that job size distributions are commonly heavy-tailed. We show how to incorporate heavy-tailed job size distributions into the design of systems. This leads us to novel and effective solutions to the above questions.

March 29, 1999

With the advent of new multimedia applications and the surge in Internet growth, networks need to provide strong Quality of Service (QoS) guarantees to end users. Current network policies are sometimes ad hoc, leading to two problems: (1) QoS requirements are not guaranteed, and (2) the bandwidth/cost overhead required to support these policies is huge. To offer guaranteed QoS efficiently in a next-generation Internet, we need to rethink and radically redesign many components of the current networking infrastructure, from the network topology down to the individual network switch. In my talk, I will focus on three key components:

  1. Redesigning high performance switches so they support a variety of application-specific QoS requirements,
  2. Redesigning routing protocols to guarantee end-to-end delays, and
  3. Designing network topology to support bandwidth-intensive applications as cheaply and robustly as possible.

For each of the above problems, I will present a principled, algorithmic approach. For the first two problems, we obtain efficient, deployable solutions. For the last, we obtain interesting theoretical results with possible practical ramifications.

April 1, 1999

Distributed data storage systems play an important role in creating reliable and efficient distributed computing environments, such as the RAIN (Redundant Array of Independent Nodes) at Caltech. This talk will discuss some of the key issues involved in achieving high availability (reliability and efficiency) in distributed storage systems. In particular, I will describe the design and implementation of a novel approach for improving the performance of reliable (n,k) data servers (which are natural generalizations of RAID systems). I will describe a systematic framework for designing such servers, as well as performance evaluation results.
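To make the (n,k) idea concrete, here is a minimal Python sketch of the simplest such scheme: k data blocks plus one XOR parity block, so n = k + 1 and any k of the n stored blocks suffice to rebuild the data. This is the familiar RAID-style parity construction, not the B-Code or the server design presented in the talk, and the function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks):
    """Store the k data blocks followed by one parity block (n = k + 1)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stored, missing_index):
    """Rebuild the block at missing_index from the other n - 1 = k blocks.

    The XOR of all n blocks is zero, so the XOR of any n - 1 of them
    equals the one that is left out.
    """
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)

data = [b"abcd", b"efgh", b"ijkl"]   # k = 3 equal-length blocks (made-up data)
stored = encode(data)                # n = 4 blocks, one per storage node
rebuilt = recover(stored, 1)         # pretend the node holding block 1 failed
print(rebuilt)                       # b'efgh'
```

A single parity block tolerates only one failure; tolerating more failures requires stronger error-correcting codes.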

The theory of error-correcting codes serves as the mathematical foundation for creating these storage systems. I will describe my recent work on designing new classes of error-correcting codes (one of which is called B-Code) that have efficient encoding/decoding procedures as well as other features that make them suitable for storage systems. The talk will conclude with possible future research directions.

April 2, 1999