PARAID: Adaptive Data Management for Disk Systems

Description:

Current file systems are predicated on a critical design assumption: file system blocks are directly mapped to an underlying volume that presents a dense, sequentially numbered space of blocks. While this assumption is imperfect - zoned disks, striping, and bad block substitution all violate it in some degree - it is the driving characteristic for most file system designs. Several complications result directly from this assumption: (1) filesystems must manage the competing imperatives of rapid block allocation and per-file locality, (2) the data structures that record placement decisions introduce write ordering dependencies and a need for either (a) transactional writes or (b) unreliable heuristics for probablistic recovery after inconsistent shutdown (e.g. fsck), and (3) the success of placement strategies decays with the age of the file system. Based on the current literature, there appears to be no known file system design that simultaneously addresses all of these issues. Based on disk-level traces, the observed run-time locality at the disk level when presented with simultaneous request streams is small, which suggests that we may be designing to the wrong objectives.

We intend to investigate hash-based methods for managing disk-level block placement and its implications for file system design and performance. This investigation will consider both theoretical issues and practical implementations. Hashed placement offers reduced latency variance at the cost of reduced per-file locality, but can be straightforwardly tuned to allow for extent management. In theory, the performance of file systems design to leverage hashed placement should not degrade over time, and the aggregate performance of such placement over multiple request streams should be no worse than that achieved by direct-mapped placement approaches. For some applications, such as isochronous media streams, the performance profile of hashed placement is better than that of direct placement strategies.

Hashed placement allows us to reapportion the responsibilities of current file system designs, moving all responsibility for placement to the logical volume without reducing the run-time disk-level locality. The fundamental advantage of a hashed approach is that it allows a single physical volume to export an arbitrary number of logical data spaces provided that total block consumption does not exceed capacity. By exploiting multiple logical spaces in the file system design, it is possible to completely eliminate indirection blocks (and therefore many sequential write ordering dependencies), and to transparently implement differentiated placement policies at the volume level. This is possible for two reasons. First, the use of multiple data spaces can expose to the volume layer the relationship between blocks and their containing objects, giving the volume layer a small but critical degree of awareness about the higher-level aggregate semantics of the blocks that it manages. Second, multiple hashed placement regimes can be logically multiplexed onto a single physical store without change in performance characteristics, which is not true of direct-mapped placement strategies.

From a theoretical perspective, hashed placement raises issues in modeling, data structure design, and hash function characterization. An effective design must navigate these issues to simultaneously bound placement overflows, prevent denial of service via hostile placements, avoid the need for multiple disk indirections to record block placements, and run in tightly bounded main memory. The placement algorithm must be robust in the face of nonuniform output distribution from the chosen hash function. Further, it must allow for online adaptive rearrangement of storage. Ideally, the last should be accomplished with bounded overhead. Our preliminary investigations into the proposed research suggest that all of these objectives are simultaneously achievable.

The proposed research will proceed by first implementing a hashed volume manager and testing its performance using conventional file systems and benchmarks. We will then design and implement a new, object-structured file system leveraging the multi-space capabilities of the underlying volume. We hope to show that the resulting new file system is significantly simpler than current designs and that it exceeds current file system performance in most respects. All of this work will be supported by simultaneous validation of the theoretical underpinnings of this work.

Faculty members:

PhD students:

Chris Riley

Other students:

Publications

A. Czumaj, C. Riley, and C. Scheideler
Perfectly balanced allocation
In RANDOM 2003.
A. Brinkmann, K. Salzwedel, and C. Scheideler.
Compact, Adaptive Placement Strategies for Non-Uniform Capacities
In Proc. 14th ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 53--62, 2002.
A. Brinkmann, K. Salzwedel, C. Scheideler.
Efficient, Distributed Data Placement Strategies for Storage Area Networks (Extended Abstract)
In Proc. 12th ACM Symposium on Parallel Algorithms and Architectures (SPAA), 2000.

Implementations:

Not yet available...

Christian Scheideler

Last modified: Tue May 20 2003