We intend to investigate hash-based methods for managing disk-level block placement and its implications for file system design and performance. This investigation will consider both theoretical issues and practical implementations. Hashed placement offers reduced latency variance at the cost of reduced per-file locality, but can be straightforwardly tuned to allow for extent management. In theory, the performance of file systems design to leverage hashed placement should not degrade over time, and the aggregate performance of such placement over multiple request streams should be no worse than that achieved by direct-mapped placement approaches. For some applications, such as isochronous media streams, the performance profile of hashed placement is better than that of direct placement strategies.
Hashed placement allows us to reapportion the responsibilities of current file system designs, moving all responsibility for placement to the logical volume without reducing the run-time disk-level locality. The fundamental advantage of a hashed approach is that it allows a single physical volume to export an arbitrary number of logical data spaces provided that total block consumption does not exceed capacity. By exploiting multiple logical spaces in the file system design, it is possible to completely eliminate indirection blocks (and therefore many sequential write ordering dependencies), and to transparently implement differentiated placement policies at the volume level. This is possible for two reasons. First, the use of multiple data spaces can expose to the volume layer the relationship between blocks and their containing objects, giving the volume layer a small but critical degree of awareness about the higher-level aggregate semantics of the blocks that it manages. Second, multiple hashed placement regimes can be logically multiplexed onto a single physical store without change in performance characteristics, which is not true of direct-mapped placement strategies.
From a theoretical perspective, hashed placement raises issues in modeling, data structure design, and hash function characterization. An effective design must navigate these issues to simultaneously bound placement overflows, prevent denial of service via hostile placements, avoid the need for multiple disk indirections to record block placements, and run in tightly bounded main memory. The placement algorithm must be robust in the face of nonuniform output distribution from the chosen hash function. Further, it must allow for online adaptive rearrangement of storage. Ideally, the last should be accomplished with bounded overhead. Our preliminary investigations into the proposed research suggest that all of these objectives are simultaneously achievable.
The proposed research will proceed by first implementing a hashed volume
manager and testing its performance using conventional file systems and
benchmarks. We will then design and implement a new, object-structured file
system leveraging the multi-space capabilities of the underlying volume. We
hope to show that the resulting new file system is significantly simpler than
current designs and that it exceeds current file system performance in most
respects. All of this work will be supported by simultaneous validation of the
theoretical underpinnings of this work.