Building Hierarchical Storage Services in the Cloud

Object storage has emerged as a low-cost, scalable option for storing unstructured data in the cloud. However, performance limitations often compel users to employ supplementary storage services for their varied workloads. The result is growing storage sprawl, unacceptably low performance, and an increase in associated storage costs. We combine the strengths of multiple cloud services to develop NDStore, a scalable multi-hierarchical data storage deployment for open-science data in the cloud. It utilizes object storage as a scalable base tier and an in-memory cluster as a low-latency caching tier to support a variety of workloads. Moreover, many applications that rely on richer file system APIs and semantics are unable to benefit from object storage. Users either transfer data between a file system and object storage or use inefficient file connectors over object stores. One promising solution to this problem is dual access: the ability to transparently read and write the same data through both file system interfaces and object storage APIs. We discuss features that we believe are essential or desirable in a dual-access object storage file system (OSFS). Further, we design and implement Agni, an efficient dual-access OSFS that uses only standard object storage APIs and capabilities. Generic object storage's lack of support for partial writes introduces a performance penalty for some workloads. Agni overcomes this shortcoming by implementing a multi-tier data structure that temporally indexes partial writes from the file system, persists them to a log, and merges them asynchronously for eventually consistent access through the object interface.
Our experiments demonstrate that Agni improves partial write performance by two to seven times with minimal indexing overhead. For some representative workloads, it improves performance by 20% to 60% compared to S3FS, a popular OSFS, or the prevalent approach of manually copying data between different storage systems.
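
The log-and-merge idea described above can be illustrated with a minimal Python sketch. It is not Agni's implementation: the class names (LoggedWrite, DualAccessFile), the dict standing in for an object store, and the explicit merge() call (in place of an asynchronous background merge) are all assumptions made for illustration. The sketch only shows why logging partial writes avoids rewriting a whole object on every small update, and why the object interface is eventually rather than immediately consistent.

```python
from dataclasses import dataclass, field


@dataclass
class LoggedWrite:
    seq: int      # temporal order of the write (hypothetical index key)
    offset: int   # byte offset within the file
    data: bytes   # payload of the partial write


@dataclass
class DualAccessFile:
    """Hypothetical stand-in for one file backed by one object."""
    store: dict   # toy object store: key -> bytes, whole-object put/get only
    key: str
    log: list = field(default_factory=list)
    next_seq: int = 0

    def write(self, offset: int, data: bytes) -> None:
        # File system side: append the partial write to the log
        # instead of rewriting the entire object.
        self.log.append(LoggedWrite(self.next_seq, offset, data))
        self.next_seq += 1

    def _overlay(self) -> bytes:
        # Replay logged writes over the base object in temporal order,
        # so later writes win where ranges overlap.
        buf = bytearray(self.store.get(self.key, b""))
        for w in sorted(self.log, key=lambda entry: entry.seq):
            end = w.offset + len(w.data)
            if end > len(buf):
                buf.extend(b"\x00" * (end - len(buf)))
            buf[w.offset:end] = w.data
        return bytes(buf)

    def read(self) -> bytes:
        # File system side sees the latest data immediately.
        return self._overlay()

    def merge(self) -> None:
        # Object side catches up only here: one whole-object put,
        # after which the log can be discarded.
        self.store[self.key] = self._overlay()
        self.log.clear()


# Usage: two small writes are visible through the file view at once,
# but through the object view only after the merge.
store = {"report.bin": b"hello world"}
f = DualAccessFile(store, "report.bin")
f.write(6, b"W")
f.write(0, b"J")
file_view = f.read()                    # b"Jello World"
object_view_before = store["report.bin"]  # still b"hello world"
f.merge()
object_view_after = store["report.bin"]   # b"Jello World"
```

In this sketch the merge is triggered by hand; a real system would presumably run it in the background and persist the log durably, which the abstract mentions but does not detail.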

Speaker Biography

Kunal Lillaney is a Ph.D. candidate in the Department of Computer Science at Johns Hopkins University, working with Randal Burns in the Hopkins Storage Systems Lab (HSSL). His research focuses on enabling big data in the cloud by building hierarchical storage services over object storage. He received his B.Engg degree in Computer Engineering from the University of Mumbai in 2011 and his MSE degree in Computer Science from Johns Hopkins University in 2013. During his Ph.D., Kunal has interned with IBM Research-Almaden and Lawrence Livermore National Laboratory. He also served as Secretary of the Upsilon Pi Epsilon (UPE) JHU Chapter from 2015 to 2017, and won the UPE Executive Council Award in 2016.