Da Zheng
Welcome to Da's homepage.

My name is Da Zheng. I'm a post-doctoral fellow in Computer Science at the Johns Hopkins University, working with Prof. Randal Burns in HSSL. My research interests span high-performance computing, file systems, large-scale data analysis systems, and machine learning. I received my PhD from the Department of Computer Science at the Johns Hopkins University.

During my PhD research, I built systems for large-scale data analysis. These systems are integrated with one another to form a single system called FlashX. FlashX is a programming framework in which users express algorithms that process data in the form of graphs and matrices, so both graph algorithms and machine learning algorithms can be written in FlashX and run on large-scale datasets. FlashX is designed for performance, scalability, and generality. My dissertation proposal, Massive Data Analysis Using Fast I/O, covers several FlashX subsystems, and the accompanying slides summarize the work.
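To make "graph algorithms expressed over matrices" concrete, here is a small, hypothetical pure-Python sketch. It is not FlashX's interface: the functions `spmv` and `pagerank` and the CSR (compressed sparse row) layout are assumptions chosen for illustration. It phrases PageRank as repeated sparse matrix-vector multiplication over the in-edge matrix, where row i lists the vertices that link to vertex i.

```python
def spmv(indptr, indices, x):
    """Compute y = A @ x for a 0/1 sparse matrix A stored in CSR form.

    Row i of A has ones in columns indices[indptr[i]:indptr[i + 1]].
    """
    return [
        sum(x[j] for j in indices[indptr[i]:indptr[i + 1]])
        for i in range(len(indptr) - 1)
    ]


def pagerank(in_indptr, in_indices, out_degree, damping=0.85, iters=50):
    """PageRank as repeated SpMV over the in-edge matrix (A[i][j] = 1 if edge j -> i)."""
    n = len(out_degree)
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Each vertex splits its current rank evenly over its out-edges.
        share = [rank[v] / out_degree[v] if out_degree[v] else 0.0 for v in range(n)]
        # Rank held by vertices with no out-edges is spread uniformly over all vertices.
        dangling = sum(rank[v] for v in range(n) if out_degree[v] == 0)
        incoming = spmv(in_indptr, in_indices, share)
        rank = [(1 - damping) / n + damping * (incoming[v] + dangling / n) for v in range(n)]
    return rank


if __name__ == "__main__":
    # Edges 0->1, 0->2, 1->2; vertex 2 has no out-edges.
    print(pagerank([0, 0, 1, 3], [0, 0, 1], [2, 1, 0]))
```

The only graph-specific step is choosing the matrix (in-edges) and the per-vertex scaling; the heavy lifting is a generic sparse matrix operation, which is the kind of work a matrix-oriented framework can optimize once for many algorithms.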

Projects

FlashGraph

FlashGraph is a semi-external memory graph processing engine optimized for fast SSDs. FlashGraph achieves performance comparable to state-of-the-art in-memory graph processing frameworks such as Galois and significantly outperforms distributed-memory frameworks such as PowerGraph and GraphX. Furthermore, FlashGraph scales to graphs with billions of vertices and hundreds of billions of edges on a single machine. FlashGraph provides a flexible programming interface for implementing graph algorithms: users write serial code that operates on data in memory, and FlashGraph executes that code in parallel and out of core. FlashGraph is released as open-source software on GitHub.
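The semi-external memory idea can be sketched in a few lines: per-vertex state (here, a BFS level) lives in memory, while edge lists stay on storage and are read only when a vertex is processed. This is a minimal, hypothetical Python illustration, not FlashGraph's API (FlashGraph itself is a C++ engine); the file layout and the names `write_edge_file`, `read_neighbors`, and `bfs_levels` are assumptions.

```python
import os
import struct
import tempfile
from array import array


def write_edge_file(adjacency, path):
    """Write each vertex's neighbor list to `path` as packed little-endian uint32 IDs.

    Returns an in-memory index of (byte offset, degree) per vertex. The index is
    O(|V|); the edge data itself stays in the file.
    """
    index = []
    offset = 0
    with open(path, "wb") as f:
        for neighbors in adjacency:
            f.write(struct.pack(f"<{len(neighbors)}I", *neighbors))
            index.append((offset, len(neighbors)))
            offset += 4 * len(neighbors)
    return index


def read_neighbors(f, index, v):
    """Fetch one vertex's neighbor list from the edge file on demand."""
    offset, degree = index[v]
    f.seek(offset)
    return struct.unpack(f"<{degree}I", f.read(4 * degree))


def bfs_levels(path, index, source):
    """Level-synchronous BFS that keeps only per-vertex state in memory."""
    levels = array("i", [-1] * len(index))  # -1 marks "not yet visited"
    levels[source] = 0
    frontier = [source]
    with open(path, "rb") as f:
        while frontier:
            next_frontier = []
            # Visiting the frontier in ID order makes reads move forward through
            # the file, since edge-list offsets grow with vertex ID.
            for v in sorted(frontier):
                for u in read_neighbors(f, index, v):
                    if levels[u] == -1:
                        levels[u] = levels[v] + 1
                        next_frontier.append(u)
            frontier = next_frontier
    return list(levels)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "edges.bin")
        index = write_edge_file([[1, 2], [3], [3], []], path)
        print(bfs_levels(path, index, 0))  # [0, 1, 1, 2]
```

Memory use grows with the number of vertices rather than the number of edges, which is why this model can handle graphs whose edges far exceed RAM; a real engine would additionally issue the edge-list reads in parallel and asynchronously.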

SAFS

The block I/O stack in traditional operating systems is designed for slow magnetic disks. When it operates on fast storage media, nearly every layer of the block stack adds significant overhead.

SAFS is a library that implements a filesystem-like interface in user space for accessing SSD arrays. It is designed to eliminate overheads in the block stack, especially those caused by lock contention, without modifying the kernel. It can deliver the full performance of a large SSD array on a NUMA machine.
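One common way to avoid contention when many threads access an SSD array is to stripe data across the SSDs and give each SSD its own request queue and dedicated I/O thread, so application threads never funnel through one shared queue. The hypothetical Python sketch below models that arrangement; it is not SAFS's API (SAFS is a C++ library), the class and method names are assumptions, and the "devices" are in-memory byte arrays. Python's `queue.Queue` still uses a lock internally, so the sketch only shows how per-device partitioning spreads contention, not a lock-free design.

```python
import queue
import threading


class StripedSSDArray:
    """Toy model of an SSD array: blocks striped round-robin over devices,
    each device served by its own dedicated I/O thread and request queue.

    Devices are simulated with in-memory bytearrays (hypothetical stand-ins).
    """

    def __init__(self, num_devices, blocks_per_device, block_size=4096):
        self.num_devices = num_devices
        self.block_size = block_size
        self.num_blocks = num_devices * blocks_per_device
        self.devices = [bytearray(blocks_per_device * block_size) for _ in range(num_devices)]
        # One queue per device: application threads never share a single global
        # queue, and only device d's I/O thread ever touches device d.
        self.queues = [queue.Queue() for _ in range(num_devices)]
        self.workers = [
            threading.Thread(target=self._serve, args=(d,), daemon=True)
            for d in range(num_devices)
        ]
        for w in self.workers:
            w.start()

    def _serve(self, d):
        """I/O thread loop for device d."""
        dev, q, bs = self.devices[d], self.queues[d], self.block_size
        while True:
            req = q.get()
            if req is None:  # shutdown sentinel
                return
            op, local_block, payload, reply = req
            start = local_block * bs
            if op == "write":
                dev[start:start + bs] = payload
                reply.put(None)
            else:
                reply.put(bytes(dev[start:start + bs]))

    def _submit(self, op, block, payload=None):
        if not 0 <= block < self.num_blocks:
            raise IndexError(f"block {block} out of range")
        # Round-robin striping: logical block b lives on device b % N
        # at device-local block b // N.
        device, local_block = block % self.num_devices, block // self.num_devices
        reply = queue.Queue(maxsize=1)
        self.queues[device].put((op, local_block, payload, reply))
        return reply.get()  # wait for the device's I/O thread to finish

    def write_block(self, block, data):
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        self._submit("write", block, bytes(data))

    def read_block(self, block):
        return self._submit("read", block)

    def close(self):
        for q in self.queues:
            q.put(None)
        for w in self.workers:
            w.join()


if __name__ == "__main__":
    ssds = StripedSSDArray(num_devices=4, blocks_per_device=2, block_size=8)
    for b in range(8):
        ssds.write_block(b, bytes([b]) * 8)
    print(ssds.read_block(5))  # b'\x05\x05\x05\x05\x05\x05\x05\x05'
    ssds.close()
```

Because consecutive blocks land on different devices, a sequential scan or a burst of random reads naturally spreads across all SSDs, and each SSD's queue is contended only by the requests that target it.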

The library is released as open-source software together with FlashGraph on GitHub.


Publications

  • Da Zheng, Disa Mhembere, Joshua Vogelstein, Carey E. Priebe, Randal Burns, FlashMatrix: Parallel, Scalable Data Analysis with Generalized Matrix Operations using Commodity SSDs, arXiv:1604.06414, 2016 [pdf] (submitted for review)

  • Da Zheng, Disa Mhembere, Vince Lyzinski, Joshua Vogelstein, Carey E. Priebe, Randal Burns, Semi-External Memory Sparse Matrix Multiplication on Billion-node Graphs in a Multicore Architecture, IEEE Transactions on Parallel and Distributed Systems, 2016 [pdf]

  • Da Zheng, Randal Burns, Joshua Vogelstein, Carey E. Priebe, Alexander S. Szalay, An SSD-based eigensolver for spectral analysis on billion-node graphs, arXiv:1602.01421, 2016 [pdf]

  • Heng Wang, Da Zheng, Randal Burns, Carey E. Priebe, Active Community Detection in Massive Graphs, SDM-Networks 2015 [pdf]

  • Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, Alexander S. Szalay, FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs, in FAST 2015 [pdf] [bib]

  • Da Zheng, Randal Burns, Alexander S. Szalay, Toward Millions of File System IOPS on Low-Cost, Commodity Hardware, in Supercomputing 2013 [pdf] [bib]

  • Da Zheng, Randal Burns, Alexander S. Szalay, A Parallel Page Cache: IOPS and Caching for Multicore Systems, in HotStorage 2012 [pdf] [bib]

  • Da Zheng, Alexander S. Szalay, Andreas Terzis, Hadoop in Low-Power Processors, arXiv:1408.2284v1

  • Wai-Mee Ching, Da Zheng, Automatic Parallelization of Array-oriented Programs for a Multi-core Machine, in International Journal of Parallel Programming 2012

  • Da Zheng, Anne-Marie Bosneag, Sidath Handurukande, David Cleary, Extending Classic Telecommunication Addressing Schemes for Home Gateway-based User Content and Services Discovery, CNCC 2010
