Today’s large-scale datasets necessitate scalable data analysis frameworks and libraries. Traditional distributed memory solutions neglect optimizations for prevalent Non-Uniform Memory Access (NUMA) architectures. Additionally, distributed memory solutions often rely solely on process-level parallelism and thus forgo shared and external memory optimizations, leading to suboptimal overall performance. This thesis explores the effects of NUMA-awareness and fine-grained SSD I/O optimizations to improve hardware minimality, scalability, and memory parallelism in graph analytics and community detection. Our computation optimizations target data that reside either (i) in memory, (ii) in semi-external memory, or (iii) in distributed memory. We first present Graphyti, a semi-external memory (SEM) graph library built on the FlashGraph framework, to demonstrate key design principles for vertex-centric SEM graph applications. Graphyti on a single thick node achieves performance comparable to that of popular distributed graph libraries running on a cluster. We then address web-scale community detection and present the clusterNOR framework. We advance the state of the art for memory-parallel NUMA computation of k-means and, subsequently, of clustering algorithms that follow the Majorize-Minimization/Minorize-Maximization (MM) objective function optimization pattern. clusterNOR introduces semi-external memory I/O optimizations and cache-friendly NUMA scheduling policies for both hierarchical and non-hierarchical clustering algorithms. We demonstrate how these optimizations lead to performance improvements of up to an order of magnitude over state-of-the-art clustering frameworks.
Disa Mhembere is a Ph.D. candidate in computer science at Johns Hopkins University. He received a master's degree in engineering management in 2013 and a master's degree in computer science in 2015, both from Johns Hopkins University. During his Ph.D., he interned with IBM Research and Kyndi Inc. Disa was awarded the Paul V. Renoff Computer Science Graduate Fellowship in 2014, the UPE Special Recognition Award in 2014, and the UPE Academic Achievement Award in 2017. He also received the best presentation award at the High-Performance Parallel and Distributed Computing (HPDC) Conference in 2017.