Adaptive Asynchronous Control and Consistency in Distributed Data Exploration Systems

Benjamin A. Ring, Johns Hopkins University

Advances in machine learning and streaming systems provide a backbone to transform vast arrays of raw data into valuable information. Leveraging distributed execution, analysis engines can process this information effectively within a data exploration workflow to solve problems at unprecedented rates. However, with increased input dimensionality, a desire to simultaneously share and isolate information, and overlapping and dependent tasks, this process is becoming increasingly difficult to maintain. These flexible and scalable systems require more robust, lower-level coordination of distributed execution in order to synchronize their efforts toward a common goal. We argue that an abstraction layer providing adaptive asynchronous control and consistency management over a series of individual tasks, coordinated to achieve a global objective, can significantly improve data exploration effectiveness and efficiency. We demonstrate this through serverless simulation ensemble management and multi-model machine learning, with improved performance and reduced resource utilization. We focus on the specific domains of molecular dynamics and personalized healthcare; however, the contributions are applicable to a wide variety of domains.

In the first part of this talk, we present a novel approach to data exploration centered around a lattice structure that organizes data features to drive input sampling as part of the exploratory process. By integrating this approach with a serverless framework that we developed as a data-driven, in-situ simulation ensemble management system, we couple analysis and simulation inside an HPC cluster to improve rare event detection in molecular dynamics.

In the second part of this talk, we show how to improve machine learning with increased data representation through many, semi-isolated sub-domains. Specifically, we implement asynchronous control over multi-model training in distributed machine learning as applied to healthcare. We address the challenge of integrating both data sharing and data isolation and show how synchronization and adaptive controls are necessary to improve machine learning outcomes.

Speaker Biography

Benjamin A. Ring, Lieutenant Colonel, U.S. Army, was commissioned as an Armor Officer in 1996 with a bachelor's degree in Computer Science from the U.S. Military Academy in West Point, NY. He earned his master's degree in Computer Science from Boston University in 2006 and taught at West Point from 2006 to 2009, where he was promoted to Assistant Professor in 2008. From 2010 to 2011, he served as Senior Systems Manager for Regional Coalition Forces East in Afghanistan, and from 2011 to 2014, he was the Academic Systems Manager for the U.S. Army Command and General Staff College, Ft. Leavenworth, KS. A member of the Upsilon Pi Epsilon and Phi Kappa Phi honor societies, Lt. Colonel Ring will be assigned as the Chief Operations Officer, U.S. Army Cyber Protection Brigade, Ft. Gordon, GA, starting in September.