Seminar Archive – Spring 2012 - Department of Computer Science

Student Seminar

January 21, 2012

Data gathered from multi-month to multi-year battery-powered environmental monitoring sensor networks present numerous challenges. This thesis explores three problems. First, design issues related to data loading, storage, and integrity are studied in detail. An end-to-end system addressing these tasks by decoupling deployment-specific and deployment-independent phases is presented. This solution places a strong emphasis on the ability to trace the origins of every collected measurement for provenance and scientific reproducibility. Second, we explore the problem of assigning accurate global timestamps to the measurements collected using the motes’ local clocks. In deployments lacking a persistent gateway, a data-driven approach is employed to assign timestamps to within 10 parts per million. Building on these experiences, we developed a novel low-power approach to accurately timestamp measurements in the presence of random, frequent mote reboots. The system is tested in simulation and on a real deployment in a Brazilian rain forest. It achieves an accuracy on the order of seconds for more than 99% of measurements even when a global clock source is missing for days to months. Lastly, this thesis explores a generic data-driven approach to reduce communication costs in order to increase network lifetime. In particular, spatial correlation among sampling stations is leveraged to adaptively retrieve data from a small subset of informative sensors rather than all instrumented locations. Soil temperature data collected every half hour for four months from 50 locations are used to evaluate this system. The method achieves a factor of two reduction in collected data with a median error of 0.06 °C and a 95th percentile error of 0.325 °C.

This work is part of the Life Under Your Feet project developed at the Hopkins InterNetworking Research Group (HiNRG) and eScience Labs at the Johns Hopkins University.

Speaker Biography: Jayant Gupchup received his bachelor's degree in Computer Engineering from Mumbai University in 2003. From September 2003 to July 2005, he worked at the Inter-University Centre for Astronomy and Astrophysics (IUCAA). In the fall of 2005, he began his Ph.D. at the Department of Computer Science at the Johns Hopkins University. His research focuses on data management in long-term environmental monitoring networks, and he is jointly advised by Dr. Andreas Terzis and Prof. Alex Szalay. In 2007, he worked at the Microsoft Bay Area Research Center as a summer intern. He received a master's degree in Applied Mathematics and Statistics in May 2010 under the supervision of Prof. Carey Priebe. After his Ph.D., he will join the parallel data warehousing team at Microsoft in March 2011.

Student Seminar

January 27, 2012

Scientists are increasingly finding themselves in a paradoxical situation: on a never-ending quest to collect data, they are collecting more data than they can handle. This growth in data comes from three main areas: better instrumentation, improved simulations, and increased data sharing between scientists. The work in this talk describes a variety of techniques to support the exploration and analysis of large scientific datasets, and is drawn from experiences working with two different domain sciences: computational fluid dynamics and estuarine science.

First, we discuss the JHU Turbulence Database Cluster, an environment for the exploration of turbulent flows. We provide a web-service interface for accessing the complete space-time history of a Direct Numerical Simulation. This service gives researchers from around the world the tools needed for spatial and temporal exploration of the simulation. In this talk, we will discuss the overall system design and prototypical applications. We will also discuss the details of implementation, including hierarchical spatial indexing, cache-sensitive spatial scheduling of batch workloads, and localizing computation through data partitioning.

We will also discuss work to improve queries among multiple scientific data sets from the Chesapeake Bay as part of the CBEO project. We developed new data indexing and query processing tools that improve the efficiency of comparing, correlating, and joining data in non-convex regions. We use computational geometry techniques to automatically characterize the space from which data are drawn, partition the region based on that characterization, and then create an index from the partitions. In the case of the Chesapeake Bay, our technique ensures that all data from a given tributary (e.g., the Potomac River) will occupy contiguous regions of the index, which makes the data from these regions contiguous on disk.

Speaker Biography: Eric Perlman received a B.S. in Computer Engineering in 2002 from the University of California, Santa Cruz. He enrolled in the Computer Science Ph.D. program at Johns Hopkins University in 2003. He has worked on large distributed file systems during internships at both IBM Almaden Research Center in 2003 and Google in 2004.

At Johns Hopkins, Eric’s work primarily focused on improving access to large scientific data. He helped build the infrastructure for three interdisciplinary research projects: the JHU Turbulence Database Cluster, the Chesapeake Bay Environmental Observatory Testbed (CBEO:T), and the Open Connectome Project (OCP).

As of December 2012, Eric is working as a Bioinformatics Specialist at the Howard Hughes Medical Institute’s Janelia Farm Research Campus in Ashburn, VA. He is working with Dr. Davi Bock to build a processing pipeline for data captured using high-throughput electron microscopy.

Spring 2012

Data management in environmental monitoring sensor networks Jayant Gupchup, Johns Hopkins University

Indexing and Processing Spatial Range Functions in Data-Intensive Scientific Databases Eric Perlman, Johns Hopkins University

Algorithms for learning latent variable models Daniel Hsu, Microsoft Research

Missing heritability: new statistical and algorithmic approaches Or Zuk, The Broad Institute

Motors, Voters, and the Future of Embedded Security Stephen Checkoway, University of California, San Diego

From Scripts to Programs Matthias Felleisen, Northeastern University

Scalable Database Query Processing Nolan Li, Johns Hopkins University

Securing The Next Generation Web Platform Prateek Saxena, University of California, Berkeley

From Overlays to Clouds: Inventing a New Network Paradigm Yair Amir, Johns Hopkins University

Machine Learning in the Bandit Setting: Algorithms, Evaluation, and Case Studies Lihong Li, Yahoo Research

Enabling Innovation in Middlebox Deployment Vyas Sekar, Intel Labs

Probabilistic Programming: Beyond Graphical Models David Wingate, MIT

Computational approaches for the DNA sequencing data deluge Ben Langmead, Johns Hopkins University

Efficient Search and Learning for Language Understanding and Translation Liang Huang, Information Sciences Institute / University of Southern California

Scalable Bayesian learning for complex tensor-valued data Alan Qi, Purdue University

Cybersecurity: How did we get here and how do we get out of here? Carl Landwehr, University of Maryland

Towards Scalable User-Agnostic Attack Defense Zhichun "ZC" Li, NEC Research Labs

Crowdsourcing Annotation for Machine Learning in Natural Language Processing Tasks Omar Zaidan, Johns Hopkins University

Learning to Detect Malicious URLs Justin Ma, University of California, Berkeley

Cloud Data Protection for the Masses Elaine Shi, University of California, Berkeley

Designing a Low-Power Mobile Sensing System for Wireless Healthcare Applications JeongGil Ko, Johns Hopkins University

Machine Learning in the Loop John Langford, Yahoo Research

Machine Learning for Complex Social Processes Hanna Wallach, University of Massachusetts Amherst

Understanding the impact of genetic variation on molecular mechanisms of transcriptional regulation Roger Pique-Regi, University of Chicago

Modeling People from Billions of Photos Ira Kemelmacher-Shlizerman, University of Washington

The Computational Power of Chemical Reaction Networks Rebecca Schulman, Johns Hopkins University

Secure Minimal Architecture for Remote Attestation of Embedded Devices Gene Tsudik

Optimal Coding for Streaming Authentication Ran Gelles, UCLA
