Lightweight Incremental Query Processing for Update-Intensive Applications

Yanif Ahmad, Cornell University

Commercial databases pose a dilemma: suffer high ownership costs or face limited system scale-out. This has started a trend in which many communities develop their own nimble, lightweight data management tools, as seen with MapReduce and key-value stores. Update-intensive applications, such as algorithmic trading on order books, compute cloud management, and personal status feeds (e.g., Facebook, Twitter), are clear cases in point, since databases to this date have a notoriously poor reputation for handling updates efficiently. Current techniques such as incremental view maintenance and stream processing either involve significant repetition of work or apply only in a limited setting.

I introduce DBToaster, a novel SQL compilation framework that reconsiders the foundations and program structure of state-of-the-art query processors to generate lightweight, high-performance query engines. These engines achieve orders-of-magnitude efficiency gains by fully exploiting queries for incremental processing. DBToaster engines use map data structures instead of highly optimized relational operators, resulting in very simple, efficient query processing programs. I will present DBToaster’s novel recursive compilation technique, which determines the maps to maintain by repeatedly simplifying queries. I will also discuss ongoing work on Cumulus, a massive-scale online query processor based on DBToaster’s extremely simple intermediate language of map maintenance, a language that is embarrassingly parallel and reflects the goal of achieving scalability through simplicity.
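
To give a flavor of map-based incremental processing, here is a minimal Python sketch. It is not DBToaster's generated code. It assumes a hypothetical query, SELECT SUM(r.a * s.b) FROM R r, S s WHERE r.k = s.k, and the class, method, and map names are invented for illustration. Each insert updates the query result using one auxiliary map and then updates the map for its own relation, so no join is ever recomputed.

```python
from collections import defaultdict


class IncrementalJoinSum:
    """Maintains SUM(r.a * s.b) over R join S on k, under inserts only.

    Hypothetical illustration: the maps below are the auxiliary views
    that make each update a constant-time operation.
    """

    def __init__(self):
        self.q = 0                           # current query result
        self.sum_a_by_key = defaultdict(int)  # k -> SUM(a) over R rows with key k
        self.sum_b_by_key = defaultdict(int)  # k -> SUM(b) over S rows with key k

    def insert_r(self, k, a):
        # The new R row joins with every S row sharing its key;
        # their total contribution is a * SUM(b) for that key.
        self.q += a * self.sum_b_by_key.get(k, 0)
        self.sum_a_by_key[k] += a

    def insert_s(self, k, b):
        # Symmetric case for an insert into S.
        self.q += self.sum_a_by_key.get(k, 0) * b
        self.sum_b_by_key[k] += b


def recompute(r_rows, s_rows):
    """Reference answer: evaluate the join from scratch."""
    return sum(a * b
               for (kr, a) in r_rows
               for (ks, b) in s_rows
               if kr == ks)


if __name__ == "__main__":
    view = IncrementalJoinSum()
    view.insert_r(1, 2)
    view.insert_s(1, 10)
    view.insert_r(1, 3)
    print(view.q == recompute([(1, 2), (1, 3)], [(1, 10)]))
```

The two maps are themselves simpler queries (per-key sums over a single relation), which is the sense in which a query is simplified into maps to maintain. Deletions could be handled in this sketch by applying the same updates with negated values.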

Speaker Biography

Yanif Ahmad is a postdoctoral associate in the Database Group at Cornell University with Prof. Christoph Koch, having received his Ph.D. from Brown University in January 2009 under the supervision of Prof. Ugur Cetintemel. His research focuses on data stream processing and distributed data management. Yanif is the recipient of an IBM Ph.D. fellowship, a Best Research Paper award at the ICDE 2008 conference, and a Best Demonstration Award at SIGMOD 2005, and has interned at both IBM Almaden and Microsoft Research.