Proxy Caching in Wide-Area Scientific Database Federations
Introduction
FedCache is a framework designed to bring the advantages of proxy caching, specifically reduced network traffic, improved query response time, and reduced server loads, to real-world scientific database federations. Our system will demonstrate a particular implementation of this technology for the World Wide Telescope (WWT), a large-scale astronomy federation. Not only does FedCache include a practical deployment model, complete with monitoring tools, but it also includes a flexible API to extend support beyond simple composite queries.
Technical Report
Architectural Components
CacheMon Interface Prototype
Source Code
Related Projects

Technical Report
Abstract
The performance of caching algorithms depends heavily on the architecture of the host system on which they are implemented. Various system and design issues can reduce overall performance. In this paper, we describe the architectural components of the FedCache system. FedCache integrates the bypass-yield cache framework into the World Wide Telescope, a real-world scientific database federation. Caching will increase the scale of the federation and allow its benefits to reach a wider community.

Download Report            PDF       PostScript


Architectural Components     
Each proxy cache in the FedCache framework intercepts queries between the Open SkyQuery mediator and sites in the database federation, makes the appropriate bypass decisions, and then responds to the client with the results. Three components of this architecture are particularly challenging to implement, as they highlight problems that are unique to large scientific database federations.

As the functional interface between the cache and mediator, the API must be both flexible, allowing for easy integration with the database federation, and extensible, permitting support for complex classes of queries in the future. The final system will be capable of running both composite and more complex cross-match queries to illustrate features exposed by the API.

Tables in the WWT federation can be several hundred gigabytes in size, which makes transferring cache objects particularly troublesome. The final system will demonstrate several techniques that allow for fast transfer while ensuring that response time and server load are kept within reasonable limits.

Monitoring performance in a database federation requires collecting statistics from all caches in the federation, which can easily become overwhelming. The final system will present a coherent, organized, and concise view of cache state via an interface that administrators can access remotely from any web browser.

Architectural components of the FedCache system

CacheMon Interface Prototype     
CacheMon is an interactive, web-based interface for administrators to obtain both a macro and micro view of caching performance in the database federation. We hope to achieve three main goals in developing this tool, namely:
  • Monitor individual object performance both inside and outside the cache, such as tracking network savings over time as well as cache residency times
  • Provide a complete picture of the global cache state, such as tracking average query response times, overall network savings, and performance gain over no-caching
  • Control cache behavior by allowing administrators to track system-wide resource allocation patterns and tune the cache accordingly
We currently have a prototype of the CacheMon web interface that demonstrates what the final system might look like. The screenshot to the right shows three categories of performance metrics that can be monitored. At the top, Cache displays both local and global caching performance in terms of network savings and bypass decisions. At the bottom, System provides an overview of resource utilization. One aspect unique to database federations is that CacheMon tracks both local statistics at individual sites across the federation (shown here: SDSS, 2MASS, and USNOB) and aggregate performance.

(View the prototype interface for the FedCache performance monitor at http://dev.openskyquery.org/tmp/CacheMon/)
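
The split between per-site and aggregate statistics described above can be illustrated with a minimal sketch. It is not taken from the CacheMon source: the SiteStats record, its field names, and the summarize function are hypothetical, and network savings is assumed here to mean the fraction of traffic avoided relative to running without a cache.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Hypothetical per-site counters a monitor might collect."""
    site: str
    queries: int
    hits: int
    bytes_without_cache: int  # traffic if every query had gone to the server
    bytes_with_cache: int     # traffic actually sent over the network


def savings(bytes_without_cache, bytes_with_cache):
    """Fraction of network traffic avoided (assumed definition)."""
    if bytes_without_cache == 0:
        return 0.0
    return 1.0 - bytes_with_cache / bytes_without_cache


def summarize(sites):
    """Combine local statistics into one federation-wide view."""
    total_queries = sum(s.queries for s in sites)
    total_hits = sum(s.hits for s in sites)
    total_without = sum(s.bytes_without_cache for s in sites)
    total_with = sum(s.bytes_with_cache for s in sites)
    return {
        # Local view: one savings figure per site.
        "per_site_savings": {
            s.site: savings(s.bytes_without_cache, s.bytes_with_cache)
            for s in sites
        },
        # Global view: computed from summed traffic, not averaged per site,
        # so that large sites carry proportionally more weight.
        "global_savings": savings(total_without, total_with),
        "global_hit_rate": total_hits / total_queries if total_queries else 0.0,
    }
```

Computing the global figure from summed byte counts rather than from an average of per-site ratios is a design choice of this sketch; it keeps a small, lightly used site from skewing the federation-wide number.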

CacheMon prototype

Source Code
Workload Parser
We are currently building a customized SQL parser for FedCache to aid in making bypass decisions. The grammar was constructed using ANTLR 2.7.5 (http://www.antlr.org), and the driver was developed in C# under Microsoft Visual Studio .Net. Note that the parser is specifically tuned for query logs from the Sloan Digital Sky Survey (SDSS) database, so it may have trouble parsing arbitrary workloads.

Download Source            Source       Readme
(An older version of the SQL parser, developed using libraries from the Open SkyQuery project, is also available. Its major benefit is that it can produce XML output.    Download    Source   Readme)
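
To make the purpose of the workload parser concrete, the sketch below shows the kind of information a table-granularity cache needs from each logged query: the set of tables it touches. This is not the ANTLR grammar or the C# driver; it is a deliberately simple regular-expression stand-in, and the function name referenced_tables is hypothetical. It only recognizes names that directly follow FROM or JOIN, skips names followed by an opening parenthesis (treated as function calls), and does not handle comma-separated table lists, quoted identifiers, or comments.

```python
import re

# A name after FROM or JOIN that is not immediately followed by "(".
# The trailing \b stops the engine from backtracking to a shorter name
# in order to dodge the function-call check.
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)\b(?!\s*\()",
    re.IGNORECASE,
)


def referenced_tables(sql):
    """Return table names referenced by a query, in order, without repeats."""
    seen = []
    for name in _TABLE_REF.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen
```

A real grammar is needed for anything beyond simple queries, which is why the parser above is built with a parser generator rather than pattern matching.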

Sloan Digital Sky Survey (SDSS) Workload
We are currently using workloads extracted from query logs on the SDSS database. Two representative query traces from the SkyServerV3 and DR1 versions of the database are included along with their respective table, column, and function metadata (note that the DR1 trace includes function-embedded queries).

Download Trace            DR1 (200K+ queries)       SkyServerV3 (27K+ queries)

Open SkyQuery portal
A modified version of the Open SkyQuery portal with a semi-functional proxy cache that operates on SDSS at table granularity is available. The caching module is included under "CacheTools" and is designed as a proof of concept. It demonstrates how bypass decisions are made in the SkyQuery framework but does not actually load and evict multi-gigabyte tables. The distribution is developed as a .Net Web Application and requires Internet Information Services (IIS), SQL Server, and the .Net framework to function.

Download Portal            Source       Readme

CacheMon Prototype
The CacheMon web interface prototype is available as a .Net Web Application. Graphs were plotted using the ZedGraph graphing package available at http://zedgraph.sourceforge.net/.

Download Prototype            Source


Related Projects
SkyQuery
FedCache is designed for SkyQuery, a web-based database application for the World Wide Telescope (WWT). In SkyQuery, clients submit queries through an applet-based query interface; the queries arrive at the Open SkyQuery portal, the mediation middleware of the WWT. The mediator then divides each request into sub-queries for the sites in the database federation. For details on SkyQuery, please refer to the following links:

SkyQuery Project:              http://www.skyquery.net/
Open SkyQuery Portal:      

Bypass Caching
FedCache integrates the bypass-yield cache framework into the WWT for making query bypass decisions. A bypass-yield cache is altruistic by nature and designed to minimize network traffic in a database federation. For details on bypass caching, please refer to the following paper:

Download Paper            PDF       PostScript
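
As a rough illustration of what a bypass decision involves, the sketch below uses a simple rent-to-buy rule at table granularity: a query on a non-resident table is bypassed to the server until the result bytes shipped for that table add up to the cost of loading the table itself. This rule is an assumption chosen for illustration and is not necessarily the policy described in the paper or used by FedCache. The class name BypassCache, its fields, and the table names in any usage are hypothetical, and eviction is omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class BypassCache:
    """Toy table-granularity cache with a rent-to-buy load rule (assumed)."""
    capacity_bytes: int
    resident: dict = field(default_factory=dict)        # table -> size in bytes
    bypassed_bytes: dict = field(default_factory=dict)  # table -> result bytes shipped so far

    def used_bytes(self):
        return sum(self.resident.values())

    def handle(self, table, table_bytes, result_bytes):
        """Return 'hit', 'bypass', or 'load' for one query against `table`."""
        if table in self.resident:
            return "hit"  # answered at the proxy, no wide-area traffic

        # Count the traffic this table has cost while it was not cached,
        # including the current query's result.
        shipped = self.bypassed_bytes.get(table, 0) + result_bytes
        self.bypassed_bytes[table] = shipped

        # Load only once bypassing has cost as much as the load would,
        # and only if the table fits (this sketch never evicts).
        fits = self.used_bytes() + table_bytes <= self.capacity_bytes
        if shipped >= table_bytes and fits:
            self.resident[table] = table_bytes
            del self.bypassed_bytes[table]
            return "load"
        return "bypass"
```

The point of bypassing is that a cache miss does not force a load: a query with a small result on a very large table is cheaper to forward to the server than to satisfy by transferring the whole table.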


Contact
Tanu Malik
Xiaodan Wang    
Randal Burns      

Department of Computer Science
Johns Hopkins University
Baltimore, MD 21218
