Proxy Caching in Wide-Area Scientific Database Federations
Introduction
FedCache is a framework designed to bring the advantages of proxy caching, specifically reduced network traffic, improved query response time, and reduced server loads, to real-world scientific database federations. Our system will demonstrate a particular implementation of this technology for the World Wide Telescope (WWT), a large-scale astronomy federation. Not only does FedCache include a practical deployment model, complete with monitoring tools, but it also includes a flexible API to extend support beyond simple composite queries.
CacheMon Interface Prototype
The performance of caching algorithms depends heavily on the architecture of the host system on which they are implemented. Various system and design issues can reduce overall performance. In this paper, we describe the architectural components of the FedCache system. FedCache integrates the bypass-yield cache framework into the World Wide Telescope, a real-world scientific database federation. Caching will increase the scale of the federation and allow its benefits to reach a wider community.
Download Report PDF PostScript
We are currently building a customized SQL parser for FedCache to aid in making bypass decisions. The grammar was constructed using ANTLR 2.7.5 (http://www.antlr.org), and the driver was developed in C# under Microsoft Visual Studio .Net. Note that the parser is specifically tuned for query logs from the Sloan Digital Sky Survey (SDSS) database, so it may have trouble parsing arbitrary workloads.
Download Source Source Readme
(An older version of the SQL parser, developed using libraries from the Open SkyQuery project, is also available. Its major benefit is that it can produce XML output. Download Source Readme)
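To illustrate the kind of information a bypass decision needs from each query, the following minimal Python sketch pulls the referenced table names out of a SQL string. It is not the ANTLR grammar or the C# driver described above: the function name `referenced_tables` and the regular expression are assumptions made for this example, and the heuristic only sees the first name after each FROM or JOIN keyword.

```python
import re

# Rough heuristic, not a real SQL grammar: capture the identifier that
# directly follows a FROM or JOIN keyword.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql: str) -> set[str]:
    """Return lower-cased names that appear right after FROM or JOIN.

    Limitations of this sketch:
    - comma-separated FROM lists yield only the first table;
    - a table-valued function in a FROM clause is reported by its
      function name, as if it were a table.
    """
    # Drop any schema prefix such as "dbo." and normalize case.
    return {name.split(".")[-1].lower() for name in _TABLE_REF.findall(sql)}


if __name__ == "__main__":
    query = (
        "SELECT p.objID FROM PhotoObj p "
        "JOIN SpecObj s ON p.objID = s.objID"
    )
    print(sorted(referenced_tables(query)))
```

A full parser is still needed for real workloads, since function-embedded queries and nested subqueries quickly defeat a pattern this simple.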
Sloan Digital Sky Survey (SDSS) Workload
We are currently using workloads extracted from query logs on the SDSS database. Two representative query traces, from the SkyServerV3 and DR1 versions of the database, are included along with their respective table, column, and function metadata. Note that the DR1 trace includes function-embedded queries.
Download Trace DR1 (200K+ queries) SkyServerV3 (27K+ queries)
Open SkyQuery portal
A modified version of the Open SkyQuery portal with a semi-functional proxy cache that operates on SDSS at table granularity is available. The caching module is included under "CacheTools" and is designed as a proof of concept. It demonstrates how bypass decisions are made in the SkyQuery framework but does not actually load and evict multi-gigabyte tables. The distribution is developed as a .Net Web Application and requires Internet Information Services (IIS), SQL Server, and the .Net framework to function.
Download Portal Source Readme
The CacheMon web interface prototype is available as a .Net Web Application. Graphs were plotted using the ZedGraph graphing package available at http://zedgraph.sourceforge.net/.
Download Prototype Source
FedCache is designed for SkyQuery, a web-based database application for the World Wide Telescope (WWT). In SkyQuery, clients submit queries through an applet-based query interface. The queries arrive at the Open SkyQuery portal, the mediation middleware of the WWT. The mediator then divides each request into sub-queries for the sites in the database federation. For details on SkyQuery, please refer to the following links:
SkyQuery Project: http://www.skyquery.net/
Open SkyQuery Portal: http://www.openskyquery.org/Sky/skysite/
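The mediation step described above can be pictured with a small Python sketch that groups the tables referenced by one request by the site hosting them, so that one sub-query can be issued per site. This is only an illustration, not the Open SkyQuery implementation: the `TABLE_SITE` catalog, the site names, and the function `plan_subqueries` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical table-to-site catalog; real federation metadata would
# replace this mapping.
TABLE_SITE = {
    "photoobj": "site_a",
    "specobj": "site_a",
    "sources": "site_b",
}


def plan_subqueries(tables, table_site=TABLE_SITE):
    """Group the tables referenced by one request by their hosting site."""
    plan = defaultdict(list)
    for table in tables:
        key = table.lower()
        if key not in table_site:
            raise KeyError(f"no site registered for table {table!r}")
        plan[table_site[key]].append(key)
    # One entry per site: a mediator would send one sub-query to each.
    return {site: sorted(names) for site, names in plan.items()}


if __name__ == "__main__":
    print(plan_subqueries(["PhotoObj", "Sources", "SpecObj"]))
```

A proxy cache sits naturally at this point in the pipeline, because the mediator already knows which site each sub-query targets.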
FedCache integrates the bypass-yield cache framework into the WWT for making query bypass decisions. A bypass-yield cache is altruistic by nature and is designed to minimize network traffic in a database federation. For details on bypass caching, please refer to the following paper:
Download Paper PDF PostScript
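As a rough illustration of a bypass decision at table granularity, the Python sketch below ships a query to the server (bypass) until the result bytes accumulated for a table reach the table's own size, and only then loads the table into the cache. This is a simplified rule assumed for the example, not the algorithm from the paper: the class `TableCache`, its fields, and the threshold are hypothetical, and eviction is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class TableCache:
    capacity: int   # cache size in bytes
    sizes: dict     # table name -> table size in bytes
    cached: set = field(default_factory=set)
    # Result bytes shipped past the cache so far, per uncached table.
    bypassed_bytes: dict = field(default_factory=dict)

    def used(self) -> int:
        """Bytes currently occupied by cached tables."""
        return sum(self.sizes[t] for t in self.cached)

    def decide(self, table: str, result_bytes: int) -> str:
        """Return 'hit', 'load', or 'bypass' for one query on `table`."""
        if table in self.cached:
            return "hit"
        total = self.bypassed_bytes.get(table, 0) + result_bytes
        size = self.sizes[table]
        # Assumed rule: load once the traffic already spent on bypassing
        # matches the cost of fetching the table, and only if it fits.
        if total >= size and self.used() + size <= self.capacity:
            self.cached.add(table)
            self.bypassed_bytes.pop(table, None)
            return "load"
        self.bypassed_bytes[table] = total
        return "bypass"


if __name__ == "__main__":
    cache = TableCache(capacity=100, sizes={"a": 60, "b": 60})
    for table, nbytes in [("a", 30), ("a", 30), ("a", 5), ("b", 70)]:
        print(table, nbytes, cache.decide(table, nbytes))
```

The point of the sketch is the trade-off itself: loading a table costs network traffic up front, so a cache that aims to minimize total traffic should bypass queries whose results are small relative to the object they touch.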
Department of Computer Science
Johns Hopkins University
Baltimore, MD 21218