Fall 2006

September 21, 2006

Legislators have begun to recognize the importance of how electronically stored data should be maintained and secured. Similarly, the courts have begun to differentiate electronic data from their paper analogs. Examples of some sweeping electronic record management legislation include: the Health Insurance Portability and Accountability Act (HIPAA) of 1996, the Gramm-Leach- Bliley Act (GLBA) of 1999, and the more recent Federal Information Security Management Act (FISMA) and Sarbanes-Oxley Act (SOX) of 2002. Altogether, there exist over 4,000 acts and regulations that govern digital storage, all with a varying range of requirements for maintaining electronic records.

Many current storage solutions fail to meet the new demands legislation placed on storage systems. Systems must now provide confidentiality through encrypted storage and data transmission. Some legislation requires an auditable trail of changes made to electronic records that are accessible in real-time. Other legislation sets limits on the amount of time an organization may be liable for maintaining their electronic data, but for those data that go out of scope, permanently deleting data from magnetic media can be challenging. Because electronic data is dynamic, and therefore easily malleable on disk, new methods for authentication and non- repudiation need to be developed to ensure a binding of an individual to an auditable trail of data changes. Further, these systems must be robust against both external and internal attacks. A data loss or compromise due to negligence may result in an organization falling out of compliance and susceptible to litigation.

We present three technical contributions to the field of regulatory compliant storage. The first is an open-source versioning file system designed to be a platform for developing regulatory compliant storage technologies. We then introduce algorithms and an architecture for the secure deletion of individual versions of a file. Lastly, we construct an audit trail model for a versioning file system so that the changes made to data, and the order in which they occurred, may be verifiable.

September 22, 2006

The success of the human genome project is in no small part due to computer programs, called assemblers, that were used to reconstruct the three billion letters representing our genetic make-up from the small pieces of DNA (usually shorter than one thousand letters) produced by automated sequencing machines. While the human genome is now complete, numerous other genomes remain to be sequenced, providing new challenges to genome assembly algorithms. I will describe some of our recent results and ongoing research projects in the area of genome assembly and provide an overview of the modular assembly package AMOS developed within our group.

Distinguished Lecturer

November 2, 2006

In the context of computer graphics, we explore using hashing to pack sparse data into a compact table while retaining efficient random access. Specifically, we design a perfect multidimensional hash function – one that is precomputed on static data to have no hash collisions. Because our hash function makes a single reference to a small offset table, queries always involve exactly two memory accesses and are thus ideally suited for parallel SIMD evaluation on graphics hardware. Whereas prior hashing work strives for pseudorandom mappings, we instead design the hash function to preserve spatial coherence and thereby improve runtime locality of reference. We demonstrate numerous graphics applications including vector images, texture sprites, alpha channel compression, 3D-parameterized textures, 3D painting, simulation, and collision detection. This is joint work with Sylvain Lefebvre, who is now at INRIA Sophia-Antipolis.

Speaker Biography: Hugues Hoppe is a principal researcher in the Computer Graphics Group of Microsoft Research. His primary interests lie in the multiresolution representation, parameterization, and synthesis of both geometry and textures. He received the 2004 ACM SIGGRAPH Computer Graphics Achievement Award for pioneering work on surface reconstruction, progressive meshes, geometry texturing, and geometry images. His publications include dozens of SIGGRAPH papers, and he is associate editor for ACM Transactions on Graphics. He obtained a BS summa cum laude in electrical engineering in 1989 and a PhD in computer science in 1994 from the University of Washington.

November 3, 2006

The representation of spatial data is an important issue in computer graphics, computer vision, geographic information systems, and robotics. A wide number of representations are currently in use. Recently there has been renewed interest in hierarchical data structures such as quadtrees, octrees, R-trees, and etc. The key advantage of these representations is that they provide a way to index into space. In fact, they are little more than multidimensional sorts. They are compact and depending on the nature of the spatial data they save space as well as time and also facilitate operations such as search. In this talk we give a brief overview of hierarchical spatial data structures and related research results. In addition we demonstrate the SAND Browser (http://www.cs.umd.edu/~brabec/sandjava) and the VASCO JAVA applet (http://www.cs.umd.edu/~hjs/quadtree) which illustrate these methods.

Speaker Biography: Hanan Samet received the B.S. degree in engineering from the University of California, Los Angeles, and the M.S. Degree in operations research and the M.S. and Ph.D. degrees in computer science from Stanford University, Stanford, CA. He is a Fellow of the IEEE, ACM, and IAPR (International Association for Pattern Recognition). In 1975 he joined the Computer Science Department at the University of Maryland, College Park, where he is now a Professor. He is a member of the Computer Vision Laboratory of the Center for Automation Research and also has an appointment in the University of Maryland Institute for Advanced Computer Studies. At the Computer Vision Laboratory he leads a number of research projects on the use of hierarchical data structures for geographic information systems. His research group has developed the QUILT system which is a GIS based on hierarchical spatial data structures such as quadtrees and octrees, the SAND system which integrates spatial and non-spatialdata,theSANDBrowse(http://www.cs.umd.edu/~brabec/sandjava) which enables browsing through a spatial database using a graphical user interface, the VASCO spatial indexing applet (found at http://www.cs.umd.edu/~hjs/quadtree/index.html), and a symbolic image database system. His research interests are data structures, computer graphics, geographic information systems, computer vision, robotics, and database management systems. He is the author of the recent book titled “Foundations of Multidimensional and Metric Data Structures” (http://www.mkp.com/multidimensional) published by Morgan-Kaufmann, an imprint of Elsevier, in 2006, and of the first two books on spatial data structures titled “Design and Analysis of Spatial Data Structures”, and “Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS”, both published by Addison-Wesley in 1990.

Distinguished Lecturer

November 10, 2006

This talk highlights our recent work on the efficiency and security of distributed peer-to-peer data structures, including distributed hash tables (DHTs), rainbow skip graphs, and skip webs. These structures share a common theme in that they assume that data is distributed throughout a peer-to-peer network, for which we wish to build an indexing scheme so that we can locate items of interest quickly, as well as efficiently insert and delete items from the search structure. These structures differ in how they perform this indexing, with DHTs supporting only exact-match queries, rainbow skip graphs supporting one-dimensional range queries, and skip webs supporting multi-dimensional searches. Their efficiency depends on how well they distribute search keys and how well they avoid congestion among concurrent searches. Their security derives from how well they tolerate node losses and/or malicious responses.

Distinguished Lecturer

November 17, 2006

Threats to the availability and security of the Internet have undergone a rapid and dramatic evolution over the past few years. Highly visible attacks against Internet users and infrastructure began only a few short years ago with the emergence of Internet Denial of Service (DoS) attacks and highly virulent Internet worms. Today, we are in the middle of a fundamental shift from DoS attacks and worms that primarily target infrastructure to attacks against the actual enterprises and residential users of the Internet. Spurred by financial rewards, attackers have become proficient at hiding themselves using compromised machines as proxies, amplifying the power of their attacks using distributed software, and targeting their attacks against specific classes of vulnerable systems and users. The result has been a rapid increase in spam, phishing scams, and identity theft that are enabled by vast numbers of compromised computers, or bots, sitting in homes, schools, businesses, and government networks around the world. This presentation discusses the changing Internet ecology and the evolution of zero-day threats. The talk highlights results from the Internet Motion Sensor (IMS) project, a collaborative research project aimed at observing and characterizing security threats on a global scale through the deployment of a set of topology-aware, dark IP network sensors across the Internet. The current IMS deployment has been expanded to 25 organizations in 3 continents, monitoring over 17 million unique IP addresses corresponding to more than 1.25% of all routed IPv4 space. I will also present a new framework for enhancing our ability to detect and mitigate the threat of botnets and other modern attacks. The idea is to leverage knowledge about the physical topology, vulnerability, and exploit spaces of our own networks to construct perspective-aware Internet detection and mitigation systems.

Speaker Biography: Farnam Jahanian is a Professor of Electrical Engineering and Computer Science at the University of Michigan and co-founder of Arbor Networks, Inc. Prior to joining academia in 1993, he was a Research Staff Member at the IBM T.J. Watson Research Center. His research interests include distributed computing, network security, and network protocols and architectures. In the late 90’s, Farnam led a research effort aimed at developing a flow-based system for detecting, backtracing and resolving network-wide anomalies such as DDoS attacks and routing exploits. This research project has formed the basis of a commercial technology that has been widely deployed by more than 100 Internet service providers and numerous mission-critical networks throughout the world. Farnam holds a master’s degree and a Ph.D. in Computer Science from the University of Texas at Austin.

Distinguished Lecturer

December 1, 2006

The first generation of geographic face routing algorithms use planarization techniques that rely on the unit-disk assumption. They can exhibit persistent routing failure when used with real radios, whose connectivity violates that assumption. In this talk, I will start by describing the Cross-Link Detection Protocol (CLDP), which enables provably correct geographic routing on arbitrary graphs. This protocol proactively probes links to determine link-crossings, and can incur moderate overhead. In the second half of my talk, I will describe a lazy cross-link removal technique that addresses this problem incurs two orders of magnitude lower overhead than the best known geographic routing protocol. I will conclude my talk with some observations about the applicability of geographic routing to the Internet. This is joint work with Young-Jin Kim (USC), Brad Karp (Univ. College, London) and Scott Shenker (UC Berkeley).

Speaker Biography: Ramesh Govindan received his B. Tech. degree from the Indian Institute of Technology at Madras, and his M.S. and Ph.D. degrees from the University of California at Berkeley. He is an Associate Professor in the Computer Science Department at the University of Southern California, and a senior researcher at the NSF Center for Embedded Networked Systems. His research interests include scalable routing in internetworks, Internet topology discovery and modeling, and wireless sensor networks.

December 7, 2006

DNA variation that results in a single amino-acid residue change in the protein product of a gene (missense mutant) may have a major impact on an individual’s susceptibility to disease and sensitivity to drugs. Many such variants occur at very low population frequencies, thus case/control and familial cosegregation studies are not sufficiently powered to discriminate between those which are pathogenic/high clinical significance vs. neutral/low clinical significance. A promising alternative approach is to integrate information derived from computational biology with clinical patient data and functional studies. I will describe work that applies protein homology modeling, sequence analysis, and machine learning to predict and rationalize the impact of missense mutations on protein stability and function. These predictions can complement information from patient pedigrees and help make sense of the results of functional assays. I will also discuss how the process can be automated and applied to large-scale datasets.

December 14, 2006

In the future, all telephony services will eventually be delivered over IP due to the low costs and the efficiencies for carriers to maintain a single, unified IP-based telecommunications network. This talk emphasizes the all-IP aspect of wireless and mobile core networks. We first present two platforms, NCTU-SMS and iSMS, that integrate IP networks with mobile short message mechanism. These platforms provide light-weight solutions for quick deployment of added-value data services integrating SMS and IP. Then we introduce mobility and session management evolution from General Packet Radio Service (GPRS) to Universal Mobile Telecommunications System (UMTS). In GPRS, some radio management functions are handled in the core network. These functions have been moved to the radio access network in UMTS. This architectural change results in a clean design that allows radio technology and core network technology to develop independently. We also describe session management functions for IP connection including the Access Point Name (APN) and IP address allocation, tunneling technologies, and QoS management.

Speaker Biography: Yi-Bing Lin is Chair Professor and Vice President (Dean) of Research and Development, National Chiao Tung University. His current research interests include mobile computing and cellular telecommunications services. Dr. Lin has published over 200 journal articles and more than 200 conference papers. He is the co-author of the books Wireless and Mobile Network Architecture (with Imrich Chlamtac; published by Wiley, 2001) and Wireless and Mobile All-IP Networks (with Ai-Chun Pang; published by Wiley, 2005). Dr. Lin is senior technical editor of IEEE Network, an editor of IEEE Trans. on Wireless Communications, IEEE Transactions on Vehicular Technology, and ACM Wireless Networks. Dr. Lin is an ISI highly-cited scholar (among 250 most cited Computer Science researchers worldwide). He is an IEEE Fellow, ACM Fellow, AAAS Fellow, and IEE Fellow.