Fall 2008

Distinguished Lecturer

September 18, 2008

For almost a decade, we have worked on MyLifeBits, a project that chronicles a person’s life by encoding every aspect of one’s communications with people and machines, what is heard and seen, and many aspects of one’s physical existence. Our manifesto: the cost to store and maintain such a cyberlife is negligible; an individual’s data increasingly exists electronically for short- and long-term use; and the value of the data increases more than linearly when all items can be related to one another. Three pillars underlie the need: supplementing human memory, including faithful reproduction of content; freeing an individual of both the atomic and electronic clutter of life’s bits; and providing the potential for a digital immortality. What started as a project for capturing books and papers evolved to art, articles, books, cards, email, letters, memos, papers, photos, posters, and physical objects such as coffee mugs and T-shirt logos. By 2005, it was clear that the system is fundamentally a transaction-processing database for all personal interactions with the computer and other devices, e.g., SenseCam, which captures one to two thousand photos a day, and perhaps someday the 2+ billion heartbeats of a lifetime.
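As a rough illustration of what such a transaction-processing store of life’s bits might look like, here is a minimal sketch: one table of captured items plus typed links between them. The schema is hypothetical and for illustration only; the actual MyLifeBits system is a far richer database.

```python
# Minimal "life bits" store: captured items plus typed links between them.
# Hypothetical schema for illustration; not the MyLifeBits implementation.
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE items (
    id       INTEGER PRIMARY KEY,
    kind     TEXT NOT NULL,   -- 'photo', 'email', 'heartbeat', ...
    captured REAL NOT NULL,   -- Unix timestamp of capture
    payload  BLOB             -- the bits themselves, or a file reference
);
CREATE TABLE links (          -- relating items is where the value grows
    src INTEGER REFERENCES items(id),
    dst INTEGER REFERENCES items(id),
    rel TEXT                  -- 'annotates', 'taken-during', ...
);
""")

def capture(kind, payload):
    cur = db.execute(
        "INSERT INTO items (kind, captured, payload) VALUES (?, ?, ?)",
        (kind, time.time(), payload))
    return cur.lastrowid

photo = capture("photo", b"...jpeg bytes from a SenseCam...")
note = capture("note", b"coffee with a colleague")
db.execute("INSERT INTO links VALUES (?, ?, ?)", (note, photo, "annotates"))
```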

Speaker Biography: Gordon Bell is a principal researcher with Microsoft Research, San Francisco (1995-), working on a system, MyLifeBits, to capture everything in a person’s life. His career includes: vice president of R&D, Digital Equipment Corp. (1960-1983); professor of computer science and electrical engineering, Carnegie-Mellon University (1966-72); founding Assistant Director of the National Science Foundation’s Computing and Information Sciences and Engineering (CISE) Directorate (1986-1988); chair of the National Research and Education Network (NREN) panel (1987-1988) that helped create the Internet; advisor/investor in 100+ start-up companies; and founding trustee of the Computer History Museum, Mountain View, CA. He is a Diamond Exchange Fellow and serves on TTI Vanguard’s Advisory Board and the Department of Energy’s Advanced Scientific Computing Advisory Committee.

Distinguished Lecturer

September 23, 2008

Many tasks in surface processing presuppose the existence of a “nice” parameterization. One such class of parameterizations is the conformal, i.e., angle-preserving, maps (though these may suffer from large area distortion, a problem with a very satisfying resolution, as the talk will show). Since most applications use triangle meshes rather than the smooth surfaces presupposed by classical differential geometry, one must first come up with suitable discrete definitions, which then hopefully lead to practical algorithms. Guided by the tenets of “Discrete Differential Geometry,” which aims to discretize entire theories rather than just particular equations, I will discuss a notion of discrete conformal equivalence for triangle meshes. It shares many properties with the smooth definition, is very simple, and, most importantly, leads to very practical algorithms, requiring only standard convex optimization. What’s more, the theory accommodates so-called cone singularities (all this will be explained in the talk), which provide a powerful tool for reducing the area distortion to moderate levels. I will take the audience through the underlying ideas and present results.

Joint work with Boris Springborn and Ulrich Pinkall (both TU Berlin and Matheon).
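To make the notion concrete: in this discrete theory, two triangle meshes with the same combinatorics are conformally equivalent when their edge lengths are related by per-vertex logarithmic scale factors u, via l′_ij = e^((u_i + u_j)/2) · l_ij. The sketch below applies this rescaling rule to a single triangle; the practical algorithms referred to in the abstract solve a convex optimization problem for u.

```python
# Discrete conformal rescaling of edge lengths by per-vertex log factors u:
#   l'_ij = exp((u_i + u_j)/2) * l_ij
# Toy example; finding u subject to flatness/boundary conditions is the
# convex optimization problem mentioned in the abstract.
import math

def rescale_edges(edges, lengths, u):
    """edges: list of (i, j) vertex pairs; lengths: per-edge lengths;
    u: per-vertex log scale factors. Returns the rescaled edge lengths."""
    return [math.exp(0.5 * (u[i] + u[j])) * lengths[e]
            for e, (i, j) in enumerate(edges)]

edges = [(0, 1), (1, 2), (2, 0)]
lengths = [1.0, 1.0, 1.0]      # an equilateral triangle
u = [0.0, 0.3, -0.3]           # scale up near vertex 1, down near vertex 2
print(rescale_edges(edges, lengths, u))
# The rescaled lengths must still satisfy the triangle inequality on every
# face; cone singularities help keep the required scaling moderate.
```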

Speaker Biography: Peter Schröder is a professor of computer science and applied & computational mathematics at Caltech where he has been on the faculty since 1995. He is well known for his work on hierarchical methods in computer graphics and is considered one of the founders of the field of Digital Geometry Processing. More recently he has devoted his research to the development of Discrete Differential Geometry. His work has been recognized by a number of awards among them a Packard Foundation Fellowship, the ACM SIGGRAPH Computer Graphics Achievement award, and, most recently, a Humboldt Foundation Forschungspreis.

Slides >>

September 25, 2008

The evolution of the Internet will depend heavily on the interaction between what users want and what technology can deliver. Unfortunately, the networking community continues to be guided by a collection of misleading dogmas that impede the proper direction of research, development, and deployment. The roles of voice communication, of content, and of streaming real-time transmission versus file transfers are widely misunderstood, which leads to plans that are likely to be seriously flawed.

Speaker Biography: Andrew Odlyzko is a Professor in the School of Mathematics at the University of Minnesota. He is engaged in a variety of projects, from mathematics to security and Internet traffic monitoring. His main task currently is to write a book that compares the Internet bubble to the British Railway Mania of the 1840s, and explores the implications for the future of technology diffusion. Between 2001 and 2008, he also was at various times the founding director of the interdisciplinary Digital Technology Center, Interim Director of the Minnesota Supercomputing Institute, Assistant Vice President for Research, and holder of an ADC Professorship, all at the University of Minnesota. Before moving to Minneapolis in 2001, he devoted 26 years to research and research management at Bell Telephone Laboratories, AT&T Bell Labs, and AT&T Labs, as that organization evolved and changed its name. He has written over 150 technical papers in computational complexity, cryptography, number theory, combinatorics, coding theory, analysis, probability theory, and related fields, and has three patents. He has an honorary doctorate from Univ. Marne la Vallee and serves on the editorial boards of over 20 technical journals, as well as on several advisory and supervisory bodies.

October 2, 2008

In this talk, we discuss the problem of recovering a non-negative sparse signal x in R^n from highly corrupted linear measurements y = Ax + e in R^m, where e is an unknown error vector whose nonzero entries may be unbounded. Motivated by an observation from face recognition in computer vision, we prove that for highly correlated (and possibly overcomplete) dictionaries A, any non-negative, sufficiently sparse signal x can be recovered by solving an L1-minimization problem: min ||x||_1 + ||e||_1 subject to y = Ax + e. More precisely, if the fraction of errors is bounded away from one and the support of x grows sublinearly in the dimension m of the observation, then as m goes to infinity, the above L1-minimization succeeds for all signals x and almost all sign-and-support patterns of e. This result suggests that accurate recovery of sparse signals is possible and computationally feasible even with nearly 100% of the observations corrupted. The proof relies on a careful characterization of the faces of the convex polytope spanned by the standard cross polytope together with a set of i.i.d. Gaussian vectors with nonzero mean and small variance, a geometry we call the “cross-and-bouquet” model. Simulations and experimental results corroborate our findings and suggest intriguing implications and extensions of our results.
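Since x is non-negative and the l1 penalty applies to both x and e, the optimization can be posed as a single linear program. Below is a hedged sketch at toy sizes (the theorem itself concerns highly correlated A with m growing to infinity):

```python
# Sketch: min ||x||_1 + ||e||_1  s.t.  y = Ax + e,  x >= 0, as a linear
# program. Split e = ep - em with ep, em >= 0; then all variables are
# nonnegative and the objective is a plain sum. Toy sizes for illustration.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 60, 40
A = 0.9 + 0.05 * rng.standard_normal((m, n))     # tightly clustered "bouquet"
x_true = np.zeros(n)
x_true[[3, 17]] = [1.0, 2.0]                     # non-negative, sparse
e_true = np.zeros(m)
e_true[rng.choice(m, 20, replace=False)] = 5.0 * rng.standard_normal(20)
y = A @ x_true + e_true

# Variables: [x (n), ep (m), em (m)]; linprog's default bounds are >= 0.
c = np.ones(n + 2 * m)                           # objective: sum of all parts
A_eq = np.hstack([A, np.eye(m), -np.eye(m)])     # encodes A x + ep - em = y
res = linprog(c, A_eq=A_eq, b_eq=y)
print("recovery error:", np.linalg.norm(res.x[:n] - x_true))
```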

Speaker Biography: Yi Ma is an associate professor in the Electrical & Computer Engineering Department of the University of Illinois at Urbana-Champaign. His research interests include computer vision and systems theory. Yi Ma received two Bachelor’s degrees, in Automation and Applied Mathematics, from Tsinghua University (Beijing, China) in 1995, and a Master of Science degree in EECS (1997), a Master of Arts degree in Mathematics (2000), and a PhD degree in EECS (2000), all from the University of California at Berkeley. Yi Ma received the David Marr Best Paper Prize at the International Conference on Computer Vision in 1999 and the Longuet-Higgins Best Paper Prize at the European Conference on Computer Vision in 2004. He also received a CAREER Award from the National Science Foundation in 2004 and a Young Investigator Award from the Office of Naval Research in 2005. He is an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. He is a senior member of IEEE and a member of ACM.

Distinguished Lecturer

October 6, 2008

The growth of on-line information systems supporting rich forms of social interaction has made it possible to study social network data at unprecedented levels of scale and temporal resolution. This offers an opportunity to address questions at the interface between the theory of computation and the social sciences, where mathematical models and algorithmic styles of thinking can help in formulating models of social processes and in managing complex networks as datasets. We consider two lines of research within this general theme. The first is concerned with modeling the flow of information through a large network: the spread of new ideas, technologies, opinions, fads, and rumors can be viewed as unfolding with the dynamics of an epidemic, cascading from one individual to another through the network. This suggests a basis for models of such phenomena, as well as new kinds of open questions. The second line of research is concerned with the privacy implications of large network datasets. An increasing amount of social network research focuses on datasets obtained by measuring the interactions among individuals who have strong expectations of privacy. To preserve privacy in such instances, the datasets are typically anonymized: the names are replaced with meaningless unique identifiers, so that the network structure is maintained while private information is suppressed. Unfortunately, there are fundamental limitations on the power of network anonymization to preserve privacy; we will discuss some of these limitations and some of their broader implications.

This talk is based on joint work with Lars Backstrom, Cynthia Dwork, and David Liben-Nowell.
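As one concrete instance of the epidemic-style dynamics mentioned above, here is a sketch of the independent cascade model, a standard model in this literature (an illustration of the genre, not a construction from the talk): each newly activated node gets a single chance to activate each neighbor, succeeding independently with probability p.

```python
# Independent cascade: newly active nodes get one chance to activate each
# neighbor, independently with probability p. Standard illustrative model.
import random

def independent_cascade(neighbors, seeds, p, rng):
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

# Toy network: a 50-node ring with chord edges.
g = {i: [(i + 1) % 50, (i - 1) % 50, (i + 7) % 50] for i in range(50)}
reached = independent_cascade(g, seeds={0}, p=0.3, rng=random.Random(1))
print(len(reached), "nodes reached")
```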

Speaker Biography: Jon Kleinberg is a Professor in the Department of Computer Science at Cornell University. His research focuses on issues at the interface of networks and information, with an emphasis on the social and information networks that underpin the Web and other on-line media. He is a Fellow of the American Academy of Arts and Sciences, and the recipient of MacArthur, Packard, and Sloan Foundation Fellowships, the Nevanlinna Prize from the International Mathematical Union, and the National Academy of Sciences Award for Initiatives in Research.

Student Seminar

October 8, 2008

In 2006, America Online accidentally published the web search histories of more than 650,000 of its customers. Although the data was anonymized, simple analysis of the customers’ query patterns revealed deeply private information about their identities and interests. This example illustrates how increasingly dependent we are on third parties who not only provide us with our data, but must be trusted to protect our privacy in the way we access it. In this talk we will discuss new cryptographic techniques for constructing “privacy-preserving” databases that conceal users’ identities and query patterns. In such a database, even the trusted database operator does not learn which records its users access. At the same time, we show how such a database may still enforce sophisticated (and history-dependent) access control policies limiting which records each user may obtain. The techniques we will discuss include two new protocols for efficient, adaptive Oblivious Transfer, as well as new access control mechanisms derived from e-cash techniques.
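To convey the flavor of oblivious transfer (the talk’s adaptive protocols and access-control machinery are considerably more involved), here is a toy 1-of-2 OT in the style of Naor-Pinkas. Everything here is an insecure illustration: tiny modulus, messages treated as group elements, no hashing or subgroup checks.

```python
# Toy 1-of-2 oblivious transfer (Naor-Pinkas style). Illustration ONLY:
# insecure parameters, no hashing, no subgroup checks.
import random

p = (1 << 61) - 1                 # small Mersenne prime modulus (toy-sized)
g = 3                             # base for the demo
inv = lambda a: pow(a, p - 2, p)  # modular inverse via Fermat

# Public constant whose discrete log the receiver must not know.
C = pow(g, random.randrange(2, p - 1), p)

# Receiver wants message b; it knows the discrete log of exactly one key,
# because the two public keys are forced to satisfy pk[0] * pk[1] == C.
b = 1
k = random.randrange(2, p - 1)
pk = [0, 0]
pk[b] = pow(g, k, p)
pk[1 - b] = (C * inv(pk[b])) % p

# Sender checks the constraint, then ElGamal-encrypts each message.
assert (pk[0] * pk[1]) % p == C
msgs = [111111, 222222]
cts = []
for i in (0, 1):
    r = random.randrange(2, p - 1)
    cts.append((pow(g, r, p), (msgs[i] * pow(pk[i], r, p)) % p))

# Receiver can decrypt only ciphertext b -- the one key it has the dlog for.
c1, c2 = cts[b]
assert (c2 * inv(pow(c1, k, p))) % p == msgs[b]
print("received message", b)
```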

Speaker Biography: Matthew Green is a Ph.D. student at Johns Hopkins University, currently completing his dissertation in the field of applied cryptography. His work focuses on privacy-preserving protocols and Identity-Based Encryption.

Student Seminar

October 9, 2008

The Linear Ordering Problem is a hard combinatorial optimization problem with applications in a number of disciplines. In this talk we will describe local search for the Linear Ordering Problem using permutation neighborhoods, and introduce a novel search procedure that improves on the state-of-the-art for greedy local search on a benchmark problem set. We will then move on to the problem of statistical machine translation. We will relate machine translation decoding to difficult combinatorial optimization problems and introduce a new model of word reordering using the linear ordering problem. This will lead to an interesting machine learning problem, which requires adaptation of standard learning algorithms. We will evaluate the learning methods on the task of German-English translation and show that the learned reordering models can help to significantly improve upon the translations of a standard decoder.
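For concreteness, here is a minimal sketch of the setting: given a matrix B, the Linear Ordering Problem asks for a permutation pi maximizing the sum of B[pi[i]][pi[j]] over all i < j, and insertion moves (remove an element, reinsert it elsewhere) form a standard local-search neighborhood. This is an illustration of the baseline, not the improved procedure from the talk; real implementations also evaluate moves incrementally rather than recomputing the objective.

```python
# Baseline local search for the Linear Ordering Problem with insertion moves.
import random

def lop_value(B, pi):
    n = len(pi)
    return sum(B[pi[i]][pi[j]] for i in range(n) for j in range(i + 1, n))

def local_search(B, pi):
    improved = True
    while improved:
        improved = False
        base = lop_value(B, pi)
        for i in range(len(pi)):
            for j in range(len(pi)):
                if i == j:
                    continue
                cand = pi[:i] + pi[i + 1:]   # remove the element at i ...
                cand.insert(j, pi[i])        # ... and reinsert it at j
                v = lop_value(B, cand)
                if v > base:                 # first-improvement step
                    pi, base, improved = cand, v, True
    return pi

rng = random.Random(0)
n = 8
B = [[rng.randrange(10) for _ in range(n)] for _ in range(n)]
pi = local_search(B, list(range(n)))
print(pi, lop_value(B, pi))
```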

Speaker Biography: Roy Tromble is a Ph.D. candidate in the Department of Computer Science and the Center for Language and Speech Processing at Johns Hopkins University. He was born in Juneau, Alaska, graduated from Juneau-Douglas High School in 1997, and received B.S. degrees in Mathematics and Computer Science from the University of Idaho in 2001. He now resides in Pittsburgh where he will work for Google.

Slides >>

October 23, 2008

A significant challenge of autonomous robotics lies in the area of motion planning. The overall objective is to enable robots to automatically plan the low-level motions needed to accomplish assigned high-level tasks. Toward this goal, I developed a novel multi-layered approach, termed Synergic Combination of Layers of Planning (SyCLoP), that synergically combines high-level discrete planning and low-level motion planning. High-level discrete planning, which draws from research in AI and logic, guides low-level motion planning during the search for a solution. A distinctive feature of SyCLoP is that the planning layers work in tandem to evaluate the feasibility of current plans and to compute increasingly feasible plans in future iterations. This synergic combination of high-level discrete planning and low-level motion planning allows SyCLoP to solve motion-planning problems with respect to rich models of the robot and the physical world. This facilitates the design of feedback controllers that enable the robot to execute in the physical world solutions obtained in simulation. In particular, SyCLoP effectively solves challenging motion-planning problems that incorporate discrete and continuous dynamics and physics-based simulations. In addition, SyCLoP can take into account high-level tasks specified using the expressiveness of linear temporal logic (LTL). LTL allows for complex specifications, such as sequencing, coverage, and other combinations of temporal objectives. Experiments show that SyCLoP obtains significant computational speedups of one to two orders of magnitude when compared to state-of-the-art motion planners.
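A rough sketch of the two-layer idea, under strong simplifying assumptions: a holonomic point robot, a fixed grid decomposition, and a discrete lead computed once by BFS. In SyCLoP proper the layers interact in both directions, with exploration feedback continually re-weighting the lead; that feedback loop is omitted here.

```python
# Two-layer sketch in the spirit of SyCLoP: a discrete BFS "lead" over grid
# cells guides the growth of a tree of motions in continuous space.
# Toy setting: holonomic point robot in the unit square, one wall with a gap.
import random
from collections import deque

N = 10                                    # N x N grid over the unit square
blocked = {(4, r) for r in range(7)}      # wall in column 4, gap at the top

def cell(q):                              # continuous state -> grid cell
    return (min(int(q[0] * N), N - 1), min(int(q[1] * N), N - 1))

def lead(start, goal):                    # BFS path over free cells
    prev, frontier = {start: None}, deque([start])
    while frontier:
        c = frontier.popleft()
        if c == goal:
            path = []
            while c is not None:
                path.append(c)
                c = prev[c]
            return path[::-1]
        for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (c[0] + d[0], c[1] + d[1])
            if 0 <= nb[0] < N and 0 <= nb[1] < N and nb not in blocked and nb not in prev:
                prev[nb] = c
                frontier.append(nb)

rng = random.Random(0)
start_q, goal_q = (0.05, 0.05), (0.95, 0.95)
guide = lead(cell(start_q), cell(goal_q))
tree, step = [start_q], 0.05
for _ in range(5000):
    tc = rng.choice(guide)                # sampling biased toward the lead
    sample = ((tc[0] + rng.random()) / N, (tc[1] + rng.random()) / N)
    near = min(tree, key=lambda q: (q[0] - sample[0]) ** 2 + (q[1] - sample[1]) ** 2)
    dx, dy = sample[0] - near[0], sample[1] - near[1]
    norm = max((dx * dx + dy * dy) ** 0.5, 1e-9)
    new_q = (min(max(near[0] + step * dx / norm, 0.0), 0.999),
             min(max(near[1] + step * dy / norm, 0.0), 0.999))
    if cell(new_q) not in blocked:        # point collision check only
        tree.append(new_q)
        if cell(new_q) == cell(goal_q):
            print("reached the goal cell with", len(tree), "tree nodes")
            break
```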

Speaker Biography: Erion Plaku is a postdoctoral fellow in the Laboratory for Computational Sensing and Robotics at Johns Hopkins University. Erion received his Ph.D. in Computer Science from Rice University, Houston, TX in 2008. His research interests encompass robotics, motion planning, data mining, hybrid-system verification, and distributed computing.

Slides >>

October 28, 2008

This talk tells the story of the independent development by the FreeBSD project starting from the open-source release from Berkeley. The FreeBSD project patterned its initial community structure on the development structure built up at Berkeley. It evolved and expanded that structure to create a self-organizing project that supports an ever growing and changing group of developers around the world. This talk concludes with a description of the roles played by the thousands of volunteer developers that make up the FreeBSD Project of today and a discussion of the evolution of its licensing.

Speaker Biography: Dr. Marshall Kirk McKusick’s work with Unix and BSD development spans nearly thirty years. It begins with his first paper on the implementation of Berkeley Pascal in 1979, goes on to his pioneering work in the eighties on the BSD Fast File System, the BSD virtual memory system, the final release of 4.4BSD-Lite from the UC Berkeley Computer Systems Research Group, and carries on with his work on FreeBSD. A key figure in Unix and BSD development, his experiences chronicle not only the innovative technical achievements but also the interesting personalities and philosophical debates in Unix over the past thirty years.

October 30, 2008

Kurzweil says that computers will enable people to live forever, and that doctors will be backing up your memories by the late 2030s. This talk is not about that, yet. Instead, the remarkable drop in disk costs makes it possible and attractive to capture past application states and store them for a long time. This opens the possibility that features such as forecasting and auditing, formerly relegated to data warehouses and temporal databases, can become available to everyday applications in general databases. The challenge is how to organize past states so that they are “not in the way” and “always there” when needed. Split snapshots are a recent approach to organizing past states that is attractive for several reasons. Split snapshots are transactionally consistent. Unmodified database programs can run against them, side by side with programs running against the current state. They can be taken with any frequency, kept indefinitely, or garbage-collected at low cost, a useful feature in long-lived systems. Several new techniques underlie split snapshots. The integration with the buffer manager exploits native recovery and concurrency mechanisms to create consistent copy-on-write snapshots without blocking; a new kind of snapshot index provides fast access independently of snapshot age or update workload; and a new kind of snapshot storage organization manages disk adaptively, without copying and without disk fragmentation. Measurements of a prototype system indicate that the approach is efficient and scalable, imposing a minimal (4%) performance penalty on a storage system under expected common workloads. (Joint work with Ross Shaull and Hao Xu.)
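The copy-on-write idea at the heart of such systems fits in a few lines. The toy page store below saves a page’s pre-state the first time it is overwritten after a snapshot, so reads can run against any declared snapshot; it is a sketch of copy-on-write snapshots generally, not of the split-snapshot structures (snapshot index, adaptive snapshot storage) described above.

```python
# Toy copy-on-write snapshot store: pre-states are saved lazily, on the
# first write to a page after a snapshot is declared.
class SnapshotStore:
    def __init__(self):
        self.pages = {}        # current state: page_id -> value
        self.snaps = []        # per-snapshot dict of saved pre-states

    def snapshot(self):
        self.snaps.append({})  # nothing copied up front
        return len(self.snaps) - 1

    def write(self, page_id, value):
        if self.snaps and page_id not in self.snaps[-1]:
            # first write since the latest snapshot: preserve the pre-state
            self.snaps[-1][page_id] = self.pages.get(page_id)
        self.pages[page_id] = value

    def read(self, page_id, snap=None):
        if snap is not None:
            # value as of `snap`: the first pre-state saved at or after it
            for s in self.snaps[snap:]:
                if page_id in s:
                    return s[page_id]
        return self.pages.get(page_id)

store = SnapshotStore()
store.write("p1", "v0")
s0 = store.snapshot()
store.write("p1", "v1")
assert store.read("p1", snap=s0) == "v0"
assert store.read("p1") == "v1"
```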

Speaker Biography: Liuba Shrira is an Associate Professor in the Computer Science Department at Brandeis University, and is affiliated with the Computer Science and Artificial Intelligence Laboratory at MIT. From 1986 to 1997 she was a researcher in the MIT/LCS Programming Methodology Group, joining Brandeis in 1997. In 2004-2005 she was a visiting researcher at Microsoft Research, Cambridge, UK. Her research interests span aspects of the design and implementation of distributed systems, especially storage systems, including fault-tolerance, availability, and performance issues. Her recent focus is on long-lived transactional storage, time travel (in storage), software upgrades, Byzantine faults, and support for collaborative access to long-lived objects.

Slides >>

November 4, 2008

Motion coordination is an extraordinary phenomenon in biological systems and a powerful tool in man-made systems. Although individual agents have no global knowledge of the system, complex behaviors emerge from local interactions. The subject of this talk is the design of motion-enabled sensor networks, i.e., networks where nodal motions are purposefully induced in order to perform useful tasks. Example scenarios include how to deploy sensor nodes at points of interest and how to exploit mobility in target tracking or boundary estimation. The algorithms combine distributed feedback, information processing, and geometric structures. The key technical challenge is the design of adaptive behaviors that tolerate asynchrony and communication/process failures.
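One classic behavior in this genre is coverage deployment by a Lloyd-style local rule: each agent repeatedly moves to the centroid of the region it is closest to (its Voronoi cell, estimated below by Monte Carlo sampling). This is an illustration of the kind of distributed feedback involved, not a specific algorithm from the talk.

```python
# Lloyd-style coverage deployment: each agent moves to the centroid of its
# (sampled) Voronoi cell. Converges toward a centroidal configuration.
import numpy as np

rng = np.random.default_rng(0)
agents = rng.random((5, 2))           # 5 agents in the unit square
samples = rng.random((20000, 2))      # points of interest (uniform here)

for _ in range(30):
    # assign every sample point to its nearest agent
    d = ((samples[:, None, :] - agents[None, :, :]) ** 2).sum(axis=2)
    owner = d.argmin(axis=1)
    # each agent applies a purely local rule: move to its cell's centroid
    for i in range(len(agents)):
        mine = samples[owner == i]
        if len(mine):
            agents[i] = mine.mean(axis=0)

print(agents)    # spread-out, near-centroidal positions: good coverage
```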

Speaker Biography: Francesco Bullo received the Laurea degree “summa cum laude” in Electrical Engineering from the University of Padova in 1994, and the Ph.D. degree in Control and Dynamical Systems from the California Institute of Technology in 1999. From 1998 to 2004, he was an Assistant Professor with the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign. He is currently an Associate Professor with the Mechanical Engineering Department at the University of California, Santa Barbara. His students’ papers were finalists for the Best Student Paper Award at the IEEE Conference on Decision and Control (2002, 2005, 2007), and the American Control Conference (2005, 2006). His research interests include motion planning and coordination for autonomous vehicles, and geometric control of mechanical systems.

November 6, 2008

C-arms are mobile x-ray imaging devices frequently used in the operating room (OR). Their maneuverability enables surgeons to acquire images from different angles, and potentially to recover the three-dimensional structure of the anatomy. In particular, C-arms can be used for tomographic reconstruction that resembles CT. This use, however, is limited by multiple factors, including hardware cost and physical constraints on the scan protocol.

In this talk, I present an emerging methodology of fusing intra-operative x-rays with pre-operative anatomical data, which may be based on a patient-specific CT scan or on statistical anatomical models, to compensate for information loss due to the scan constraints. The reconstructed images cover the range from a “best guess” of the 3D structure to a complete cone-beam reconstruction, with fused volumes that are part one and part the other in between. Importantly, the method allows us to significantly improve the reconstruction from the observed x-ray images while not losing information in the gap between the x-rays and the prior model.

As a result, we can perform plausible reconstructions using a relatively low-end, ubiquitous C-arm rather than the high-end systems which are currently in clinical use.

The talk surveys various concepts and elements used in the implementation of the method, including image registration, volumetric modeling, visualization algorithms, and calibration of x-ray systems, all required for the final goal, which we call “hybrid reconstruction.”

Main collaborators who contributed to this research include: Gouthami Chintalapani, Lotta Ellingsen, Junghoon Lee, Prof. Jerry Prince, Krishna Ramampurthi, and Prof. Russell H. Taylor.

Slides >>

Distinguished Lecturer

November 13, 2008

The rapid evolution of languages, tools, environments and expectations presents major challenges and opportunities for programmers and for software engineering education. This is true across all kinds of programming, but is especially so for Web systems, which are now routinely written in untyped scripting languages and include Ajax, mashups, toolkits, frameworks like Rails and Django, and a profusion of interfaces, all operating asynchronously on distributed systems. For the past 8 or 9 years I have been teaching a course on advanced programming techniques that is more and more stretched between important old material and new unproven material that might be important. In this talk I will illustrate some of the challenges and discuss ways in which we might use complexity and rapid change to advantage.

Speaker Biography: Brian Kernighan received his PhD from Princeton in 1969, and was in the Computing Science Research center at Bell Labs until 2000. He is now a professor in the Computer Science Department at Princeton. His research areas include programming languages, tools, and interfaces that make computers easier to use. He is also interested in technology education for non-technical audiences.

Slides >>

November 18, 2008

The recent growth in both the size and speed of FPGAs (Field Programmable Gate Arrays) has opened up tremendous opportunities for using these devices as spatial computing platforms in the form of hardware accelerators. However, the major obstacle to wide adoption of these platforms is their programmability. FPGAs have traditionally been programmed using HDLs (Hardware Description Languages), a low-level, tedious, and error-prone process. HLLs (High-Level Languages), on the other hand, embody a temporal execution paradigm that presupposes a central control, central storage, control-driven sequencing of operations, etc. Translating HLLs into spatial computing structures requires a radical change of the underlying computing model. In this talk I describe our experience with ROCCC (Riverside Optimizing Compiler for Configurable Computing), an on-going research project whose goal is to develop a compiler framework for translating C code to VHDL for the generation of FPGA-based hardware accelerators. ROCCC has been used on a wide variety of applications such as image and video processing, molecular dynamics, bioinformatics, data mining, cryptography, and string matching. Observed speedups over CPUs range from one to three orders of magnitude. We are currently developing ROCCC 2.0, which builds upon the experience with the first version of ROCCC.

Speaker Biography: Walid A. Najjar is a Professor in the Department of Computer Science and Engineering at the University of California, Riverside. His research interests are in the fields of computer architecture, compiler optimizations, and embedded systems. Lately, he has been very active in the area of compilation for FPGA-based code acceleration and reconfigurable computing. His research has been supported by NSF, DARPA, and various industry sponsors. He received a B.E. in Electrical Engineering from the American University of Beirut in 1979, and the M.S. and Ph.D. in Computer Engineering from the University of Southern California in 1985 and 1988, respectively. He was on the faculty of the Department of Computer Science at Colorado State University (1989 to 2000); before that he was with the USC Information Sciences Institute (1986 to 1989). He has served on the program committees of a number of leading conferences in this area, including CASES, ISSS-CODES, DATE, HPCA, and MICRO. He is a Fellow of the IEEE.

Slides >>

November 20, 2008

Pattern recognition is a very active inter-disciplinary field of computer science, engineering and applied mathematics. It involves the digitization of images and objects, their processing by the computer, extraction of salient features, and sophisticated clustering and classification techniques. This talk summarizes the modern methods involved, computer simulation of human expertise, and their applications to face recognition, palm print recognition, and handwriting recognition. The latest results on handwriting recognition will also be presented.

Speaker Biography: Dr. Ching Suen received his doctoral degree from the University of British Columbia. Currently he is the Director of the Centre for Pattern Recognition and Machine Intelligence and Research Chair in AI and Pattern Recognition at Concordia University in Montreal, Canada. He is the Editor-in-Chief of the journal Pattern Recognition, and an Associate Editor of several journals including Expert Systems, PRAI, Signal, Video, and Image Processing. He is a fellow of the IEEE, IAPR, and the Academy of Sciences of the Royal Society of Canada. He has guided many doctoral students, post-doctoral fellows, and visiting scientists, and is the author of more than 10 books and 400 technical papers on character recognition, expert systems, and computational linguistics. He is a worldwide consultant and has given many talks at both academic institutions and industrial companies.

Slides >>

December 2, 2008

Recent advances in information technologies have led to unprecedented amounts of high-dimensional data from many emerging applications. The need for more advanced techniques to analyze such complex data calls for a shift in research paradigms. In this talk, I will overview and highlight several results in the area of estimation of mixture models in high-dimensional data spaces. Applications will be presented in problems such as motion segmentation, image segmentation, face recognition, and human action categorization. Through this talk, I intend to emphasize the confluence of algebra and statistics that may lead to more advanced solutions for analyzing complex singular data structures such as mixtures of linear subspaces and nonlinear manifolds. In the first part of the talk, I will start by reviewing a solution to simultaneously segment and estimate mixture subspace models: Generalized Principal Component Analysis (GPCA). In contrast to traditional statistical methods, GPCA is focused on recovering a set of vanishing polynomials that globally determines the mixture of subspaces as its zero set. I will introduce a new algebro-geometric framework along the same approach to simultaneously segment mixtures of quadratic manifolds. The new solution is also robust to moderate data noise and outliers. The second part will be focused on classification of mixture subspace models, where the prior information about the mixture subspaces is provided through training examples. Inspired by compressive sensing theory, the recognition problem can be reformulated via a sparse representation. Furthermore, efficient solutions exist to recover such sparse representations using fast L1 minimization. Finally, I will discuss several open problems in the emerging field of distributed sensor perception.
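To give the flavor of the algebro-geometric approach: data drawn from a union of linear subspaces satisfy vanishing polynomials whose coefficients can be recovered linearly after a polynomial (Veronese) embedding of the data. A minimal sketch for the union of two lines through the origin in the plane:

```python
# GPCA flavor: points on a union of two lines through the origin satisfy a
# quadratic vanishing polynomial p(x, y) = c1*x^2 + c2*x*y + c3*y^2 = 0,
# recovered from the null space of the Veronese-embedded data matrix.
import numpy as np

rng = np.random.default_rng(0)
t = rng.standard_normal(100)
line1 = np.outer(t[:50], [1.0, 2.0])     # subspace 1: y = 2x
line2 = np.outer(t[50:], [1.0, -0.5])    # subspace 2: y = -x/2
X = np.vstack([line1, line2])

# Degree-2 Veronese embedding: (x, y) -> (x^2, x*y, y^2).
V = np.column_stack([X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2])
_, s, Vt = np.linalg.svd(V)
c = Vt[-1]                               # null-space direction
print("smallest singular value:", s[-1])           # ~0: p vanishes on data
print("max |p| on the data:", np.abs(V @ c).max())
# Factoring p (or taking its derivatives at sample points) separates the
# two subspaces -- the segmentation step in GPCA.
```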

Speaker Biography: Allen Y. Yang is a postdoctoral researcher in the Department of EECS at UC Berkeley. His primary research is in pattern analysis of geometric and statistical models in high-dimensional data spaces, and applications in motion segmentation, image segmentation, face recognition, and distributed sensor perception. He received a BEng in Computer Science from the University of Science and Technology of China (USTC) in 2001. He received an MS in Electrical Engineering in 2003, an MS in Mathematics in 2005, and a PhD in Electrical and Computer Engineering in 2006, all from the University of Illinois (UIUC). He has published five journal papers and more than 10 conference papers. He has three US patent applications.

Slides >>

Distinguished Lecturer

December 4, 2008

For the past couple of decades, the major commercial DBMS vendors have been selling “one size fits all,” i.e., a single system appropriate for all data management tasks. In this talk, we discuss three customized engines: one for data warehousing, one for OLTP, and one for scientific data processing. In each case, a performance advantage of one to two orders of magnitude is available from a specialized architecture, and we explore the reasons for these dramatic performance differences. We then speculate on the future of the DBMS market given this information. In particular, we foresee the end of monolithic DBMS solutions.
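A toy illustration of one source of the gap: scanning a single attribute touches contiguous memory in a column-oriented layout but strided memory in a row-oriented one. This shows the locality effect only; real specialized engines add compression, vectorized execution, and redesigned storage and recovery paths.

```python
# Row store vs. column store, reduced to memory layout: summing one column
# of a C-ordered (row-major) array vs. a Fortran-ordered (column-major) one.
import time
import numpy as np

rows, cols = 1_000_000, 20
row_store = np.ones((rows, cols))             # C order: each row contiguous
col_store = np.asfortranarray(row_store)      # F order: each column contiguous

t0 = time.perf_counter()
row_store[:, 3].sum()                         # strided scan
t1 = time.perf_counter()
col_store[:, 3].sum()                         # contiguous scan
t2 = time.perf_counter()
print(f"row-store scan: {t1 - t0:.4f}s, column-store scan: {t2 - t1:.4f}s")
```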

Speaker Biography: Computer Science and Artificial Intelligence Laboratory, M.I.T.