Spring 2007

February 8, 2007

Since its inception in the 1980s, the popularity of the Internet has been growing exponentially, resulting in a mass of shared knowledge and fast, cheap communication. Hand in hand with these developments, we have seen the birth of a plethora of new systems for facilitating interaction among economic agents, from marketplaces like eBay and Google’s AdWords to online networking services like MySpace and Match.com. These systems give rise to numerous opportunities for scientific exploration, and such studies are fundamental to the future economic and social success of these systems.

Algorithmic game theory is a new paradigm for studying such systems. The goal in this field is to understand the behavior of autonomous selfish agents, and to define rules that encourage them to collectively act in a way that optimizes some system objective such as social welfare or revenue. In this talk, I will first present an overview of some basic notions of algorithmic game theory and survey some important applications of the field. I will then proceed to showcase some techniques from this field through the problem of online auction design, that is, auctions for bidders who arrive and depart over time (e.g., Priceline.com). Maximizing welfare in such auctions is complicated by the fact that bids must be accepted or rejected at the moment they are submitted. It is known that the classic secretary problem, introduced by Dynkin in 1963, can be used to design approximately welfare-maximizing auctions in a simple multi-unit auction setting. We show how the classic secretary problem can be generalized to a combinatorial setting where the acceptable sets form a matroid, and use this generalization to build mechanisms for a class of online combinatorial auctions.

Parts of this talk are based on joint work with Moshe Babaioff and Robert Kleinberg.
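
To make the classic secretary problem concrete, here is a minimal sketch of Dynkin's 1/e stopping rule for the single-choice case (not the matroid generalization discussed in the talk); the function name and sample bids are illustrative:

    import math
    import random

    def classic_secretary(values):
        """Dynkin's 1/e rule: observe the first n/e arrivals without accepting,
        then accept the first arrival that beats everything seen so far.
        Selects the best element with probability approaching 1/e."""
        n = len(values)
        cutoff = int(n / math.e)
        best_seen = max(values[:cutoff], default=float("-inf"))
        for v in values[cutoff:]:
            if v > best_seen:
                return v                           # accept irrevocably
        return values[-1] if values else None      # forced to take the last arrival

    # Example: bids arrive in uniformly random order.
    bids = [37, 12, 95, 60, 3, 81, 44, 70, 28, 55]
    random.shuffle(bids)
    print(classic_secretary(bids))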

Speaker Biography: Nicole Immorlica received her Ph.D. in June 2005 from MIT and has since been a postdoctoral researcher at the Theory Group in Microsoft Research. Her research interests lie in applying techniques from the field of theoretical computer science to problems at the forefront of economics and computer science. Her recent research has focused on auction design, particularly sponsored search auctions; matching markets such as the National Residency Matching Program (NRMP); and the formation and diffusion of information in social networks.

February 12, 2007

The recent release of the first phase of the Haplotype Mapping project (Nature, Oct. 26, 2005; see also, e.g., NY Times, Oct. 27), and the rapid reduction in genotyping costs, open new directions and opportunities in the study of complex genetic diseases such as cancer or Alzheimer’s disease. The advances in genotyping technology pose new computational and statistical challenges due to the increased size of the datasets analyzed. In this talk I will discuss some of the computational challenges posed by these studies, and the current and potential impact of computation on the understanding of complex diseases. In particular, I will describe recent results on genotype phasing, population stratification issues (clustering), and algorithmic issues involved in study design.

February 15, 2007

Studying the behavior of gene regulatory networks by learning from high-throughput genomic data has become one of the central problems in computational systems biology. Most work in this area has focused on learning structure from data — e.g. finding clusters or modules of potentially co-regulated genes, or building a graph of putative regulatory “edges” between genes — and has been successful at generating qualitative hypotheses about regulatory networks.

Instead of adopting the structure-learning viewpoint, we focus on building predictive models of gene regulation that allow us both to make accurate quantitative predictions on new or held-out experiments (test data) and to capture mechanistic information about transcriptional regulation. Our algorithm, called MEDUSA, integrates promoter sequence, mRNA expression, and transcription factor occupancy data to learn gene regulatory programs that predict the differential expression of target genes. Instead of using clustering or correlation of expression profiles to infer regulatory relationships, the algorithm learns to predict up/down expression of target genes by identifying condition-specific regulators and discovering regulatory motifs that may mediate their regulation of targets. We use boosting, a technique from statistical learning, to help avoid overfitting as the algorithm searches through the high-dimensional space of potential regulators and sequence motifs. We will report computational results on the yeast environmental stress response, where MEDUSA achieves high prediction accuracy on held-out experiments and retrieves key stress-related transcriptional regulators, signal transducers, and transcription factor binding sites. We will also describe recent results on the hypoxic response in yeast, where we used MEDUSA to propose the first global model of the oxygen sensing and regulatory network, including new putative context-specific regulators. Through our experimental collaborator on this project, the Zhang Lab at Columbia University, we are in the process of validating our computational predictions with wet-lab experiments.
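
For readers unfamiliar with boosting, the sketch below is a generic AdaBoost-with-decision-stumps learner for ±1 (up/down) labels over binary features. It is only a stand-in for the idea of boosting over weak rules; the actual MEDUSA learner, its feature encoding, and its handling of motifs and regulators are more elaborate:

    import numpy as np

    def adaboost_stumps(X, y, rounds=50):
        # Minimal AdaBoost over decision stumps on 0/1 features.
        # X: (n_examples, n_features) binary matrix; y: labels in {-1, +1}.
        n, d = X.shape
        w = np.ones(n) / n                        # example weights
        stumps = []                               # (feature, polarity, alpha) triples
        for _ in range(rounds):
            best = None
            for j in range(d):
                for polarity in (+1, -1):
                    pred = polarity * (2 * X[:, j] - 1)   # maps {0,1} to {-1,+1}
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, polarity)
            err, j, polarity = best
            err = max(err, 1e-12)
            alpha = 0.5 * np.log((1 - err) / err)
            pred = polarity * (2 * X[:, j] - 1)
            w = w * np.exp(-alpha * y * pred)     # up-weight misclassified examples
            w = w / w.sum()
            stumps.append((j, polarity, alpha))
        return stumps

    def predict(stumps, X):
        score = sum(a * p * (2 * X[:, j] - 1) for j, p, a in stumps)
        return np.sign(score)                     # +1 = up-regulated, -1 = down-regulated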

Speaker Biography: Dr. Leslie received her Ph.D. in Mathematics from Berkeley and held an NSERC Postdoctoral Fellowship in the Mathematics Department at Columbia University. She joined the Columbia Computer Science Department in Fall 2000 and moved to the Center for Computational Learning Systems, a new machine learning research center in the School of Engineering at Columbia, in Spring 2004. She is the principal investigator leading the Computational Biology Group (http://www.cs.columbia.edu/compbio) and a faculty member of the Center for Computational Biology and Bioinformatics (C2B2) at Columbia University. Her research lab focuses on the development of machine learning algorithms for studying biological processes at the molecular and systems levels.

February 22, 2007

With increasingly available population-scale genetic variation data, a current high-priority research goal is to understand how genetic variations influence complex diseases (or, more generally, genetic traits). Recombination is an important genetic process that plays a major role in the logic behind association mapping, an intensely studied method that is widely expected to help find genes (alleles) associated with complex genetic diseases. In this talk, I will present algorithmic and computational results on inferring historical recombination and constructing genealogical networks with recombination, with applications to two biologically important problems: association mapping of complex diseases and detecting recombination hotspots. On association mapping, I will present a method that generates the most parsimonious genealogical networks uniformly and show how it can be applied to association mapping. I will introduce results on evaluating how well the inferred genealogy fits the given phenotypes (i.e., cases and controls) and locates genes associated with the disease. Our recent work on detecting recombination hotspots by inferring minimum recombination will also be briefly described. For both biological problems, I will demonstrate the effectiveness of these methods with experimental results on simulated or real data.

Speaker Biography: Yufeng Wu received his master’s degree in Computer Science from the University of Illinois at Urbana-Champaign in 1998 and his bachelor’s degree from Tsinghua University, China, in 1994. From 1998 to 2003, he was a software engineer at a startup company in Illinois, USA. He is currently a PhD candidate in the Department of Computer Science, University of California, Davis. At UC Davis, he works with Prof. Dan Gusfield on algorithms in computational biology and bioinformatics. His current research is focused on computational problems arising in population genomics.

February 26, 2007

The Internet has emerged as the single most important platform for resource sharing among parties with diverse and selfish interests, and it is the most interesting economic system of our time. As rapidly as new Internet systems are developed and deployed, Internet users find ways to exploit them for their own benefit in ways not intended by the designers. Prevalent undesirable examples of this phenomenon include spam in email systems, bid-sniping and shill bidding in eBay’s auction marketplace, free-loading in file-sharing networks, and click fraud in Internet advertising. The existence of these economic bugs brings to the fore the urgent problem of developing a principled approach to designing systems where exploits are not economically viable. This is the general question addressed by the research area of ‘algorithmic mechanism design’.

Techniques from algorithms are well suited to addressing both new and traditional economic questions. In this talk I will highlight three of these: computation, reduction, and approximation. I will introduce the research area of ‘algorithmic pricing’, which looks at the question of computing a pricing (e.g., on items in a supermarket) given a known demand (e.g., the preferences of shoppers). I will then reduce algorithmic mechanism design, where the demand is unknown, to algorithmic pricing. The reduction I will give is based on random sampling; it preserves approximations, and its analysis makes the connection between mechanism design and machine learning theory concrete. This approach is very generally applicable; however, in this talk I will focus on the economic objective of profit maximization.

(For economists: Motivated by the Wilson doctrine, we propose a method for the design and analysis of incentive-compatible mechanisms in a prior-free environment. Our approach allows for general non-quasi-linear utility functions and optimization of profit or social welfare.)
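
To illustrate the random-sampling idea behind the reduction, here is a minimal sketch of the simplest auction in this family, a random-sampling single-price auction for a digital good; it illustrates the general technique rather than the specific construction presented in the talk:

    import random

    def rsop_auction(bids):
        # Random-sampling single-price auction for a digital good:
        # split bidders into two random halves, compute the revenue-optimal
        # single price on each half, and offer that price to the other half.
        def optimal_price(vals):
            # Best single sale price against this set of bids (0 if empty).
            return max(vals, key=lambda p: p * sum(v >= p for v in vals), default=0)

        groups = ([], [])
        for bid in bids:
            groups[random.randrange(2)].append(bid)
        a, b = groups
        price_for_b, price_for_a = optimal_price(a), optimal_price(b)
        winners = [(bid, price_for_a) for bid in a if bid >= price_for_a] \
                + [(bid, price_for_b) for bid in b if bid >= price_for_b]
        revenue = sum(price for _, price in winners)
        return winners, revenue

    winners, revenue = rsop_auction([10, 8, 7, 5, 5, 3, 2, 1])
    print(revenue)

Because each bidder’s offered price is computed only from the other half’s bids, no bidder can lower her own price by misreporting; this is the sense in which sampling-based reductions preserve incentives.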

March 1, 2007

Search engine companies have revolutionized the way we use the Internet. Their dramatic revenues are supported by a different innovation: keyword-based advertising. I will talk about this innovation and show how it has an intimate link with classic matching problems. A new algorithmic technique (a tradeoff-revealing family of LPs) leads to a simple, intuitive (and optimal) algorithm for AdWords allocation, sketched below.

This work fits within a broader theme of network and Internet economics: the algorithmic study of the economic activities that have emerged on the Internet. I will give an overview of the different research questions and applications in this area. I will also talk about a related question: how techniques from machine learning and game theory can be used in the optimal design of networks.

(This talk is based on three joint works with Deeparnab Chakrabarty, Gagan Goel, Amin Saberi, Umesh Vazirani, and Vijay Vazirani.)
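
To make the allocation rule concrete, below is a minimal sketch of the scaled-bid rule that the tradeoff-revealing LP analysis yields: each query is awarded to the advertiser maximizing bid * (1 - e^(fraction of budget spent - 1)). The budgets, bids, and the handling of nearly exhausted budgets here are illustrative simplifications:

    import math

    def msvv_allocate(query_bids, spent, budgets):
        # Award the query to the advertiser maximizing
        #   bid * (1 - e^(fraction_of_budget_spent - 1)).
        def tradeoff(frac):
            return 1.0 - math.exp(frac - 1.0)

        best, best_score = None, 0.0
        for adv, bid in query_bids.items():
            remaining = budgets[adv] - spent[adv]
            if bid <= 0 or remaining <= 0:
                continue
            score = min(bid, remaining) * tradeoff(spent[adv] / budgets[adv])
            if score > best_score:
                best, best_score = adv, score
        if best is not None:
            spent[best] += min(query_bids[best], budgets[best] - spent[best])
        return best

    # Example: two advertisers with equal budgets bidding on a query stream.
    budgets = {"A": 100.0, "B": 100.0}
    spent = {"A": 0.0, "B": 0.0}
    for _ in range(30):
        msvv_allocate({"A": 5.0, "B": 4.0}, spent, budgets)
    print(spent)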

Speaker Biography: Aranyak Mehta is a postdoctoral researcher at IBM’s Almaden Research Center in San Jose. He received his Ph.D. from Georgia Tech in July 2005. His interests lie in algorithms and their applications, and the interaction with Internet economics and game theory.

Student Seminar

March 2, 2007

The ever-increasing complexity of 3D models generated by 3D scanners and through modeling requires ever faster graphics cards to render them at interactive rates. Unfortunately, the rate at which the performance of graphics cards increases is outpaced by the advances in scanning technology, making today’s 3D models too large to render interactively. Because of the sheer amount of geometry to render, datasets such as the USGS Earth model are also too large to fit into memory, making rendering even more challenging. The most common solution to these problems is the use of level-of-detail (LOD) systems that simplify the original model, removing some of its complexity while maintaining as much of the original detail as possible.

We present a new multiresolution hierarchy and associated algorithms that provide adaptive granularity. This multi-grained hierarchy allows independent control of the number of hierarchy nodes processed on the CPU and the number of triangles to be rendered on the GPU. We employ a seamless texture atlas style of geometry image as a GPU-friendly data organization, enabling efficient rendering and GPU-based stitching of patch borders. We demonstrate our approach on both large triangle meshes and terrains with up to billions of vertices.
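
As a rough illustration of the kind of budget-driven refinement a multi-grained hierarchy supports, the sketch below greedily refines a generic multiresolution hierarchy under separate node and triangle budgets; the Node fields and the greedy policy are assumptions for illustration, not the paper’s data structure or algorithm:

    import heapq

    class Node:
        def __init__(self, error, triangles, children=()):
            self.error = error            # screen-space error if drawn at this level
            self.triangles = triangles    # triangle count of this node's patch
            self.children = list(children)

    def adaptive_cut(root, triangle_budget, node_budget):
        # Greedy top-down refinement: repeatedly split the node with the largest
        # error, as long as the refined cut still fits both budgets (a stand-in
        # for independently bounding CPU-side nodes and GPU-side triangles).
        heap = [(-root.error, id(root), root)]   # max-heap via negated error
        cut, tri_total, node_total = [], root.triangles, 1
        while heap:
            _, _, node = heapq.heappop(heap)
            kids = node.children
            new_tris = tri_total - node.triangles + sum(k.triangles for k in kids)
            new_nodes = node_total - 1 + len(kids)
            if kids and new_tris <= triangle_budget and new_nodes <= node_budget:
                tri_total, node_total = new_tris, new_nodes
                for k in kids:
                    heapq.heappush(heap, (-k.error, id(k), k))
            else:
                cut.append(node)          # render this node at its current resolution
        return cut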

The proposed method also takes advantage of the parallelism available in the latest generation of computers. Multi-core, multi-GPU machines allow for significant performance improvements when properly exploited. In the proposed method, adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing, while our level-of-detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.

Speaker Biography: Krzysztof Niski received his M.S. from Johns Hopkins University and his B.S. from Shepherd University. His research interests lie in the computer graphics field, focusing on large-scale rendering systems, programmable graphics hardware and scientific data visualization.

March 5, 2007

Gene regulation plays a fundamental role in biological systems. As more high-throughput biological data becomes available, it is possible to quantitatively study gene regulation in a systematic way. In this talk I will present my work on three problems related to gene regulation: (1) identifying genes that affect organism development; (2) detecting protein-DNA binding events and cis-regulatory elements; and (3) deciphering regulatory cascades at the transcriptional level for stem cell development. To address these problems, I developed novel nonparametric Bayesian models, Bayesian semi-supervised learning methods, and approximate inference methods for loopy graphs. These methods capture key aspects of biological processes and make functional predictions, some of which were confirmed by biological experiments. I will conclude with a brief description of my plan for future research in computational biology and Bayesian learning.

Speaker Biography: Yuan (Alan) Qi is a postdoctoral associate at the MIT Computer Science and Artificial Intelligence Laboratory. He received his Ph.D. from the MIT Media Laboratory in February 2005 and his master’s degree from the University of Maryland at College Park in June 2000. His research interests include Bayesian machine learning and computational biology.

March 8, 2007

Online Decision-Making is a setting in which an agent must make a sequence of decisions while lacking important knowledge affecting their outcomes. After making each decision, the agent receives feedback about how well it turned out. Our goal is to design algorithms which allow such an agent to adapt to the environment over time, and hopefully converge quickly to the best performance possible. One example is the classical “k-armed bandit” problem, by now fairly well understood, in which the agent must, at each time step, choose one of k options, each of which results in an unknown payout. In this case, a very simple and efficient algorithm is known, based on “multiplicative updates.” One application domain is machine learning, and in fact, the aforementioned algorithm is a key component in the celebrated AdaBoost algorithm.

Collaborative Filtering is a setting in which many agents try to share information about common interests, for the purpose of later decision-making. As opinions and tastes may vary wildly among these agents, this can be quite difficult. The difficulty is compounded by having only partial and noisy data, and by the likely presence of many adversarial agents acting on behalf of other interested parties. Recommendation systems, such as those of Netflix and Amazon, or Google’s PageRank, are examples. Another example is drivers using citizens band (CB) radio to warn each other about road conditions, traffic, speed traps, etc. I will describe some of my recent attempts to apply techniques from Online Decision-Making to develop better algorithms for Collaborative Filtering. This area seems ripe for new techniques.
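
For reference, here is a minimal sketch of the multiplicative-updates idea in the bandit setting, in the style of the Exp3 algorithm; the exploration rate and reward model are illustrative choices, not necessarily those discussed in the talk:

    import math
    import random

    def exp3(pull, k, rounds, gamma=0.1):
        # Exp3: multiplicative updates for the k-armed bandit.
        # pull(arm) returns a reward in [0, 1]; only the chosen arm's reward is seen.
        weights = [1.0] * k
        total_reward = 0.0
        for _ in range(rounds):
            wsum = sum(weights)
            probs = [(1 - gamma) * w / wsum + gamma / k for w in weights]
            arm = random.choices(range(k), weights=probs)[0]
            reward = pull(arm)
            total_reward += reward
            estimate = reward / probs[arm]                  # importance-weighted estimate
            weights[arm] *= math.exp(gamma * estimate / k)  # multiplicative update
            wmax = max(weights)
            weights = [w / wmax for w in weights]           # keep weights numerically stable
        return total_reward

    # Example: arm 2 pays off most often, and Exp3 gradually favors it.
    payoff = [0.2, 0.5, 0.8]
    print(exp3(lambda a: 1.0 if random.random() < payoff[a] else 0.0, k=3, rounds=5000))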

Speaker Biography: Thomas Hayes received his Ph.D. in Computer Science in 2003 at the University of Chicago. He is currently a Research Assistant Professor at the Toyota Technological Institute in Chicago. His interests span discrete mathematics, algorithms, statistical physics, machine learning and probability theory. His most recent focus has been on online optimization, Markov chain Monte Carlo algorithms, and statistical group theory.

March 29, 2007

Real scientific databases, biological databases in particular, can often be very complex, and their schemas can comprise thousands of tables, elements, attributes, etc. Any biologist wishing to interact with such a complex database first has the daunting task of understanding the database schema. In this talk, I will propose the concept of a schema summary, which can provide a succinct overview of the underlying complex schema and significantly reduce the human effort required to understand the database. I will define criteria for good schema summaries and describe efficient algorithms for producing them. User effort in locating schema elements needed to construct a structured query can be greatly reduced with a schema summary, which allows the user to explore only portions of the schema that are of interest. Nonetheless, as query complexity increases, this approach of querying through exploration is no longer a viable option, because a significant percentage of the schema will have to be explored. By leveraging the schema summary and a novel schema-based semantics for matching meaningful data fragments with structure-free search conditions, I will propose a novel query model called Meaningful Summary Query (MSQ). The MSQ query model allows users to query a complex database through its schema summary, with embedded structure-free conditions. As a result, an MSQ query can be generated with knowledge of the schema summary alone, and yet retrieve highly accurate results from the database.

Speaker Biography: Cong Yu is a Ph.D. candidate in the Department of EECS (Electrical Engineering and Computer Science), University of Michigan, Ann Arbor. Before that, he obtained his master’s and undergraduate degrees in Biology from the University of Michigan and Fudan University, respectively. Cong Yu’s research interests are scientific data management, information integration and retrieval, and database query processing. His main research project is Schema Management for Complex and Heterogeneous Information Sources (http://www.eecs.umich.edu/db/schemasummary/), which addresses various issues involved in managing complex, real-world information sources. He is a founding member of the MiMI project (http://mimi.ncibi.org/), responsible for its overall schema design and the data transformation component. He is also a member of the TIMBER project (http://www.eecs.umich.edu/db/timber), contributing to its indexing and full-text retrieval components. In his spare time, Cong likes to read books on various topics and is an avid Michigan football fan.

April 19, 2007

Copyright developed in the age of the printing press, and was designed to fit with the system of centralized copying imposed by the printing press. But the copyright system does not fit well with computer networks, and only draconian punishments can enforce it. The global corporations that profit from copyright are lobbying for draconian punishments and for increased copyright powers, while suppressing public access to technology. But if we seriously hope to serve the only legitimate purpose of copyright, which is to promote progress for the benefit of the public, then we must make changes in the other direction.

Speaker Biography: Richard Stallman launched the development of the GNU operating system (see www.gnu.org) in 1984. GNU is free software: everyone has the freedom to copy it and redistribute it, as well as to make changes either large or small. The GNU/Linux system, basically the GNU operating system with Linux added, is used on tens of millions of computers today. Stallman has received the ACM Grace Hopper Award, a MacArthur Foundation fellowship, the Electronic Frontier Foundation’s Pioneer award and the Takeda Award for Social/Economic Betterment, as well as several honorary doctorates.