Spring 2011

February 10, 2011

We will discuss the role of algorithms and probabilistic methods in combinatorial optimization and public health, with an emphasis on networked phenomena. This includes recent and ongoing research on the Lovász Local Lemma, the role of networked phenomena in public-health preparedness, and algorithmic issues in wireless networking. Our goal is to articulate the power of algorithms, probabilistic methods, and networked phenomena in scientific, technological, and societal applications.
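
Since the abstract highlights the Lovász Local Lemma, a minimal sketch of its best-known algorithmic form, the Moser-Tardos resampling algorithm, may help set the stage. This is one concrete instance of the algorithmic LLL, not code from the talk; the k-SAT encoding below (signed integers as literals) is an illustrative choice.

```python
import random

def moser_tardos(n_vars, clauses, max_rounds=100000):
    """Moser-Tardos resampling: start from a random assignment and, while
    some clause is violated, resample that clause's variables uniformly.
    When the Lovasz Local Lemma condition holds (each clause shares
    variables with few others), this terminates quickly in expectation."""
    assign = [random.random() < 0.5 for _ in range(n_vars)]

    def violated(clause):
        # A literal is a signed integer: +i means x_i, -i means NOT x_i.
        return not any(assign[abs(l) - 1] == (l > 0) for l in clause)

    for _ in range(max_rounds):
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return assign                       # satisfying assignment
        for l in random.choice(bad):            # resample one bad clause
            assign[abs(l) - 1] = random.random() < 0.5
    return None                                 # LLL condition may fail
```

For example, `moser_tardos(3, [[1, 2], [-1, 3], [-2, -3]])` returns a satisfying assignment of the three clauses after a handful of resampling rounds.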

Speaker Biography: Aravind Srinivasan is a Professor in the Department of Computer Science and the Institute for Advanced Computer Studies at the University of Maryland, College Park. His research interests are in randomized algorithms, networking, social networks, and combinatorial optimization, as well as in the growing confluence of algorithms, networks, and randomness in society, in fields including the social Web and public health. He has published several papers in these areas, in journals including Nature, Journal of the ACM, IEEE/ACM Transactions on Networking, and the SIAM Journal on Computing. He is an editor of four journals, and is an IEEE Fellow.

February 10, 2011

Facebook has grown at an incredible rate in recent years. In October 2007, Facebook had 50 million active users—distinct users who had logged into the website in the past month. Ten months later, the active user count reached 100 million. Nine months after that (April 2009), the active user count climbed to 200 million. In July 2010, Facebook hit 500 million active users.

In this talk, I will discuss how Facebook’s infrastructure has evolved to scale with such explosive growth, from the early days of a single cluster to its current architecture spanning multiple clusters on two coasts. I will explain how our web machines interact with memcache and MySQL and highlight the mechanisms we use to ensure that your data remains consistent within a cluster, across co-located clusters, and across geographically separated clusters.
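
At its core, the memcache/MySQL interaction the talk describes is the classic look-aside caching pattern. Here is a hedged sketch of that pattern; the `cache` and `db` objects, the key scheme, and the invalidate-on-write choice are illustrative stand-ins, not Facebook's actual code.

```python
# Illustrative look-aside caching (not Facebook's implementation):
# read through memcache, fall back to MySQL, invalidate on write.

def get_user(user_id, cache, db):
    key = f"user:{user_id}"
    value = cache.get(key)            # 1. try memcache first
    if value is None:                 # 2. miss: read from MySQL
        value = db.query("SELECT * FROM users WHERE id = %s", user_id)
        cache.set(key, value)         # 3. populate the cache for next time
    return value

def update_user(user_id, fields, cache, db):
    db.update("users", user_id, fields)   # 1. write to MySQL first
    cache.delete(f"user:{user_id}")       # 2. invalidate rather than
                                          #    overwrite, to avoid races
                                          #    between concurrent writers
```

Keeping such a cache consistent within one cluster is the easy case; the consistency mechanisms across co-located and geographically separated clusters are the subject of the talk.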

Speaker Biography: Dr. Harry Li is an infrastructure engineer at Facebook. He is a part of the Datacenters team and works to ensure that Facebook’s architecture continues to scale well with its explosive growth. He received his Ph.D. in 2009 from the University of Texas at Austin under the guidance of Lorenzo Alvisi and Mike Dahlin. Host: Yair Amir, Computer Science.

Note: Harry is visiting us that day and will be available for a few meetings after his talk. E-mail yairamir@cs.jhu.edu if you are interested in meeting with him.

February 15, 2011

We are undergoing a revolution in data. As computer scientists, we have grown accustomed to constant upheaval in computing resources — quicker processors, bigger storage and faster networks — but this century presents the new challenge of almost unlimited access to raw information. Whether from sensor networks, social computing or high-throughput cell biology, we face a deluge of data about our world. We need to parse this information, to understand it, to use it to make better decisions. In this talk, I will discuss my work to confront this new challenge, developing new machine learning algorithms that are based on infinitely large probabilistic graphical models. In principle, these infinite representations allow us to analyze sophisticated and dynamic phenomena in a way that automatically balances simplicity and complexity — a mathematical Occam’s Razor. Our computers, however, are inevitably finite, so how can we use such tools in practice? I will discuss how my approach leverages ideas from mathematical statistics to develop practical algorithms for inference in infinite models with finite computation. I will discuss how combining a firm theoretical footing with practical computational concerns gives us tools that are useful both within computer science and beyond, in domains such as computer vision, computational neuroscience, biology and the social sciences.
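
One standard device for squaring an "infinite" model with finite computation — in the spirit of the talk, though not necessarily the speaker's own construction — is the Chinese restaurant process, which represents only the finitely many mixture components that the data actually occupy:

```python
import random

def crp_partition(n, alpha=1.0):
    """Sample a partition of n items from the Chinese restaurant process.
    The underlying mixture model has infinitely many components, but the
    sampler only ever stores the finitely many occupied "tables"."""
    tables = []       # tables[k] = number of items seated at table k
    assignment = []   # assignment[i] = table index of item i
    for i in range(n):
        # Item i joins table k with prob. tables[k] / (i + alpha),
        # or starts a new table with prob. alpha / (i + alpha).
        r = random.uniform(0, i + alpha)
        acc = 0.0
        for k, count in enumerate(tables):
            acc += count
            if r < acc:
                tables[k] += 1
                assignment.append(k)
                break
        else:
            tables.append(1)
            assignment.append(len(tables) - 1)
    return assignment
```

The number of occupied tables grows only logarithmically with n, which is why a finite computer can work with the infinite-dimensional prior.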

Speaker Biography: Ryan Adams is a Junior Research Fellow in the University of Toronto Department of Computer Science, affiliated with the Canadian Institute for Advanced Research. He received his Ph.D. in Physics from Cambridge University, where he was a Gates Cambridge Scholar under Prof. David MacKay. Ryan grew up in Texas, but completed his undergraduate work in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology. He has received several awards for his research, including Best Paper at the 13th International Conference on Artificial Intelligence and Statistics.

February 17, 2011

The graph reachability problem, the computational problem of deciding whether there is a path between two given vertices in a graph, is the canonical problem in the study of space-bounded computations. Different variants of this problem characterize various important space-bounded complexity classes. In particular, it is a basic fact that, over directed graphs, this problem completely characterizes nondeterministic logarithmic-space computations. Understanding the complexity of the reachability problem is a central concern of computational complexity theory.
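
For context, reachability is easy when time is the only resource that matters; the open questions concern space. A sketch of the standard linear-time baseline, whose linear space usage is exactly what the space-bounded setting forbids:

```python
from collections import deque

def reachable(adj, s, t):
    """Decide s-t reachability in a directed graph by BFS.
    adj maps each vertex to a list of out-neighbors. This runs in
    linear time but uses linear space for the 'seen' set; the open
    problems concern how far space can be driven down (toward
    O(log n)) without giving up polynomial time."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False
```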

In this talk, I will first present certain significant open problems regarding the complexity of the reachability problem and then describe the progress we (specifically Chris Bourke, Raghu Tewari, Derrick Stolee, and I) have made over the past five years at UNL toward solving them.

Speaker Biography: Vinodchandran Variyam is an Associate Professor and Susan Rosowski Professor at the University of Nebraska-Lincoln. Before joining UNL in 2001, he served as a postdoctoral fellow at the NEC Research Institute, Princeton, and earlier as a Research Assistant Professor at the University of Aarhus, Denmark. He received his PhD from the Institute of Mathematical Sciences, Chennai, India. Prof. Variyam conducts research in computational complexity theory. He has made significant contributions to several topics in complexity theory, including space-bounded computations, derandomization, circuit complexity, complexity of intermediate problems, Kolmogorov complexity, and average-case complexity.

February 22, 2011

Streaming algorithms form an important area of theoretical computer science with many practical applications. We will define the basic model of data streams and explain some fundamental algorithms and methods that have shaped the area. We will also survey some of our recent results and discuss current challenges and open problems. In particular, we will present the paper “Zero-One Frequency Laws” (STOC 2010), in which we investigate frequency-based functions and answer the main open question of Alon, Matias, and Szegedy (STOC 1996).
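
For a flavor of the Alon-Matias-Szegedy line of work this talk builds on, here is a hedged sketch of their classic estimator for the second frequency moment F2 = sum of the squared item frequencies. For simplicity the sketch stores random signs in dictionaries; a genuine streaming implementation would use 4-wise independent hash functions so that memory is O(trials) rather than proportional to the number of distinct items.

```python
import random
from collections import defaultdict

def ams_f2_estimate(stream, trials=64, seed=0):
    """AMS-style estimate of F2 = sum_i f_i^2. Each trial maintains a
    single counter Z = sum_i s(i) * f_i for a random sign function s,
    so that E[Z^2] = F2; averaging independent trials reduces variance."""
    rng = random.Random(seed)
    signs = [defaultdict(lambda: rng.choice((-1, 1))) for _ in range(trials)]
    z = [0] * trials
    for item in stream:
        for j in range(trials):
            z[j] += signs[j][item]   # one +/-1 update per trial per item
    return sum(zj * zj for zj in z) / trials
```

On the stream [1, 1, 2], the true F2 is 2^2 + 1^2 = 5, and the estimate concentrates around 5 as the number of trials grows.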

Speaker Biography: Vladimir Braverman is a Ph.D. candidate at UCLA; his advisor is Rafail Ostrovsky. His main interests are algorithms for data streams, communication complexity and related areas. He received his B.Sc. and M.Sc. degrees from Ben-Gurion University, Israel, where his advisor was Daniel Berend. Prior to attending UCLA, he led a research team at HyperRoll, working with Yossi Matias.

February 24, 2011

I will discuss the repeated acquisition of “labels” for data items when the labeling is imperfect. Labels are values provided by humans for specified variables on data items, such as “PG-13” for “Adult Content Rating on this Web Page.” With the increasing popularity of micro-outsourcing systems, such as Amazon’s Mechanical Turk, it is often possible to obtain less-than-expert labeling at low cost. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. We present repeated-labeling strategies of increasing complexity, and show several main results: (i) Repeated labeling can improve label quality and model quality (per unit data-acquisition cost), but not always. (ii) Simple strategies can give considerable advantage, and carefully selecting which points to label does even better (we present and evaluate several techniques). (iii) Labeler (worker) quality can be estimated on the fly (e.g., to determine compensation, control quality, or eliminate Mechanical Turk spammers) and systematic biases can be corrected. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data modelers should have in their repertoire. I illustrate the results with a real-life application from on-line advertising: using Mechanical Turk to help classify web pages as being objectionable to advertisers.
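
The arithmetic behind when repeated labeling pays off is worth seeing directly. Under the simplest model — independent labelers of equal quality and binary labels, a deliberate simplification of the talk's setting — majority-vote quality is a binomial tail:

```python
from math import comb

def majority_label_quality(p, n):
    """Probability that the majority vote of n independent labelers,
    each correct with probability p, yields the correct binary label
    (n odd). Repeated labeling helps only when p > 0.5: for example,
    p=0.7 and n=5 gives about 0.837, while p=0.4 gets WORSE with more
    labels, which is one reason repeated labeling does not always pay."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))
```

This simple curve is what must be weighed against per-label cost, and it is also why estimating labeler quality on the fly matters so much.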

This is joint work with Foster Provost, Victor S. Sheng, and Jing Wang. An earlier version of the work received the Best Paper Award Runner-up at the ACM SIGKDD Conference.

Speaker Biography: Panos Ipeirotis is an Associate Professor in the Department of Information, Operations, and Management Sciences at the Leonard N. Stern School of Business of New York University. His recent research interests focus on crowdsourcing and on mining user-generated content on the Internet. He received his Ph.D. degree in Computer Science from Columbia University in 2004, with distinction. He has received two “Best Paper” awards (IEEE ICDE 2005, ACM SIGMOD 2006), two “Best Paper Runner Up” awards (JCDL 2002, ACM KDD 2008), and is also a recipient of a CAREER award from the National Science Foundation.

March 1, 2011

Building intelligent systems that are capable of extracting meaningful representations from high-dimensional data lies at the core of solving many Artificial Intelligence tasks, including visual object recognition, information retrieval, speech perception, and language understanding. My research aims to discover such representations by learning rich generative models which contain deep hierarchical structure and which support inferences at multiple levels.

In this talk, I will introduce a broad class of probabilistic generative models called Deep Boltzmann Machines (DBMs), and a new algorithm for learning these models that uses variational methods and Markov chain Monte Carlo. I will show that DBMs can learn useful hierarchical representations from large volumes of high-dimensional data, and that they can be successfully applied in many domains, including information retrieval, object recognition, and nonlinear dimensionality reduction. I will then describe a new class of more complex probabilistic graphical models that combine Deep Boltzmann Machines with structured hierarchical Bayesian models. I will show how these models can learn a deep hierarchical structure for sharing knowledge across hundreds of visual categories, which allows accurate learning of novel visual concepts from few examples.
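
As a taste of the Markov chain Monte Carlo half of such learning algorithms, here is a minimal block-Gibbs sweep for a single pair of layers, i.e., a binary restricted Boltzmann machine, the component that DBMs stack. This is a hedged building-block sketch, not the speaker's full algorithm, which couples this with mean-field variational inference over all layers.

```python
import numpy as np

def gibbs_sweep(v, W, b_h, b_v, rng):
    """One block-Gibbs sweep in a binary RBM: sample the hidden layer
    given the visibles, then the visibles given the hiddens. W has shape
    (n_visible, n_hidden); b_h and b_v are the biases. In a full DBM,
    each intermediate layer is sampled conditioned on BOTH neighbors."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    p_h = sigmoid(v @ W + b_h)                    # P(h = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(v.dtype)
    p_v = sigmoid(h @ W.T + b_v)                  # P(v = 1 | h)
    v_new = (rng.random(p_v.shape) < p_v).astype(v.dtype)
    return v_new, h

# Usage sketch: rng = np.random.default_rng(0); iterate gibbs_sweep to
# draw (approximate) samples from the model for the learning updates.
```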

Speaker Biography: Ruslan Salakhutdinov received his PhD in computer science from the University of Toronto in 2009, and he is now a postdoctoral associate at CSAIL and the Department of Brain and Cognitive Sciences at MIT. His research interests lie in machine learning, computational statistics, and large-scale optimization. He is the recipient of the NSERC Postdoctoral Fellowship and Canada Graduate Scholarship.

March 8, 2011

Graphical models and their logic-based extensions such as Markov logic have become the central paradigm for representation and reasoning in machine learning, artificial intelligence and computer science. They have led to many successful applications in domains such as bioinformatics, data mining, computer vision, social networks, entity resolution, natural language processing, and hardware and software verification. For them to be useful in these and other future applications, we need access to scalable probabilistic inference systems that are able to accurately analyze and model large amounts of data and make future predictions. Unfortunately, this is hard because exact inference is #P-hard and thus computationally intractable. Therefore, in practice, one has to resort to approximate algorithms and hope that they are accurate and scalable.

In this talk, I’ll describe my research on scaling up approximate probabilistic inference algorithms to unprecedented levels. These algorithms are based on exploiting structural features present in the problem. I’ll show that all approximate inference schemes developed to date have completely ignored or under-utilized structural features such as context-specific independence, determinism and logical dependencies that are prevalent in many real-world problems. I’ll show how to change this by developing theoretically well-founded structure-aware algorithms that are simple yet very effective in practice. In particular, I’ll present results from the recently held 2010 UAI approximate inference challenge in which my schemes won several categories, outperforming the competition by an order of magnitude on the hardest problems. I’ll conclude by describing exciting opportunities that lie ahead in the area of graphical models and statistical relational learning.

Speaker Biography: Vibhav Gogate completed his Ph.D. at the University of California, Irvine in 2009 and is now a postdoctoral research associate at the University of Washington. His research interests are in machine learning and artificial intelligence with a focus on graphical models and statistical relational learning. He has authored over 15 papers that have appeared in high-profile conferences and journals and is the co-winner of the 2010 Uncertainty in Artificial Intelligence (UAI) approximate inference challenge.

March 10, 2011

Visualization tools are essential for deriving meaning from the avalanche of data we are generating today. To facilitate an understanding of the complex relationships embedded in this data, visualization research leverages the power of the human perceptual and cognitive systems, encoding meaning through images and enabling exploration through human-computer interactions. In my research I design visualization systems that support exploratory, complex data analysis tasks by biologists who are analyzing large amounts of heterogeneous data. These systems allow users to validate their computational models, to understand their underlying data in detail, and to develop new hypotheses and insights. My research process includes five distinct stages, from targeting a specific group of domain experts and their scientific goals through validating the efficacy of the visualization system. In this talk I’ll describe a user-centered, methodological approach to designing and developing visualization tools and present several successful visualization projects in the areas of genomics and systems biology. I will also discuss the long-term implications of this work.

Speaker Biography: Miriah is a postdoctoral research fellow in the School of Engineering and Applied Sciences at Harvard University and a visiting scientist at the Broad Institute. She obtained her bachelor’s degree in astronomy and astrophysics at Penn State University, and earned a PhD in computer science from the University of Utah where she worked in the Scientific Computing and Imaging Institute. Miriah is the recipient of an NSF/CRA Computing Innovation Fellow Award for her work on collaboratively designing visualization tools for biological data. She was also awarded an AAAS Mass Media Fellowship that landed her a stint as a science writer for the Chicago Tribune. At the Broad Institute she is a cofounder of the Data Visualization Initiative.

March 15, 2011

Speaker Biography: Risi Kondor obtained his B.A. in Mathematics from the University of Cambridge. After some further studies in Physics and Computational Fluid Dynamics, he changed direction to Machine Learning, getting his M.Sc. from CALD (the precursor to the Machine Learning Department) at Carnegie Mellon University in 2002, and his Ph.D. from Tony Jebara’s group at Columbia University in 2007. His first post-doc took him to London, where he worked at the Gatsby Computational Neuroscience Unit at UCL, and he is now completing his second post-doc at the Center for the Mathematics of Information at Caltech.

March 17, 2011

Physiological data are routinely recorded in intensive care, but their use for rapid assessment of illness severity has been limited. The data are high-dimensional and noisy, and change rapidly; moreover, small changes that occur in a patient’s physiology over long periods of time are difficult to detect, yet can lead to catastrophic outcomes. A physician’s ability to recognize complex patterns across these high-dimensional measurements is limited. We propose a nonparametric Bayesian method for discovering informative representations in such continuous time series that aids both exploratory data analysis and feature construction. When applied to data from premature infants in the neonatal ICU (NICU), our model obtains novel clinical insights. Based on these insights, we devised the Physiscore, a novel risk prediction score that combines patterns from continuous physiological signals to predict infants at risk for developing major complications in the NICU. Using only three hours of non-invasive data from birth, Physiscore predicts morbidity in preterm infants with high accuracy. Physiscore performed consistently better than other neonatal scoring systems, including the Apgar, which is the current standard of care, and SNAP, a machine learning based score that requires multiple invasive tests. This work was recently published on the cover of Science Translational Medicine (Science’s new journal for translational medicine), and was covered by numerous press sources.

Speaker Biography: Suchi Saria is finishing her PhD in Computer Science at Stanford under Daphne Koller. Her main research interest lies in machine learning and data-driven optimizations for health care. Her work has been covered by popular press outlets such as CBS Radio, Science NOW, KCBS and the San Francisco Chronicle. She has also received a Best Student Paper award and been a Best Student Paper finalist. She is a recipient of the Stanford Graduate Fellowship (SGF) and two Microsoft full-tuition scholarships.

April 5, 2011

What is the total population of the ten largest capitals in the US? Building a system to answer free-form questions such as this requires modeling the deep semantics of language. But to develop practical, scalable systems, we want to avoid the costly manual annotation of these deep semantic structures and instead learn from just surface-level supervision, e.g., question/answer pairs. To this end, we develop a new tree-based semantic representation which has favorable linguistic and computational properties, along with an algorithm that induces this hidden representation. Using our approach, we obtain significantly higher accuracy on the task of question answering compared to existing state-of-the-art methods, despite using less supervision.

Speaker Biography: Percy Liang obtained a B.S. (2004) and an M.S. (2005) from MIT and is now completing his Ph.D. at UC Berkeley with Michael Jordan and Dan Klein. The general theme of his research, which spans machine learning and natural language processing, is learning richly-structured statistical models from limited supervision. He won a Best Student Paper award at the International Conference on Machine Learning in 2008, received the NSF, GAANN, and NDSEG fellowships, and is also a 2010 Siebel Scholar.

April 21, 2011

Stroke is the leading cause of motor disability in the United States, affecting 750,000 people per year. Recent work in animal models suggests that there is a limited window of heightened plasticity in the first four weeks after ischemic injury. Data suggest that there is likely a similar window in humans after stroke; it may last longer than four weeks, but probably not more than three months. The argument will be made that we need to find novel interventions that augment and extend this plasticity window, because current rehabilitation fails to do so. These interventions could include robotic therapy, virtual reality, videogames, and non-invasive brain stimulation. It is to be hoped that collaborative efforts between computer scientists, video-game designers, roboticists, neuroscientists, and neurologists will help solve a problem that so far has proved intractable.

Speaker Biography: Dr. John Krakauer is a neurologist and neuroscientist with an interest in the healthy and damaged motor system. He was an Associate Professor of Neurology and co-Director of the Motor Performance Laboratory at Columbia University until 2010. He is now the Director of the Center for Motor Learning and Brain Repair at Johns Hopkins University, where he studies motor learning and control in patients after stroke and their relationship to functional recovery. There is a critical need to establish whether motor learning itself is affected after stroke and to determine which forms of motor learning should be the focus of rehabilitation strategies. He has made a number of observations and contributions to the study of motor learning in healthy subjects and motor recovery after stroke that suggest new directions for the treatment of impairment early after stroke.

Distinguished Lecturer

April 26, 2011

The Voronoi diagram is a fundamental spatial data structure, used in data analysis and computer graphics to interpret clouds of points as meaningful shapes. An important example is when points are distributed on a lower-dimensional surface in space, for instance on the two-dimensional surface of an object in 3D-space. We will begin by describing how the Voronoi diagram of points on a surface can be used to approximate the surface and its skeleton (aka the medial axis). Then we’ll talk about what is known, and more interestingly unknown, about the complexity of the Voronoi diagram of points on a surface. This will be a somewhat mathematical talk but with a lot of pictures and intuition.
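
For readers who want to experiment with the objects in this talk, SciPy exposes Voronoi diagrams directly. A small illustrative example with points sampled near a circle — a one-dimensional "surface" in the plane — whose innermost Voronoi vertices approximate the medial axis (here, the circle's center):

```python
import numpy as np
from scipy.spatial import Voronoi

# Sample noisy points on the unit circle and compute their Voronoi
# diagram. For dense samples on a smooth curve, Voronoi vertices far
# from the samples cluster near the curve's skeleton (medial axis);
# for a circle, the inner part of the skeleton is just the center.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
points = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.01, (200, 2))

vor = Voronoi(points)
d = np.linalg.norm(vor.vertices, axis=1)   # distance from the origin
print("innermost Voronoi vertices:", np.sort(d)[:5])  # near 0
```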

Speaker Biography: Nina Amenta studies geometric algorithms, mostly in computer graphics. She is best known for her research on surface reconstruction using the Voronoi diagram. She is also interested in biomedical visualization, particularly evolutionary morphology, and in using the GPU for problems involving spatial data. She received her PhD from the University of California at Berkeley in 1993. She was a post-doc at The Geometry Center and at Xerox PARC, and an Assistant Professor at the University of Texas at Austin, before moving to UC Davis in 2003. She is a recipient of an NSF CAREER Award, an Alfred P. Sloan Foundation Fellowship, and a UC Davis Chancellor’s Fellowship.

Distinguished Lecturer

May 3, 2011

Technology, from chatting on the internet to playing video games, has invaded all aspects of our lives and, for better or for worse, is changing who we are. Can we harness technology to effect more changes for the better? Yes we can, and not always in the way one might have expected. In a surprising twist, a mind-numbing activity such as playing action video games appears to lead to a variety of behavioral enhancements in young adults. Action video game players outperform their non-action-game-playing peers on various sensory, attentional and cognitive tasks. They search for a target in a cluttered environment more efficiently, are able to track more objects at once, process rapidly fleeting images more accurately and switch between tasks more flexibly. In addition, action gamers manifest a large decrease in reaction time as compared to their non-action-game-playing peers across many different tasks, without paying a price in accuracy. A training regimen whose benefits are so broad is unprecedented and provides a unique opportunity to identify factors that underlie generalization of learning and principles of brain plasticity. We propose that a common mechanism is at the source of this wide range of skill improvement. In particular, improvement in performance following action video game play may result from greater attentional control, with gamers focusing on signal and ignoring distraction more efficiently. This in turn allows for enhanced integration of information during decision making, with action gamers making more informed decisions about their environment. We show how these processes may be implemented by more faithful Bayesian inference within neural networks, consistent with the view that action gamers learn to learn.

Speaker Biography: Daphne Bavelier is Professor of Brain and Cognitive Sciences at the University of Rochester, NY. She is also Co-Director of the Rochester Center for Brain Imaging and Director of the Mind-Space Laboratory. She is a renowned expert in human brain plasticity. Her research combines behavioral and brain imaging approaches to study how humans learn and how the brain adapts to changes in experience, either by nature – for example, deafness – or by training – for example, playing video games. Her laboratory has recently shown that playing certain types of entertainment video games induces a vast array of improvements in perceptual and cognitive abilities that extends well beyond the specific tasks in the game. A training regimen whose benefits are so broad is unprecedented and provides a unique opportunity to identify factors that underlie generalization of learning and principles of brain plasticity. Professor Bavelier entered the École Normale Supérieure of Paris, France, in 1985. She received her PhD in Brain and Cognitive Sciences at the Massachusetts Institute of Technology in 1992 and was then a McDonnell-Pew fellow in Cognitive Neuroscience at the Salk Institute, San Diego. She has been on the faculty at Georgetown University, and, since 1999, at the University of Rochester. She was a recipient of the John Merck Scholar Award in 2000 and a 21st Century Award Research Grant by The James S. McDonnell Foundation in 2002, and was selected as a finalist in the Blavatnik Awards for Young Scientists in 2008. She has also been funded by the NIH, the NSF, the Charles A. Dana Foundation and the Packard Foundation over the years. She heads a Multidisciplinary Research Initiative (MURI) sponsored by the Office of Naval Research that studies complex learning and skill transfer with video games.

May 5, 2011

Humans have the amazing ability to walk with ease, navigating from daily environments to uneven and uncertain terrain with efficiency and robustness. If these same abilities could be imbued into robotic devices, the potential applications are far-reaching: from legged robots for space exploration to the next generation of prosthetic devices. Unfortunately the robotic bipedal walking community has been unable to agree on an appropriate model to generate anthropomorphic gait.

In this talk, I begin by describing how the sequence of contact-point enforcements, or discrete events, together with a Lagrangian intrinsic to the biped, completely determines the mathematical model of the biped. Given this insight, in the first part of the talk I describe a nine-subject, straight-line walking motion-capture experiment and two methods to extract temporal orderings: function fitting and persistent homology. Surprisingly, both methods show that all participants, regardless of age, height, sex, or weight, had an identical temporal ordering of these events. In the second part of the talk, I describe how to generalize the detection of constraint enforcement by recasting the problem as an optimal control problem for switched dynamical systems. Given this result, we construct a numerical solution to the optimal control problem for a constrained switched nonlinear dynamical system with a running and final cost, which provably converges to a local optimum.

Speaker Biography: Ram Vasudevan is a Ph.D. candidate in Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research interests include hybrid systems, biologically inspired robotics, and computer vision. He is a Regents’ and Chancellor’s Scholar and a co-recipient of the Innovations in Networking Award presented by the Corporation for Education Network Initiatives in California. He received his B.S. (2006) and M.S. (2009) in Electrical Engineering and Computer Sciences, both from the University of California, Berkeley.