From the activities of the US Patent Office or the National Institutes of Health to communications between scientists or political legislators, complex social processes (groups of people interacting with each other in order to achieve specific and sometimes contradictory goals) underlie almost all human endeavor. In order to draw thorough, data-driven conclusions about complex social processes, researchers and decision-makers need new quantitative tools for exploring, explaining, and making predictions using massive collections of interaction data. In this talk, I will discuss the development of novel machine learning methods for modeling interaction data, focusing on the interplay between theory and practice. I will concentrate on a class of models known as statistical topic models, which automatically infer groups of semantically related words (topics) from word co-occurrence patterns in documents. These topics can be used to detect emergent areas of innovation, identify communities, and track trends across languages. Until recently, most statistical topic models relied on two unchallenged prior beliefs. I will explain how challenging these beliefs increases the robustness of topic models to the skewed word frequency distributions common in document collections. I will also talk about a) the creation of a publicly available search tool for National Institutes of Health (NIH) grants, intended to facilitate navigation and discovery of NIH-funded research, and b) a new statistical model of network structure and content for modeling interaction patterns in intra-governmental communication networks. Finally, I will briefly provide an overview of some of my ongoing and future research directions.
Hanna Wallach is an assistant professor in the Department of Computer Science at the University of Massachusetts Amherst. She is one of five core faculty members involved in UMass’s newly formed computational social science research initiative. Previously, Hanna was a postdoctoral researcher, also at UMass, where she developed Bayesian latent variable models for analyzing complex data regarding communication and collaboration within scientific and technological communities. Her recent work (with Ryan Adams and Zoubin Ghahramani) on infinite belief networks won the best paper award at AISTATS 2010. Hanna has co-organized multiple workshops on Bayesian latent variable modeling and computational social science. Her tutorial on conditional random fields is widely referenced and used in machine learning courses around the world. In addition to her research, Hanna works to promote and support women’s involvement in computing. In 2006, she co-founded the annual workshop for women in machine learning, in order to give female faculty, research scientists, postdoctoral researchers, and graduate students an opportunity to meet, exchange research ideas, and build mentoring and networking relationships. In her not-so-spare time, Hanna is a member of Pioneer Valley Roller Derby, where she is better known as Logistic Aggression.