Probabilistic Modeling for Large-scale Data Exploration

Chong Wang, CMU
Host: Mark Dredze

We live in the era of “Big Data,” surrounded by a dauntingly vast amount of information. How can we help people quickly navigate this data and extract useful knowledge from it? Probabilistic models provide a general framework for analyzing, predicting, and understanding the patterns underlying large-scale, complex data.

Using a new recommender system as an example, I will show how we can develop principled approaches to advance two important directions in probabilistic modeling: exploratory analysis and scalable inference. First, I will describe a new model for document recommendation. This model not only achieves better recommendation performance but also provides new exploratory tools that help users navigate the data. For example, a user can adjust her preferences, and the system will adaptively change its recommendations. Second, building such a recommender system requires learning the probabilistic model from large-scale empirical data. I will describe a scalable approach for learning a wide class of probabilistic models, including our recommendation model, from massive data.
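To make the idea of learning a probabilistic recommendation model from data concrete, here is a minimal sketch of generic probabilistic matrix factorization (Gaussian likelihood with Gaussian priors, MAP estimation via stochastic gradient descent) on synthetic ratings. This is an illustrative assumption, not the speaker's actual document-recommendation model; all dimensions and hyperparameters are made up for the example.

```python
import numpy as np

# Illustrative sketch: probabilistic matrix factorization (PMF).
# This is NOT the speaker's model; it is a generic example of
# learning a latent-factor recommender by MAP estimation with SGD.

rng = np.random.default_rng(0)

n_users, n_items, k = 20, 30, 5  # assumed toy sizes

# Synthetic ratings generated from a ground-truth low-rank structure.
U_true = rng.normal(size=(n_users, k))
V_true = rng.normal(size=(n_items, k))
R = U_true @ V_true.T + 0.1 * rng.normal(size=(n_users, n_items))
observed = rng.random((n_users, n_items)) < 0.3  # sparse observations

# MAP objective: sum over observed entries of (r - u.v)^2
# plus L2 penalties from the Gaussian priors on U and V.
U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
lam, lr = 0.05, 0.01  # assumed regularization and learning rate

def loss():
    err = (R - U @ V.T)[observed]
    return float(np.sum(err ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2)))

losses = [loss()]
idx = np.argwhere(observed)
for epoch in range(30):
    rng.shuffle(idx)
    for i, j in idx:
        u_old = U[i].copy()
        e = R[i, j] - U[i] @ V[j]
        U[i] += lr * (e * V[j] - lam * U[i])
        V[j] += lr * (e * u_old - lam * V[j])
    losses.append(loss())

print(f"loss: {losses[0]:.1f} -> {losses[-1]:.1f}")
```

Full-scale versions of such models are typically trained with stochastic or variational inference rather than plain SGD over individual entries, which is part of what makes scalable inference a research question in its own right.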

Speaker Biography

Chong Wang is a project scientist in the Machine Learning Department at Carnegie Mellon University. He received his PhD from Princeton University in 2012, advised by David Blei. His research lies in probabilistic graphical models and their applications to real-world problems. He has won several awards, including a best student paper award at KDD 2011, a notable paper award at AISTATS 2011, and a best student paper award honorable mention at NIPS 2009. He received the Google PhD Fellowship in machine learning and the Siebel Scholar Fellowship. His thesis was nominated for the ACM Doctoral Dissertation Award by Princeton University in 2012.