Machine Learning in the Bandit Setting: Algorithms, Evaluation, and Case Studies

Much of machine-learning research is about discovering patterns: building intelligent agents that learn to predict the future accurately from historical data. While this paradigm has been extremely successful in numerous applications, complex real-world problems such as content recommendation on the Internet often require agents to learn to act optimally through autonomous interaction with the world they live in, a problem known as reinforcement learning.

Using a news recommendation module on Yahoo!'s front page as a running example, the talk focuses mostly on contextual bandits, a special case that has recently gained substantial interest due to its broad applications. We will highlight a fundamental challenge known as the exploration/exploitation tradeoff, present a few newly developed algorithms with strong theoretical guarantees, and demonstrate their empirical effectiveness for personalizing content recommendation at Yahoo!. At the end of the talk, we will also briefly summarize our earlier work on provably data-efficient algorithms for more general reinforcement-learning problems modeled as Markov decision processes.

Speaker Biography

Lihong Li is a Research Scientist in the Machine Learning group at Yahoo! Research. He received a PhD in Computer Science from Rutgers University, advised by Michael Littman. Before that, he received an MSc from the University of Alberta, advised by Vadim Bulitko and Russell Greiner, and a BE from Tsinghua University. In the summers of 2006, 2007, and 2008, he interned at Google, Yahoo! Research, and AT&T Shannon Labs, respectively. His main research interests are in machine learning with interaction, including reinforcement learning, multi-armed bandits, online learning, active learning, and their numerous applications on the Internet. He has received an ICML'08 Best Student Paper Award, a WSDM'11 Best Paper Award, and an AISTATS'11 Notable Paper Award.