Machine Learning in the Bandit Setting: Algorithms, Evaluation, and Case Studies
Much of machine-learning research is about discovering
patterns---building intelligent agents that learn to predict the future
accurately from historical data. While this paradigm has been extremely
successful in numerous applications, complex real-world problems such as
content recommendation on the Internet often require the agents to learn
to act optimally through autonomous interaction with the world they live
in, a problem known as reinforcement learning.
Using a news recommendation module on Yahoo!'s front page as a running
example, the majority of the talk focuses on the special case of
contextual bandits, which have gained substantial interest recently due
to their broad applications. We will highlight a fundamental challenge
known as the exploration/exploitation tradeoff, present a few newly
developed algorithms with strong theoretical guarantees, and demonstrate
their empirical effectiveness for personalizing content recommendation
at Yahoo!. At the end of the talk, we will also summarize (briefly) our
earlier work on provably data-efficient algorithms for more general
reinforcement-learning problems modeled as Markov decision processes.
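To make the exploration/exploitation tradeoff concrete, the sketch below shows an epsilon-greedy contextual bandit policy, one of the simplest ways to balance the two. This is an illustrative assumption on my part, not one of the algorithms presented in the talk; the function names and reward estimator are hypothetical.

```python
import random

def epsilon_greedy_choose(context, arms, estimate_reward, epsilon=0.1):
    """Choose an arm (e.g., a news article) for the given context.

    With probability epsilon, explore by picking a random arm to gather
    data about its reward; otherwise, exploit by picking the arm whose
    estimated reward (e.g., click probability) is highest.
    estimate_reward is a hypothetical model: (context, arm) -> float.
    """
    if random.random() < epsilon:
        return random.choice(arms)  # explore
    return max(arms, key=lambda arm: estimate_reward(context, arm))  # exploit
```

Exploring too little risks never discovering a better article; exploring too much wastes traffic on poor ones. The algorithms discussed in the talk manage this tradeoff with theoretical guarantees rather than a fixed epsilon.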
Lihong Li is a Research Scientist in the Machine Learning group at Yahoo! Research. He obtained a PhD degree in Computer Science from Rutgers University, advised by Michael Littman. Before that, he obtained an MSc degree from the University of Alberta, advised by Vadim Bulitko and Russell Greiner, and a BE degree from Tsinghua University. In the summers of 2006-2008, he enjoyed interning at Google, Yahoo! Research, and AT&T Shannon Labs, respectively. His main research interests are in machine learning with interaction, including reinforcement learning, multi-armed bandits, online learning, active learning, and their numerous applications on the Internet. He is the winner of an ICML'08 Best Student Paper Award, a WSDM'11 Best Paper Award, and an AISTATS'11 Notable Paper Award.