Speaker: Karthik Abinav Sankararaman

Affiliation: University of Maryland

Title: Adversarial Bandits with Knapsacks

Abstract: In this talk we will discuss the multi-armed bandit problem with resource constraints in the adversarial setting. In this problem, the algorithm and an adversary play an interactive, repeated game. Given T time-steps, d resources, m actions and budgets B1, B2, ..., Bd, the algorithm chooses one of the m actions at each time-step. The adversary then reveals the reward and the consumption of each of the d resources for this action. The game stops at the time-step at which the algorithm runs out of some resource (i.e., the total consumption of resource j exceeds Bj for some j), and the total reward is the sum of the rewards obtained until this stopping time. The goal is to maximize the competitive ratio: the ratio of the total reward of the algorithm to the expected reward of the best fixed distribution over actions, chosen with knowledge of all rewards and consumptions ahead of time. We give an algorithm for this problem whose competitive ratio is tight (it matches the lower bound). Moreover, the algorithmic tools extend in an (almost) black-box fashion to the stochastic setting, yielding a “best-of-both-worlds” algorithm that need not know a priori whether the input is adversarial or i.i.d. Finally, we conclude with applications and special cases, including the Dynamic Pricing problem.
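As a minimal sketch of the interaction protocol described in the abstract, the Python function below simulates one run of the game for a given policy and adversary. It is not the algorithm from the talk, only the game that algorithm plays. The names `play_bwk`, `choose` and `reveal` are hypothetical, and we assume that the reward of the step on which some budget is exceeded is not counted.

```python
from typing import Callable, Sequence, Tuple


def play_bwk(
    T: int,
    budgets: Sequence[float],
    choose: Callable[[int], int],
    reveal: Callable[[int, int], Tuple[float, Sequence[float]]],
) -> float:
    """Simulate one run of the bandits-with-knapsacks game.

    choose(t) returns the action index picked at time-step t (hypothetical policy interface).
    reveal(t, a) returns (reward, [consumption of resource 1..d]) for that action
    (hypothetical adversary interface).
    """
    spent = [0.0] * len(budgets)
    total = 0.0
    for t in range(T):
        a = choose(t)
        reward, consumption = reveal(t, a)
        for j, c in enumerate(consumption):
            spent[j] += c
        # Assumption: the game stops as soon as any resource j exceeds its budget Bj,
        # and the reward of this final step is not counted.
        if any(s > b for s, b in zip(spent, budgets)):
            break
        total += reward
    return total


if __name__ == "__main__":
    # Toy run: one action, one resource with budget 3, each step consumes 1 unit.
    print(play_bwk(10, [3.0], lambda t: 0, lambda t, a: (1.0, [1.0])))
```

In the toy run the fourth step pushes consumption to 4 > 3, so only the first three rewards are collected. Whether the stopping step's reward counts varies between formulations; flipping that choice changes the total by at most one step's reward.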

This talk is based on a recent working paper with Nicole Immorlica, Rob Schapire and Alex Slivkins.