Speaker: Alex Slivkins
Affiliation: Microsoft Research – New York
Title: Bandits with Resource Constraints
Abstract:
Multi-armed bandits is the predominant theoretical model for the exploration-exploitation tradeoff in machine learning, with countless applications ranging from medical trials to communication networks to Web search, advertising, and dynamic pricing. In many of these application domains the learner may be constrained by one or more supply/budget limits, in addition to the customary limit on the time horizon. We introduce a general model that encompasses such problems, combining aspects of stochastic integer programming with online learning. A distinctive feature (and challenge) of our model, compared to conventional bandit problems, is that the optimal policy for a given problem instance may significantly outperform the policy that always chooses the best fixed action; for example, a seller with limited supply may earn more by randomizing between two prices than by committing to either one. Our main result is an algorithm with near-optimal regret relative to the optimal policy. We also extend this result to contextual bandits and detail an application to dynamic pricing.
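
To make the setting concrete, here is a minimal sketch (not the algorithm from the talk, whose guarantees are considerably stronger) of a UCB-style bandit in which each arm pull consumes a unit of a shared resource budget as well as a time step, so the learner may stop early when the budget runs out. All names (budgeted_ucb, reward_means, cost_means) are illustrative assumptions, not notation from the paper.

import numpy as np

def budgeted_ucb(reward_means, cost_means, budget, horizon, seed=0):
    """Toy budget-constrained bandit: Bernoulli rewards and Bernoulli
    resource consumption per pull; stops at the time horizon or when
    the resource budget is exhausted, whichever comes first."""
    rng = np.random.default_rng(seed)
    k = len(reward_means)
    pulls = np.zeros(k)
    reward_sum = np.zeros(k)
    cost_sum = np.zeros(k)
    total_reward, remaining = 0.0, float(budget)

    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once to initialize estimates
        else:
            # optimism: upper bound on reward, lower bound on cost,
            # then rank arms by estimated reward per unit of resource
            bonus = np.sqrt(2.0 * np.log(t + 1) / pulls)
            ucb_reward = reward_sum / pulls + bonus
            lcb_cost = np.maximum(cost_sum / pulls - bonus, 1e-6)
            arm = int(np.argmax(ucb_reward / lcb_cost))
        reward = float(rng.random() < reward_means[arm])  # Bernoulli reward
        cost = float(rng.random() < cost_means[arm])      # Bernoulli resource use
        if cost > remaining:
            break  # budget exhausted before the time horizon
        pulls[arm] += 1
        reward_sum[arm] += reward
        cost_sum[arm] += cost
        total_reward += reward
        remaining -= cost
    return total_reward

Note that this sketch still competes only with fixed arms; the point of the model in the talk is that, under budget constraints, the benchmark is the best dynamic policy, which may mix arms over time and thus collect strictly more reward than any single arm.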