How should one make a sequence of decisions in an uncertain environment? If one had complete knowledge of the future, it would be an offline problem of pure optimization. This talk relates online repeated decision-making problems to their offline optimization counterparts.
We begin with a general continuous decision making problem, which can be viewed as online convex optimization with extremely little feedback. As an example, consider a company that wishes to repeatedly choose its monthly advertising portfolio so as to maximize its average monthly profit. Each month, after deciding how much to advertise on a number of channels, it finds out only its total profit, and not how much it would have made had it advertised differently. This profit is a function of its advertising expenditures as well as many other decisions and unrelated economic factors. More precisely, a decision-maker must choose a sequence of points x_1, x_2, …, from some fixed convex feasible set. After choosing the t’th point, x_t, only the reward f_t(x_t) is revealed and no other information about the reward function f_t. Moreover, the functions f_1, f_2, …, may be arbitrary (bounded convex) functions and may be completely unrelated. We give a very simple randomized algorithm that achieves average performance approaching that of the best single decision in hindsight (at a rate inversely polynomial in the number of decisions). One key idea is to approximate the gradient of a function by evaluating it at a single point. Applications include advertising, production, and investment portfolios.
We next consider discrete repeated decision-making problems and their offline combinatorial optimization counterparts. We show how, in the face of complete uncertainty, almost any offline combinatorial optimization algorithm can be modified and applied online to the repeated decision-making setting, with nearly the same efficiency and performance guarantees. We discuss applications to adaptive routing, online auctions, and classification learning.