Leveraging techniques, such as boosting and bagging, combine the classification rules produced by many runs of a simpler learning algorithm. The combined rules are often far superior to those produced by the simpler algorithm alone.
These techniques are appealing for several reasons: they are simple to implement, computationally efficient, and empirically successful. Perhaps the most successful and best-known such algorithm is Freund and Schapire's AdaBoost.
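As a concrete reference point, the sketch below is a minimal AdaBoost implementation over decision stumps. It is illustrative only: the helper names (train_stump, stump_predict) and the toy dataset are my own, and this is not the implementation discussed in the talk.

```python
import numpy as np

def train_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump with lowest weighted error."""
    n, d = X.shape
    best, best_err = None, np.inf
    for j in range(d):
        for thresh in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best = err, (j, thresh, polarity)
    return best, best_err

def stump_predict(stump, X):
    j, thresh, polarity = stump
    return np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)

def adaboost(X, y, rounds=30):
    """AdaBoost with decision stumps; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)            # example weights (the boosting distribution)
    ensemble = []                      # list of (alpha, stump) pairs
    for _ in range(rounds):
        stump, err = train_stump(X, y, w)
        err = max(err, 1e-12)          # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(stump, X)
        w *= np.exp(-alpha * y * pred)  # up-weight misclassified examples
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.sign(score)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # toy classification task
    ens = adaboost(X, y)
    print("training accuracy:", np.mean(predict(ens, X) == y))
```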
The incredible practical success of AdaBoost motivated our efforts to understand and generalize it.
In this talk I will introduce the AdaBoost algorithm and show how it may be viewed as gradient descent on a potential function. I will then discuss how this viewpoint leads to generalizations of the approach, present guarantees on the performance of such algorithms, and describe how we adapted the gradient descent framework to the regression setting.
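To give a rough sense of the gradient descent viewpoint (the notation below is my own sketch of the standard functional-gradient reading of AdaBoost, not necessarily that used in the talk), write the combined classifier as a weighted vote over weak hypotheses and take the potential to be the sample-averaged exponential loss:

\[
F_T(x) = \sum_{t=1}^{T} \alpha_t h_t(x), \qquad
C(F) = \frac{1}{m} \sum_{i=1}^{m} \exp\bigl(-y_i F(x_i)\bigr).
\]

Each boosting round can then be read as a step of gradient descent in function space: the negative gradient of \(C\) at the current \(F\) assigns weight \(w_i \propto \exp(-y_i F(x_i))\) to example \(i\); the next weak hypothesis is chosen to have large weighted correlation \(\sum_i w_i y_i h(x_i)\), i.e. small weighted error \(\varepsilon\); and the step size is set by a line search, which for this potential recovers AdaBoost's familiar \(\alpha = \tfrac{1}{2}\ln\bigl((1-\varepsilon)/\varepsilon\bigr)\).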