I'm trying to understand Gradient Tree Boosting by following Prof. Friedman's original paper, Greedy Function Approximation: A Gradient Boosting Machine. Basically, at each iteration a regression tree is used as the base learner, and the weights of the regions in the tree are learned. I notice that the regression trees used at different iterations actually have the same regions (though the weights may be learned differently at each iteration), since the regions are split based on the data features only. May I know if my understanding is correct? If so, it seems that a single tree structure would not be enough to learn a robust Gradient Boosting model.
1 Answer
The regions are not split based only on the data features.
In each iteration of gradient boosting, you fit a regression tree to the pseudo-residuals, i.e. the negative gradient of the loss function at the current prediction, $-\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)},$ where $F$ is the function you have learned so far. Since these residuals change at every iteration (because $F$ is different at every iteration), each base learner will learn to split up the data differently.
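Here is a minimal sketch of that loop, assuming squared-error loss (for which the negative gradient is simply $y_i - F(x_i)$) and using scikit-learn's `DecisionTreeRegressor` as the base learner; the toy data, tree depth, and learning rate are illustrative choices, not anything prescribed by the paper:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# For squared-error loss, the negative gradient w.r.t. F(x_i) is y_i - F(x_i)
def negative_gradient(y, F):
    return y - F

learning_rate = 0.1
F = np.full_like(y, y.mean())  # F_0: constant initial prediction
trees = []

for m in range(100):
    # Pseudo-residuals change every iteration because F changes
    residuals = negative_gradient(y, F)

    # Fit a small regression tree to the residuals, not to y itself,
    # so the regions (splits) it finds differ from iteration to iteration
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    trees.append(tree)

    # Update the ensemble prediction
    F += learning_rate * tree.predict(X)
```

Inspecting, say, `trees[0].tree_.threshold` against `trees[50].tree_.threshold` shows that the split points generally differ from one iteration to the next, which is the point above: the tree structure is re-learned on new targets each round.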
– Ben Kuhn
How is the statement "At each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. This randomly selected subsample is then used, instead of the full sample, to fit the base learner and compute the model update for the current iteration." (from Friedman's paper) compatible with the fact that we fit each tree to the residuals? I have asked a question about it (bounty ends in 5 days). – Antoine Oct 07 '15 at 08:49