I'm trying to understand Gradient Tree Boosting by following Prof. Friedman's original paper, Greedy Function Approximation: A Gradient Boosting Machine. Basically, at each iteration a regression tree is used as the base learner, and the weights of the regions in the tree are learned. I notice that the regression trees used at different iterations actually have the same regions (though the weights may be learned differently at each iteration), since the regions are split based on the data features only. May I know if my understanding is correct? If so, it seems that a single tree structure would not be enough to learn a robust Gradient Boosting model.
1 Answer
The regions are not split based only on the data features.
In each iteration of gradient boosting, you fit a regression tree to the pseudo-residuals, i.e. the negative gradient of the loss function at the current prediction, $-\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)},$ where $F$ is the function you have learned so far. Since these residuals change at every iteration (because $F$ is different at every iteration), each base learner will learn to split up the data differently.
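Here is a minimal sketch of that loop, assuming squared-error loss (for which the negative gradient is simply $y_i - F(x_i)$) and using scikit-learn's `DecisionTreeRegressor` as the base learner; the toy data, tree depth, and learning rate are illustrative choices, not anything prescribed by the paper:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# For squared-error loss, the negative gradient w.r.t. F(x_i) is y_i - F(x_i)
def negative_gradient(y, F):
    return y - F

learning_rate = 0.1
F = np.full_like(y, y.mean())  # F_0: constant initial prediction
trees = []

for m in range(100):
    # Pseudo-residuals change every iteration because F changes
    residuals = negative_gradient(y, F)

    # Fit a small regression tree to the residuals, not to y itself,
    # so the regions (splits) it finds differ from iteration to iteration
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    trees.append(tree)

    # Update the ensemble prediction
    F += learning_rate * tree.predict(X)
```

Inspecting, say, `trees[0].tree_.threshold` against `trees[50].tree_.threshold` shows that the split points generally differ from one iteration to the next, which is the point above: the tree structure is re-learned on new targets each round.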
– Ben Kuhn
How is the statement "At each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. This randomly selected subsample is then used, instead of the full sample, to fit the base learner and compute the model update for the current iteration." (from Friedman's paper) compatible with the fact that we fit each tree to the residuals? I have asked a question about it (bounty ends in 5 days). – Antoine Oct 07 '15 at 08:49