4

I'm currently reading Intro to Statistical learning. On the chapter about Tree models I'm not sure if my intuition about the formula for boosting is correct

enter image description here

From the looks of it, it seems like shallow trees are built on top of one another with a shrinkage parameter. Would the end result of this summation be one large tree compromised of the sequentially built trees? Or am I completely off on this?

bronson
  • 61

2 Answers2

7

As the other answer mentions, the direct result is a collection of trees.

Depending on the operation used to combine trees, however, these can be combined into a single tree spanning the whole ensemble. It is very easy to see how. See this question for an example: Is the sum of two decision trees equivalent to a single decision tree?

Singe GBDTs are (almost always) simply stacked (i.e., added) sequentially, this also applies to them. This means that the ensemble, in the end, is mathematically equivalent to a single decision tree. It does not mean it is necessarily possible to learn the same decision tree by GBDTs (since the split criteria are different) though, but the final structure is itself a single polytree graph.

Firebug
  • 19,076
  • 6
  • 77
  • 139
3

You're right that boosting is a summation of many "shrunken" trees. I guess for the intuition, though, boosting doesn't result in one final "tree", per se, at least in the sense of a single decision tree that you're able to go down splits and eventually end up at a node with a predicted value.

With boosting, you take a set of predictors and you input them into all $B$ of your trees, and each tree outputs a contribution, and all these $B$ contributions are then summed into a final boosted model output.

If I were to visualize a boosted model, I wouldn't be able to draw just a single tree. I'd have to draw something more like a series of $B$ decision trees in a row, write the contribution of each decision tree under it, and add $+$ signs in between each contribution to end up with the total / model output.

I'd say it's something more like a bunch of apple trees working together that each contribute a portion of a single apple that gets output.

  • 1
    You can merge trees into a single tree. It doesn't mean it's a more useful representation, nor is it central to GBDTs, but it is surely possible. – Firebug Nov 14 '23 at 13:33