
When fitting a tree regressor model, I would like to calculate the AIC and BIC metrics. However, I need the maximum of the likelihood function to do this.

Is there a closed form solution or some other way to calculate the likelihood function from a tree regressor? I haven't been able to find any information online, other than a closed form solution in an OLS framework.

PyRsquared
  • 1,304

2 Answers

3

To compute the BIC or AIC for a model, the observed dataset has to have an associated conditional distribution. For instance,

  1. In a linear regression, a dataset $\mathcal{D} = \{(t_n, {\bf x}_n) \vert t_n\in\mathbb{R}, {\bf x}_n\in\mathbb{R}^M\}$ is assumed to be conditionally distributed as

$$ t_n\vert {\bf x}_n\sim\mathcal{N}({\bf w}^T{{\bf x}_n}, \sigma^2) $$

  2. In a logistic regression, a dataset $\mathcal{D} = \{(t_n, {\bf x}_n) \vert t_n\in\{0,1\}, {\bf x}_n\in\mathbb{R}^M\}$ is assumed to be conditionally distributed as

$$ t_n\vert {\bf x}_n\sim\text{Bernoulli}(\sigma({\bf w}^T{{\bf x}_n})) $$

  3. In an ARCH(1) model, a dataset $\mathcal{D} = \{t_n \vert t_n\in\mathbb{R}\}$ is assumed to be conditionally distributed as

$$ t_n\vert t_{n-1}\sim\mathcal{N}(0, \sigma^2(t_{n-1})) $$

And so on...
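For concreteness, in the linear-regression case above the conditional log-likelihood can be evaluated directly from the data and the fitted parameters. A minimal sketch (the names and the example values here are illustrative; ${\bf w}$ and $\sigma^2$ would come from whatever model you fitted):

```python
import numpy as np

def linreg_log_likelihood(t, X, w, sigma2):
    # log p(t | X, w, sigma^2) under t_n | x_n ~ N(w^T x_n, sigma^2)
    n = len(t)
    resid = t - X @ w
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - 0.5 * (resid @ resid) / sigma2

# Toy data where the fit is perfect: residuals are zero,
# so the log-likelihood reduces to -0.5 * n * log(2 * pi * sigma2).
X = np.array([[1.0], [2.0]])
t = np.array([1.0, 2.0])
ll = linreg_log_likelihood(t, X, np.array([1.0]), sigma2=1.0)
```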

A classical decision tree, however, does not assume a conditional distribution for the data. There is no associated likelihood function, hence neither AIC nor BIC can be computed.

If you wanted to compute the BIC, you'd need to assign to your model some sort of likelihood function.
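A minimal sketch of that idea: assume the residuals of the fitted tree are i.i.d. Gaussian, plug in the in-sample MSE as the variance MLE, and count each leaf mean plus the noise variance as a parameter. Both the Gaussian assumption and the parameter count $k$ are modelling choices here, not properties of the tree:

```python
import numpy as np

def gaussian_log_likelihood(y, y_hat):
    """Maximised Gaussian log-likelihood, plugging in the MLE sigma^2 = MSE."""
    n = len(y)
    sigma2 = np.mean((y - y_hat) ** 2)
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)

def aic_bic(y, y_hat, k):
    """AIC = 2k - 2 log L, BIC = k log n - 2 log L."""
    n = len(y)
    ll = gaussian_log_likelihood(y, y_hat)
    return 2 * k - 2 * ll, k * np.log(n) - 2 * ll

# Toy stump: two leaves, predictions are the leaf means.
y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 1.5, 3.5, 3.5])
k = 3  # two leaf means + one noise variance (a choice, not a given)
aic, bic = aic_bic(y, y_hat, k)
```

As the comment below notes, the genuinely hard part is deciding what $k$ should be for a tree, since the splits themselves are also estimated from the data.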

  • So just append a distributional assumption and you have it. The real trick is figuring out the number of parameters in the tree model. – BigBendRegion Aug 17 '20 at 12:37
1

A regression tree is still a linear model (if you define the correct interaction terms). So in principle it is possible to calculate the AIC and BIC with the OLS formulas.
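One way to see this claim (a toy illustration, not part of the answer itself): a stump that splits at $x \le 1$ makes exactly the same predictions as OLS on an intercept plus the indicator feature $\mathbf{1}[x > 1]$:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0, 4.0])

# The stump's predictions: the mean of y on each side of the split x <= 1.
left, right = y[x <= 1].mean(), y[x > 1].mean()
tree_pred = np.where(x <= 1, left, right)

# The same predictions from OLS on [intercept, indicator(x > 1)].
X = np.column_stack([np.ones_like(x), (x > 1).astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_pred = X @ beta
```

Deeper trees work the same way, with one indicator (or product of indicators) per leaf, so the fitted tree is linear in those induced features even though it is nonlinear in $x$.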

Sebastian
  • 3,064
    Well, a regression tree is not a linear model. You use it to regress data that does not have a linear relationship with the target as an alternative to OLS which assumes linear data. And using the formula in the link, $k$ is the number of parameters in the OLS model, which is easy to find out. But with a tree regressor it isn't obvious how many parameters there are (number of trees? avg number of nodes per tree?) – PyRsquared Aug 17 '20 at 08:01
    A (single) regression tree is certainly a linear model (make a simple example and convince yourself that you can define the correct interaction terms). – Sebastian Aug 17 '20 at 11:04
  • The simplest tree is just 2 leaves, a stump. You split the data into two groups, possibly based on a continuous variate. The result would be estimating the response as being 1 of 2 numbers (the average of the values in the 2 leaves). If you plotted the estimated response vs the single continuous predictor, you would see a horizontal flat line, a discontinuity (jump up or down), then another horizontal flat line. This is nonlinear. Realistic trees create many such "averages", approximating any arbitrary nonlinear function of the features. Trees are nonlinear. For sure. – rbatt Jul 14 '22 at 10:36
  • I think you missed the sentence in brackets. Of course it is not linear in the original features, but it is linear for the induced feature interactions. – Sebastian Aug 10 '22 at 09:01