I have successfully trained the model on a dataset, but I have some questions because the documentation here is very difficult to read:
- Does the splitter use a single feature from the input, or does it select multiple features at the same time when splitting a single node? I plotted the trees using sklearn.tree.plot_tree and found that each node looks like this:
X[25] < 19.282
mse = 6.304
samples = 201445
value = 21.204
So it seems each node used only one feature (X[25] in this case) out of my 47-dimensional X for splitting.
However, the model's initializer has a parameter max_features, described as "the number of features to consider when looking for the best split," which seems to indicate that multiple features can be considered. Which is the case? A sketch of how I plotted and inspected the nodes is below.
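For reference, this is roughly how I looked at the tree. It's a minimal sketch, not my actual script: I'm using make_regression as a synthetic stand-in for my real 47-feature dataset, and reg, X, y are placeholder names.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, plot_tree

# Synthetic stand-in for my real data: 47 features, like my X.
X, y = make_regression(n_samples=5000, n_features=47, noise=1.0, random_state=0)

reg = DecisionTreeRegressor(random_state=0)  # default criterion is squared error (MSE)
reg.fit(X, y)

# Each plotted node shows a single test of the form "X[i] <= threshold".
plot_tree(reg, max_depth=2)
plt.show()

# The fitted tree also stores exactly one feature index per node
# (leaves are marked with -2), which is what prompted my question.
print(reg.tree_.feature[:10])
print(reg.tree_.threshold[:10])
```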
- Are there any hyperparameters that need tuning for this model, besides the criterion, which is currently mean squared error? It seems to me that, for regression, setting every parameter to its least restrictive value would always give the best fit, and that any settings deviating from this only save computing cost rather than improve performance.
However, I'm still wondering: if max_depth is set to an integer, or min_samples_split is set to an integer larger than 2, then a leaf may end up containing multiple samples. Will those samples all be assigned the same regression value at prediction time, simply because they fall in the same leaf? (See the sketch after this question for the check I had in mind.)
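To make the second question concrete, here is a minimal sketch of the check I had in mind, again on synthetic data and with a deliberately restricted tree so that leaves hold more than one sample; the names and settings are placeholders, not anything from the docs.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=5000, n_features=47, noise=1.0, random_state=0)

# What I mean by the "least restrictive" settings would be the defaults
# (max_depth=None, min_samples_split=2); here I restrict the tree on purpose
# so that each leaf contains several samples.
reg = DecisionTreeRegressor(max_depth=6, min_samples_split=50, random_state=0)
reg.fit(X, y)

leaf_ids = reg.apply(X)   # index of the leaf each sample lands in
preds = reg.predict(X)    # predicted value for each sample

# Do all samples that share a leaf also share a single predicted value?
same_value = all(
    np.unique(preds[leaf_ids == leaf]).size == 1 for leaf in np.unique(leaf_ids)
)
print("all co-leaf samples share one prediction:", same_value)
```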