I have successfully trained the model on a dataset, but I have some questions because the documentation here is very difficult to read:
- Does the splitter use a single feature from the input, or does it select multiple features at the same time when splitting a single node? I plotted the trees using sklearn.tree.plot_tree and found that each node looks like this:
X[25] < 19.282
mse = 6.304
samples = 201445
value = 21.204
So it seems each node used only one feature (X[25] in this case) out of my 47-dimensional X for splitting.
However, the model's initializer has a parameter max_features, described as "the number of features to consider when looking for the best split," which seems to indicate that multiple features can be considered. Which is the case? A sketch of how I plotted and inspected the nodes is below.
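For reference, this is roughly how I looked at the tree. It's a minimal sketch, not my actual script: I'm using make_regression as a synthetic stand-in for my real 47-feature dataset, and reg, X, y are placeholder names.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, plot_tree

# Synthetic stand-in for my real data: 47 features, like my X.
X, y = make_regression(n_samples=5000, n_features=47, noise=1.0, random_state=0)

reg = DecisionTreeRegressor(random_state=0)  # default criterion is squared error (MSE)
reg.fit(X, y)

# Each plotted node shows a single test of the form "X[i] <= threshold".
plot_tree(reg, max_depth=2)
plt.show()

# The fitted tree also stores exactly one feature index per node
# (leaves are marked with -2), which is what prompted my question.
print(reg.tree_.feature[:10])
print(reg.tree_.threshold[:10])
```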
- Are there any hyperparameters that need tuning for this model, besides the criterion, which is currently mean squared error? It seems to me that, for regression, setting every parameter to its least restrictive value would always give the best fit, and that any settings deviating from this only save computing cost rather than improve performance.
However, I'm still wondering: if max_depth is set to an integer, or min_samples_split is set to an integer larger than 2, then a leaf may end up containing multiple samples. Will those samples all be assigned the same regression value at prediction time, simply because they fall in the same leaf? (See the sketch after this question for the check I had in mind.)
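To make the second question concrete, here is a minimal sketch of the check I had in mind, again on synthetic data and with a deliberately restricted tree so that leaves hold more than one sample; the names and settings are placeholders, not anything from the docs.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=5000, n_features=47, noise=1.0, random_state=0)

# What I mean by the "least restrictive" settings would be the defaults
# (max_depth=None, min_samples_split=2); here I restrict the tree on purpose
# so that each leaf contains several samples.
reg = DecisionTreeRegressor(max_depth=6, min_samples_split=50, random_state=0)
reg.fit(X, y)

leaf_ids = reg.apply(X)   # index of the leaf each sample lands in
preds = reg.predict(X)    # predicted value for each sample

# Do all samples that share a leaf also share a single predicted value?
same_value = all(
    np.unique(preds[leaf_ids == leaf]).size == 1 for leaf in np.unique(leaf_ids)
)
print("all co-leaf samples share one prediction:", same_value)
```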