I've been diving deep into random forests and had a question about terminal nodes.
I know that, in general, when a sample reaches a terminal node, or leaf, of a tree in a random forest, the value assigned to that leaf is the mode of the training responses that end up there. For example, if 5 training examples land in a leaf (based on the hyperparameters of the model) with labels (A, A, A, B, B), the prediction for that leaf is A.
Is there a reason you wouldn't instead treat the leaf as a probability distribution? In the above example, the leaf would return A 60% of the time and B 40% of the time.
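To make the comparison concrete, here is a minimal sketch of the two leaf rules I have in mind; the helper names (`mode_prediction`, `sampled_prediction`) are mine, not from any library:

```python
import random
from collections import Counter

def mode_prediction(leaf_labels):
    # Deterministic rule: always return the most common label in the leaf.
    return Counter(leaf_labels).most_common(1)[0][0]

def sampled_prediction(leaf_labels, rng):
    # Stochastic rule: draw a label according to the leaf's
    # empirical distribution (uniform draw over the stored labels).
    return rng.choice(leaf_labels)

leaf = ["A", "A", "A", "B", "B"]
print(mode_prediction(leaf))  # always "A"

rng = random.Random(0)
draws = Counter(sampled_prediction(leaf, rng) for _ in range(10_000))
print(draws)  # roughly 60% "A", 40% "B"
```

The deterministic rule gives the same answer every call, while the stochastic rule's answer varies from call to call, which is the variance concern I raise below.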
My intuition says this would increase the variance of the model, but I'm looking for some mathematical rigor behind this intuition (or, if this intuition is wrong, an explanation of what I'm not understanding).