
Let's say I trained a random forest on classes 0, 1, 2, as seen below.

Now I predict the class probabilities with the random forest for new points A and B.

Would I get...

For point A, class probabilities of: 50% class 0, 25% class 1, 25% class 2?

For point B, class probabilities of: 90% class 0, 5% class 1, 5% class 2?

In other words, is the class probability some kind of normalized distance between classes?

Now, let's say class 0 had only one point. Would this change the probabilities above?

I would appreciate it if someone could shed some light on this.

Thanks!




There are many interpretations of probability. Skimming through them would be a good starting point to understand the concept better.

The decision tree makes probabilistic predictions by calculating the fraction of the samples from each class at the particular node. For example, if your decision tree has three variables and made three binary splits (is a female AND is a car owner AND does not like pizza), then the probability would be the fraction of samples from each class in this subset of the data. A random forest makes predictions by averaging the predictions of the individual trees. You can interpret the probability directly, as telling you how often samples like this belong to a particular class.
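You can check this averaging behaviour yourself. As a quick sketch (using made-up three-class data standing in for the classes 0, 1, 2 in the question), scikit-learn's `RandomForestClassifier.predict_proba` is exactly the mean of the individual trees' `predict_proba` outputs:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical three-class data, just for illustration.
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each tree reports the class fractions at the leaf the point falls into;
# the forest's probability is the average of those fractions over all trees.
per_tree = np.stack([tree.predict_proba(X[:5]) for tree in rf.estimators_])
assert np.allclose(rf.predict_proba(X[:5]), per_tree.mean(axis=0))
```

So the forest's "probability" is not a distance between classes; it is an averaged vote share over the trees.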

While this is simple for models like the random forest or $k$NN, it may be harder for other models that don't calculate the probabilities as fractions of observations from a particular class. In such cases, it is easier to think of the probabilities as a measure of how certain the model is that the observation belongs to the class; this is the Bayesian "degree of belief" interpretation.
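The $k$NN case makes the "fraction of observations" idea very concrete. In this small made-up example with $k=4$, the predicted probabilities can only be multiples of $1/4$, because they are literally the share of the four nearest neighbours belonging to each class:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Tiny hypothetical 1-D dataset with three classes.
X = np.array([[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])

knn = KNeighborsClassifier(n_neighbors=4).fit(X, y)
# The 4 nearest neighbours of 0.15 are 0.0, 0.1, 0.2 (class 0) and 0.3 (class 1),
# so the predicted probabilities are 3/4, 1/4 and 0.
print(knn.predict_proba([[0.15]]))
```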

Finally, keep in mind that models like this are not well-calibrated: the probabilities they output need not reflect the true probabilities. Without additional steps to calibrate them, the probabilities may be less useful. You should therefore think of them as a kind of score, where a higher value should make you more confident in a particular classification decision. Treating the output just as a score is less prone to misinterpretation for many machine learning models.
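If you do need calibrated probabilities, scikit-learn provides a wrapper for this. A minimal sketch on made-up binary data, using Platt scaling ("sigmoid"), which refits a logistic map on held-out folds:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical binary-classification data, just for illustration.
X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
# Calibration pulls the forest's raw vote fractions toward probabilities
# that better match observed class frequencies.
calibrated = CalibratedClassifierCV(rf, method="sigmoid", cv=3).fit(X_tr, y_tr)
probs = calibrated.predict_proba(X_te)  # rows sum to 1
```

Whether the calibrated or raw scores are more useful depends on whether you need actual probabilities or only a ranking.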

Tim

See also this discussion about adjusting the threshold and the meaning of random forest probabilities, citing a paper by Olson et al., "Making Sense of Random Forest Probabilities: a Kernel Perspective", in which they "undertake an empirical investigation to determine the extent to which random forest parameter tuning influences its probability estimate".

Cazz