Calibration measures how well a classifier's predicted class probabilities match the true probabilities under the data-generating distribution. Consequently, the properties of a learning algorithm alone don't determine whether its outputs will be well or poorly calibrated; what matters is how well suited the algorithm is to the particular problem.
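To pin the term down for the binary case: if a classifier outputs $\hat{p}(x)$ as its estimate of the probability that $x$ belongs to the positive class, it is well calibrated when

$$P\big(Y = 1 \mid \hat{p}(X) = p\big) = p \quad \text{for all } p \in [0, 1],$$

i.e., among all points assigned probability $p$, a fraction $p$ actually belong to the positive class.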
To illustrate the point, here's a toy example using a Gaussian naive Bayes classifier. Let's generate two datasets, where points in each class are sampled from a 2D Gaussian distribution. In the first dataset, points are generated using full covariance matrices. This violates the naive Bayes assumption because the input features are not conditionally independent given the class. In the second dataset, points are generated using diagonal covariance matrices. In this case, the naive Bayes hypothesis space actually contains the true model. A sketch of the experiment and its calibration results follows.
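Here's a minimal sketch in Python using scikit-learn; the specific means, covariances, and sample sizes are illustrative choices, not from any particular source:

```python
# A minimal sketch of the experiment, assuming scikit-learn's GaussianNB.
# The means, covariances, and sample sizes are illustrative choices.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

def make_dataset(cov, n_per_class=2000):
    """Two classes, each sampled from a 2D Gaussian with covariance `cov`."""
    X0 = rng.multivariate_normal([0.0, 0.0], cov, size=n_per_class)
    X1 = rng.multivariate_normal([1.5, 1.5], cov, size=n_per_class)
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

datasets = {
    # Full covariance: features are correlated within each class, so the
    # conditional-independence assumption of naive Bayes is violated.
    "full covariance": [[1.0, 0.8], [0.8, 1.0]],
    # Diagonal covariance: the naive Bayes hypothesis space contains
    # the true model.
    "diagonal covariance": [[1.0, 0.0], [0.0, 1.0]],
}

for name, cov in datasets.items():
    X, y = make_dataset(cov)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    probs = GaussianNB().fit(X_train, y_train).predict_proba(X_test)[:, 1]
    # calibration_curve bins the predictions; for a well-calibrated model,
    # the observed positive fraction in each bin matches the mean prediction.
    frac_pos, mean_pred = calibration_curve(y_test, probs, n_bins=10)
    print(name)
    for pred, obs in zip(mean_pred, frac_pos):
        print(f"  predicted {pred:.2f}  observed {obs:.2f}")
```

Running this, the full-covariance case should typically come out overconfident (predictions pushed toward 0 and 1 relative to the observed frequencies), because naive Bayes double-counts the correlated features as independent evidence; in the diagonal case, the predicted and observed columns track each other closely.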

Calibration is poor on the first dataset but fairly good on the second. This shows that being well or poorly calibrated depends on the problem, rather than being a universal property of the algorithm. It also supports the idea that calibration quality reflects how closely the model class approximates the underlying distribution.
That said, this doesn't contradict the observation that particular learning algorithms tend to produce well or poorly calibrated results on real-world problems, since such problems often share common features. For example, the conditional independence assumption fails in many problems we care about, so naive Bayes would be expected to give poorly calibrated results across these problems.
For more on probability calibration and a comparison of different classifiers on benchmark datasets, see:
Niculescu-Mizil, A., & Caruana, R. (2005). Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning (pp. 625–632).