Probability calibration and Brier score

Question

Assume that I have a binary classification problem. What I care about most is getting well-calibrated probabilities from the classifier.

The first way to check this is the calibration plot (or reliability curve).

The question: is it fair to judge calibration based on the Brier score?

Assume that we have "enough" data. Would the classifier with the smaller Brier score also give the better reliability curve?

My concern comes from the fact that the probabilities from a classifier are conditional probabilities. Therefore, I do not see the intuition behind applying the Brier score to conditional probabilities.

Eoin · Accepted Answer · 2020-09-07T14:11:33.133

The short answer is that it only makes sense to calculate the Brier score for the conditional probabilities, $\hat y = P(y=1|X)$, where $y$ is the outcome, $\hat y$ is your prediction, and $X$ are your predictors. In other words, $\hat y$ is the probability that $y=1$, conditional on this particular value of the predictors, $X$.

The Brier score in this case is just

$$ \frac{1}{N}\sum_{i=1}^N (\hat y_i - y_i)^2 $$
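As a minimal sketch of the formula above, assuming NumPy is available: the helper name `brier_score` and the toy arrays `y_true` and `y_prob` are invented for illustration and are not from the original post.

```python
import numpy as np

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

# Toy data, made up for illustration only.
y_true = [1, 0, 1, 1, 0]            # observed outcomes y_i
y_prob = [0.9, 0.2, 0.6, 0.8, 0.1]  # predicted P(y=1 | X_i)

score = brier_score(y_true, y_prob)
print(score)  # approximately 0.052
```

If scikit-learn is installed, `sklearn.metrics.brier_score_loss(y_true, y_prob)` should return the same quantity.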

What other kinds of probability could there be? The only other option here is the marginal probability, $P(y=1)$. We can estimate this by simply counting the proportion of times $y=1$ in the data. Clearly, it doesn't make sense to use this value when calculating the Brier score!

Would the classifier with the smaller Brier score also give the better reliability curve?

Yes. If your classifier predicts $\hat y = 1$ in all cases where $y = 1$ and $\hat y = 0$ where $y = 0$, it has a Brier score of $0$. If it does the opposite, every error is $\pm 1$ and the score is $(\pm 1)^2 = 1$. In most cases, such perfect predictions won't be possible, but a good classifier can still be well calibrated, for instance by predicting $\hat y = 0.5$ for a group of cases where $y = 1$ half the time and $y = 0$ the rest. A classifier that does this will have the lowest Brier score achievable, in expectation, from those predictors.
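The cases above can be checked numerically. The sketch below is one simulated comparison, not a proof: it assumes NumPy, simulated data where the true $P(y=1|X)$ is known, and an invented "overconfident" predictor that pushes probabilities toward 0 and 1. All names (`reliability_curve`, `calibration_gap`, `p_overconfident`, and so on) are hypothetical.

```python
import numpy as np

def brier_score(y_true, y_prob):
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

def reliability_curve(y_true, y_prob, n_bins=10):
    """Per-bin mean predicted probability and observed frequency of y=1."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Interior edges of equal-width bins on [0, 1]; digitize gives 0..n_bins-1.
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_idx = np.digitize(y_prob, inner_edges)
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():  # skip empty bins
            mean_pred.append(y_prob[in_bin].mean())
            frac_pos.append(y_true[in_bin].mean())
    return np.array(mean_pred), np.array(frac_pos)

def calibration_gap(y_true, y_prob):
    """Average absolute distance of the reliability curve from the diagonal."""
    mean_pred, frac_pos = reliability_curve(y_true, y_prob)
    return float(np.mean(np.abs(mean_pred - frac_pos)))

# The edge cases from the answer.
perfect = brier_score([1, 0], [1.0, 0.0])   # 0.0
inverted = brier_score([1, 0], [0.0, 1.0])  # 1.0
coin = brier_score([1, 0], [0.5, 0.5])      # 0.25

# Simulated data (an assumption): the true conditional probability is known.
rng = np.random.default_rng(0)
n = 50_000
p_true = rng.uniform(0.0, 1.0, size=n)
y = (rng.random(n) < p_true).astype(float)

p_calibrated = p_true
# Invented miscalibrated predictor: same ranking, probabilities pushed to 0/1.
p_overconfident = p_true**2 / (p_true**2 + (1.0 - p_true) ** 2)

brier_cal = brier_score(y, p_calibrated)
brier_over = brier_score(y, p_overconfident)
gap_cal = calibration_gap(y, p_calibrated)
gap_over = calibration_gap(y, p_overconfident)

print("Brier:", brier_cal, brier_over)
print("Calibration gap:", gap_cal, gap_over)
```

In this simulation the calibrated predictor has both the lower Brier score and the reliability curve closer to the diagonal. For plotting, scikit-learn's `sklearn.calibration.calibration_curve` offers a similar binned curve.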

Why doesn’t it make sense to calculate Brier score for a model that always guesses the prior probability? — Dave, Sep 07 '20 at 13:04
I mean, you could do it, and it makes sense mathematically, but it doesn't tell you anything you want to know. My main point is that @ABK was possibly a bit confused about the difference between conditional and marginal probability. — Eoin, Sep 07 '20 at 13:48
To elaborate, if you decided to plug marginal $P(y)$ into the Brier score calculations instead of $P(y|X)$, you would get a score close to 0 if the data is very imbalanced (mostly zeros or mostly ones), and a score of $(\pm 0.5)^2 = 0.25$ if it's 50% zeros and 50% ones. — Eoin, Sep 07 '20 at 15:02
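The numbers in the last comment can be reproduced with a short sketch, assuming NumPy and two invented toy datasets. Predicting the base rate $p$ for every case gives a Brier score of $p(1-p)$, which is $0.25$ at $p = 0.5$ and shrinks toward $0$ as the classes become more imbalanced. The names `marginal_brier`, `balanced` and `imbalanced` are hypothetical.

```python
import numpy as np

def brier_score(y_true, y_prob):
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

def marginal_brier(y_true):
    """Brier score when every case is assigned the observed base rate P(y=1)."""
    y = np.asarray(y_true, dtype=float)
    base_rate = y.mean()
    return brier_score(y, np.full_like(y, base_rate))

# Toy datasets, made up for illustration only.
balanced = np.array([0, 1] * 50)        # 50% ones
imbalanced = np.array([1] + [0] * 99)   # 1% ones

score_balanced = marginal_brier(balanced)
score_imbalanced = marginal_brier(imbalanced)

print(score_balanced)    # 0.25
print(score_imbalanced)  # approximately 0.0099, i.e. 0.01 * 0.99
```

The low score on the imbalanced data says nothing about how well individual cases are distinguished, which is why the marginal probability is an uninformative baseline rather than a useful prediction.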
