Question at the intersection of ML and statistics.
I built a binary classification model that, for each input observation x, outputs a probability p(x) in (0,1) that x belongs to the positive class. I am satisfied with the evaluation metrics. I have not chosen a threshold yet, although I do not know whether that is relevant to the following questions.
I have an observation x. Without knowing anything else about it, my best guess for the probability that x belongs to the positive class is the frequency of positive labels in the known population (say, positives/total in the training or testing data; that frequency is approximately the same in both, so it does not matter which set we use). This is my Bayesian prior.
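As a minimal sketch of the prior described above (the labels here are hypothetical, just for illustration):

```python
import numpy as np

# Hypothetical labels from a training/testing set; 1 = positive class.
y = np.array([0, 1, 0, 0, 1, 0, 1, 0, 0, 1])

# The base rate of positives is the prior P(x = 1) before seeing a score.
prior = y.mean()
print(prior)  # 0.4
```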
Suppose my new observation x is scored by my model as p(x) = p. How do I adjust my Bayesian prior with this new information? In other words, what is P(x = 1 | p(x) = p)?
My thoughts are as follows. For a model threshold t, I can calculate the precision TP / (TP + FP) at threshold t on held-out data. Then P(x=1 | p(x) >= t) = P(x=1 | model at threshold t classifies x as 1) = TP / (TP + FP), which is exactly that precision.
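The precision-at-threshold estimate above can be sketched as follows (the scores and labels are made up for illustration):

```python
import numpy as np

# Hypothetical model scores p(x) and true labels on a held-out set.
scores = np.array([0.9, 0.8, 0.65, 0.6, 0.4, 0.3, 0.2, 0.1])
y_true = np.array([1,   1,   0,    1,   0,   1,   0,   0])

t = 0.5                    # candidate threshold
pred = scores >= t         # model at threshold t classifies x as 1

tp = np.sum(pred & (y_true == 1))
fp = np.sum(pred & (y_true == 0))

# Empirical estimate of P(x = 1 | p(x) >= t)
precision = tp / (tp + fp)
print(precision)  # 0.75
```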
But how do I turn the >= in that conditional probability into an = sign, i.e., condition on the exact score p(x) = p rather than on exceeding a threshold?