
I have a dataset that contains a binary response variable (equal to 1 if the person responded to the survey and 0 otherwise) as well as a host of auxiliary variables X. What I want to do is use these data to estimate response probabilities. To do this I perform cross-validation on my data and test various algorithms. My issue is that my data are labeled in the sense that I know whether each record responded or not, yet "unsupervised" in the sense that I never observe what I truly care about, namely the response probability. The question then becomes: can I use the classification accuracy derived from cross-validation to infer that my class probabilities are good?

For example, suppose I compare two algorithms A and B, where A has classification accuracy 90% and B has classification accuracy 80%. Can I infer that the class probabilities A gives are better than those B gives?

If not, what other measure can I use to infer quality?

astel
    Not sure what you mean about the unsupervised bit... you have the true output of response or no response, which is what you're trying to predict, so it's completely a supervised problem. Perhaps you are looking for calibration of your probabilistic output. For a well-calibrated predictor, you'd see (for example) that among people for whom you predict a 70-80% probability of response, ~75% of them actually respond. A poorly calibrated predictor might predict 100% or 0% probability for everyone, and have good overall accuracy, but the probabilities themselves don't reflect reality. – Nuclear Hoagie Apr 15 '19 at 15:03
  • Right, I meant that it is "unsupervised" in that what I really care about is class probabilities (probability of response), but I only know the true values of whether or not they responded (I don't know the true probability of response). – astel Apr 15 '19 at 15:25
  • Well, obviously. You can't 'observe' the probability of an outcome, you can only observe outcomes and infer the probability of the different outcomes using statistical modeling. – Scholar Apr 15 '19 at 17:04
  • Again, I know this, but my question is: what is the best way to infer whether or not my estimate of the probability of the outcome is of good quality? Is algorithm A having better classification performance than algorithm B enough to say that the estimates of the probabilities with A are also better than with B? – astel Apr 15 '19 at 19:18

1 Answer


Classification accuracy tells you how accurate your model is when you use some threshold to convert the probability predictions into class labels. In fact, the predicted probabilities can be quite poor despite good classification accuracy at a particular threshold.
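To make that concrete, here is a minimal sketch (a toy simulation of my own, not your data) of two probability forecasts that produce identical classifications at a 0.5 threshold, and hence identical accuracy, even though one set of probabilities is far better than the other; the Brier score, the mean squared error of the predicted probabilities, picks up the difference.

set.seed(1)
n <- 10000
p_true <- runif(n, 0.55, 0.95)      # true response probabilities
y <- rbinom(n, 1, p_true)           # observed 0/1 responses

p_calibrated <- p_true              # honest, well-calibrated probabilities
p_overconfident <- rep(0.999, n)    # claims near-certain response for everyone

# Both forecasts classify everyone as a responder at a 0.5 threshold,
# so their classification accuracies are identical
mean((p_calibrated > 0.5) == y)
mean((p_overconfident > 0.5) == y)

# Brier scores (mean squared error of the probabilities) differ substantially
mean((p_calibrated - y)^2)          # smaller (better)
mean((p_overconfident - y)^2)       # larger (worse)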

Consequently, classification accuracy does not seem to be particularly interesting or useful to your situation.

What you can do is assess how closely your model's predicted probabilities reflect the true response probabilities. This is called "calibration". The Python package sklearn has calibration tools, as does the R package rms, which I demonstrate below.

library(rms)

set.seed(2023)
N <- 1000
x1 <- runif(N)            # auxiliary variables
x2 <- runif(N)
z <- 3*x1 - 3*x2          # linear predictor
p <- 1/(1 + exp(-z))      # true response probabilities
y <- rbinom(N, 1, p)      # observed 0/1 responses

# Fit a logistic regression; x = TRUE, y = TRUE store the design matrix and
# response so that calibrate() can resample the model
L <- rms::lrm(y ~ x1 + x2, x = TRUE, y = TRUE)

# Bootstrap-based calibration curve (B = 1000 resamples)
cal <- rms::calibrate(L, B = 1000)
plot(cal)

Calibration plot

Ideally, the calibration curve will equal the line $y=x$, since we want the probability of event occurrence to equal the predicted probability. Since the calibration curve is close to the $y=x$ line, the calibration seems to be pretty good. When you do something like this for your model, you may find that it lacks calibration. Examples of models that lack calibration are given in the sklearn documentation.
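If you want a quick check without the rms machinery, a rough manual version of the same idea (continuing the simulated example above, so L and y are the objects created there) is to bin the predicted probabilities and compare each bin's average prediction to the observed response rate:

p_hat <- predict(L, type = "fitted")    # predicted response probabilities
bins <- cut(p_hat, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# For a well-calibrated model, mean_predicted and observed_rate should be
# close within each bin
data.frame(
  mean_predicted = tapply(p_hat, bins, mean),
  observed_rate  = tapply(y, bins, mean),
  n              = as.vector(table(bins))
)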

Dave