
I am doing a small project to predict the write-off probability of our defaulted customers.

In the original population, the write-off rate is about 0.515. For reasons outside my control, I have to undersample the population and build a new data set in which the write-off rate is brought down to 0.15. Since the undersampling changes the original event probability of the population, I need to calibrate the model predictions so that they represent the true probability that a write-off can happen.

Based on the new data, I built a logistic regression model using glm in R, and then followed the approach discussed in the post "converting predicted probabilities after downsampling to actual probabilities in classification" to calibrate the output. I checked the probabilities generated by the LR model and their mean is 0.15.

From my understanding, the average of the calibrated probabilities should be very close to 0.515. However, their mean is 0.64, which is quite different from the original event probability.

My questions are:

  1. Is my understanding correct, i.e., should the mean of the calibrated probabilities be 0.515?
  2. Based on the calculation in the reference, $$ p = \frac{1}{1+\frac{\left(\frac{1}{\alpha}-1\right)}{\left(\frac{1}{\alpha'}-1\right)} \cdot \left(\frac{1}{p_s}-1\right)},$$ where $\alpha$ denotes the original event rate, $\alpha'$ the (re/over/under)sampled rate, $p_s$ the model's output probability, and $p$ the calibrated probability:

Is it provable that mean of $p$ should be equal to or approximately equal to $\alpha$?
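To make the question concrete, here is a small numerical check (a Python sketch rather than R; the `calibrate` helper and the Beta distribution used for the toy predictions are my own choices). The correction maps an individual prediction of 0.15 exactly to 0.515, but because the map is nonlinear in $p_s$, the *mean* of the calibrated probabilities need not equal $\alpha$:

```python
import numpy as np

def calibrate(p_s, alpha, alpha_prime):
    """Map a sampled-data probability back to the original base rate."""
    factor = (1 / alpha - 1) / (1 / alpha_prime - 1)
    return 1.0 / (1.0 + factor * (1.0 / p_s - 1.0))

rng = np.random.default_rng(0)
# toy model outputs with mean 0.15 (Beta(1.5, 8.5) has mean 1.5/10)
p_s = rng.beta(1.5, 8.5, size=100_000)
p = calibrate(p_s, alpha=0.515, alpha_prime=0.15)

# the pointwise map sends 0.15 exactly to 0.515 ...
print(calibrate(0.15, alpha=0.515, alpha_prime=0.15))
# ... but the mean of the calibrated probabilities is not 0.515
print(p_s.mean(), p.mean())
```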

Thanks for your help.

simohayha
  • Why do any kind of balancing at all? // I do not totally follow what you’re asking, but if you’re wondering whether, when the true conditional probability of an event is $0.1$, you want to predict a probability close to $0.1$: yes. – Dave Sep 13 '22 at 14:20
  • Yes, that is what is expected after the calibration. Do you see the uncalibrated predictions averaging around 0.5? If so, then you've probably applied the calibration incorrectly; if not, then it's the model's fault. – Ben Reiniger Sep 13 '22 at 14:37
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Sep 13 '22 at 16:53
  • Thanks above. I have re-phrased my question to make it more clear. – simohayha Sep 13 '22 at 23:44

1 Answer


I have seen something similar to this before, but I hadn't fully resolved it. I still haven't exactly, but I'll share my current thoughts.

The problem is the nonlinearity of the logit link. A logistic regression will have well-calibrated probabilities because it minimizes log-loss; hence you see the mean prediction matching 0.15. But that does not imply that the log-odds are similarly calibrated: the sigmoid (inverse logit) cannot be interchanged with the expectation,

$$\mathbb{E}(\sigma(\text{model\_logodds})) \neq \sigma(\mathbb{E}(\text{model\_logodds})).$$

If the model log-odds are all negative or all positive, then the sigmoid is convex (respectively concave) on that region, and Jensen's inequality gives a direction for that inequality. Since your model is trained on an imbalanced set, the predicted log-odds are probably mostly negative, so I think the larger average probability is not unexpected.
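The Jensen effect is easy to see numerically. A Python sketch (the normal distribution for the log-odds is an assumption, chosen so that nearly all of its mass is negative, where the sigmoid is convex):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
# log-odds mostly negative, as for a model trained where the event is rare
logodds = rng.normal(loc=-2.0, scale=1.0, size=100_000)

# sigmoid is convex on the negatives, so Jensen's inequality gives
# E[sigmoid(x)] > sigmoid(E[x]) here
print(sigmoid(logodds).mean(), sigmoid(logodds.mean()))
```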

(If the shift were applied to the probabilities rather than the log-odds (either additively or multiplicatively), then you'd get $\mathbb{E}(\text{model\_probs} \dotplus \text{shift}) \approx 0.15 \dotplus \text{shift}$, and so you could choose the probability shift to ensure calibration in the large. That won't be calibrated "in the small" though, and worse, you may well get shifted predicted probabilities that aren't in $[0,1]$.)
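For illustration, a tiny sketch (with made-up prediction values) of how an additive probability shift falls outside $[0,1]$:

```python
import numpy as np

p_s = np.array([0.02, 0.15, 0.60, 0.95])  # hypothetical sampled-model outputs
shift = 0.515 - 0.15                      # shift that would move an average of 0.15 up to 0.515
p_shifted = p_s + shift
print(p_shifted)  # the last entry exceeds 1, so it is no longer a probability
```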

All this leaves me with an apparent contradiction though: if you build a logistic regression directly on the balanced dataset,

  1. it should be well-calibrated
  2. its coefficients aside from the intercept should be unbiased estimators for the true ones
  3. its intercept should be shifted by the quantity in question

(2 and 3 from e.g. https://stats.stackexchange.com/q/67903/232706, 1 from https://stats.stackexchange.com/q/208867/232706) but 2 and 3 together imply it's your shifted model, which contradicts your demonstration and 1. I'm going to leave this as an answer for now, but will try to find out more and either revise here or ask as a new question (or bounty your question).
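Points 2 and 3 can be checked in a quick simulation. A Python/numpy sketch (the data-generating parameters and the small Newton-Raphson fitter are my own choices; the expected intercept shift is the log of the event sampling fraction, since all non-events are kept):

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Unpenalized logistic regression fitted by Newton-Raphson."""
    Xb = np.column_stack([np.ones(len(X)), X])   # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (y - p)
        H = Xb.T @ (Xb * (p * (1 - p))[:, None])
        w = w + np.linalg.solve(H, grad)
    return w                                     # [intercept, slope]

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=n)
b0, b1 = 0.06, 1.0                               # true parameters; event rate near 0.515
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(b0 + b1 * x))))

# undersample the events so they make up about 15% of the new set
idx1, idx0 = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
n_keep = int(0.15 / 0.85 * len(idx0))
keep = np.concatenate([rng.choice(idx1, n_keep, replace=False), idx0])

w = fit_logit(x[keep], y[keep])
shift = np.log(n_keep / len(idx1))               # log of the event sampling fraction
print(w[1], w[0] - shift)  # slope near b1, de-shifted intercept near b0
```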

Ben Reiniger