My model has a decently high AUC (90%), but it is biased: it underestimates the probability that $y=1$. The underestimation is also systematic across some of the input features. How can I nudge the bias term, or otherwise address this issue? I am surprised that the model ends up biased despite having an intercept (bias term). My dataset is very unbalanced, with only 3% positives (1s) vs. 97% negatives (0s), but in y_hat the share of 1s is closer to 2.5%.
y_hat, and divide by the sum of y_real, the ratio does turn out to be roughly 2.5 / 3. Do you suggest addressing this by lowering the threshold at application time?
– user623949 May 04 '22 at 01:08
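The ratio check mentioned in the comment, and one possible way to nudge the intercept after training, can be sketched as follows. This is only a minimal illustration under stated assumptions: it assumes the model's outputs are probabilities from a logistic link, so that adding a constant in log-odds space acts like a shift of the bias term. The helper names (`calibration_ratio`, `fit_intercept_shift`, `apply_shift`) and the synthetic data are hypothetical and not taken from the question.

```python
import math
import random

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def calibration_ratio(y_hat, y_real):
    """Sum of predicted probabilities divided by the number of observed positives."""
    return sum(y_hat) / sum(y_real)

def fit_intercept_shift(y_hat, y_real, tol=1e-10, max_iter=200):
    """Bisection for a constant delta (in log-odds) such that the shifted
    probabilities sum to the observed number of positives."""
    target = sum(y_real)
    scores = [logit(p) for p in y_hat]
    lo, hi = -10.0, 10.0  # assumed search range for the shift
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        total = sum(sigmoid(s + mid) for s in scores)
        if total < target:
            lo = mid   # predictions still too low: shift further up
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0

def apply_shift(y_hat, delta):
    """Add delta to every prediction in log-odds space."""
    return [sigmoid(logit(p) + delta) for p in y_hat]

# Synthetic, made-up data: a rare positive class and a model whose
# log-odds are too low by a constant (an assumption for illustration).
random.seed(0)
true_logits = [-4.0 + 1.5 * random.gauss(0.0, 1.0) for _ in range(20000)]
y_real = [1 if random.random() < sigmoid(z) else 0 for z in true_logits]
y_hat = [sigmoid(z - 0.2) for z in true_logits]

ratio_before = calibration_ratio(y_hat, y_real)
delta = fit_intercept_shift(y_hat, y_real)
y_cal = apply_shift(y_hat, delta)
ratio_after = calibration_ratio(y_cal, y_real)

print("ratio before shift:", ratio_before)
print("fitted log-odds shift:", delta)
print("ratio after shift:", ratio_after)
```

Because the shift is a monotone transformation of the scores, it should leave the ranking of examples, and hence the AUC, essentially unchanged. Lowering the decision threshold is a different remedy: it changes how many cases are labelled 1, but it does not correct the predicted probabilities themselves. If the underestimation varies with the input features, a constant shift may not be enough.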