Are there any caveats when logistic regression is used on a sample with average probability of success close to one (1.4M dataset, mean prob. of success = 0.975)?

1 Answer

As you seem to come from a social-science background, King & Zeng (2001), "Logistic Regression in Rare Events Data", Political Analysis, 9, pp. 137–163, might be a good starting point; the relevant term is "rare event data" (with a mean success probability of 0.975, the failures are the rare outcome in your case). The authors claim that "popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events", and the paper has been quite influential.
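
To get a feel for whether this matters at a given sample size, one option is a small Monte Carlo check. The sketch below is only an illustration under assumed settings and is not taken from the paper: the helper names (`fit_logit`, `mean_estimates`), the true coefficients (-4, 1) and the sample sizes are all hypothetical, chosen so that the rare outcome occurs in very roughly 3% of cases. It fits an ordinary maximum-likelihood logistic regression by Newton-Raphson on repeated simulated samples and prints the average estimate minus the true value.

```python
import numpy as np

def sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp().
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))

def fit_logit(X, y, max_iter=50, tol=1e-8):
    """Plain Newton-Raphson MLE for logistic regression.

    Returns None if the Hessian is singular or the iterations do not
    converge (e.g. under separation or with no rare outcomes at all).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)                         # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # observed information
        try:
            step = np.linalg.solve(hess, grad)
        except np.linalg.LinAlgError:
            return None
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            return beta
    return None

def mean_estimates(n, beta_true, reps, seed=0):
    """Average the MLE over `reps` simulated samples of size `n`."""
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(reps):
        x = rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x])
        # y = 1 marks the rare outcome (the "failure" in the question).
        y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
        b = fit_logit(X, y)
        if b is not None:  # skip samples where the fit failed
            fits.append(b)
    return np.mean(fits, axis=0), len(fits)

# Assumed (hypothetical) true coefficients: intercept -4, slope 1.
beta_true = np.array([-4.0, 1.0])
for n in (500, 20000):
    est, used = mean_estimates(n, beta_true, reps=300)
    print(n, used, est - beta_true)  # average estimate minus truth
```

As far as I understand, what drives the small-sample bias is mainly the absolute number of rare outcomes rather than their share alone. With 1.4M observations and a failure rate of 0.025 you have about 35,000 failures, so it is worth checking by simulation like this before assuming a correction is needed.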

Julian Schuessler
    That nice paper shows how to estimate the bias in $\hat{\beta}$ in the rare event situation, but does not demonstrate that the bias-corrected estimator is closer to the true $\beta$, i.e., that the variance of the bias correction is small enough that it doesn't matter. – Frank Harrell Mar 01 '14 at 13:55
    @julian: I added the full reference - it's often a good idea to do so because links "rot" – Scortchi - Reinstate Monica Mar 01 '14 at 16:35