
I have a dataset with many Y=0 and few Y=1 observations. I have to run a logistic regression, so I'm using a retrospective sample in order to get a more balanced sample. Could someone give me some references that explain the problems that arise when logistic regression is used on an unbalanced sample? I know that the main problems are instability of the estimated coefficients and poor predictive power of the model, but I need some references.

1 Answer


Take a look at "Logistic Regression in Rare Events Data" by Gary King and Langche Zeng, Political Analysis 9 (2001): 137-63.

There really isn't a problem with using logistic regression in the case you described. The issue is that your estimates will have small-sample bias, because the number of Y=1 cases is small even if the total sample is large. You can use exact logistic regression if your sample isn't too big, or you can use the method described in the paper above, which is based on a penalized-likelihood approach.
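To give a feel for what a penalized-likelihood fit looks like, here is a minimal sketch of a Firth-type bias-reduced logistic regression in Python with NumPy. This is my own illustration of the general idea and not necessarily the exact estimator from the King and Zeng paper; the function name `firth_logit` and its parameters are hypothetical, and the design matrix `X` is assumed to already contain an intercept column.

```python
import numpy as np

def firth_logit(X, y, max_iter=100, tol=1e-8):
    """Firth-type penalized logistic regression (illustrative sketch).

    X : (n, p) design matrix, assumed to include an intercept column.
    y : (n,) array of 0/1 outcomes.
    Returns the penalized coefficient estimates as a (p,) array.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = p * (1.0 - p)                        # logistic variance weights
        info = X.T @ (X * w[:, None])            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the weighted hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score: the usual score plus a bias-reducing term
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score                  # Newton-type update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

As a sanity check, in an intercept-only model with k events out of n observations, the modified score is solved by a fitted probability of (k + 0.5) / (n + 1) rather than k / n, which shows how the penalty pulls the estimate away from the boundary when events are rare. The sketch uses plain Newton steps with no step-halving, so a production fit should rely on a tested implementation.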

StatsStudent