
I was researching the sample size criteria for logistic regression, and I found in many different sources that the ideal

sample size = (10 * # of explanatory variables) / (probability of the least frequent outcome)

Can anyone explain the reasoning behind this formula? I want to understand why this formula is used before I blindly use it to determine if my sample size is large enough for a logistic regression model to be accurate.
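For concreteness, rearranging the formula gives (sample size) * (probability of the least frequent outcome) = 10 * (# of explanatory variables), so one way to read it is that the expected count of the rarer outcome should be about ten per explanatory variable. Here is a minimal Python sketch of that arithmetic. The function name `min_sample_size`, the parameter names, and the example values (4 variables, probability 0.25) are made up for illustration and do not come from any of the sources.

```python
import math


def min_sample_size(n_predictors, p_least_frequent, events_per_variable=10):
    """Rule-of-thumb sample size: events_per_variable * k / p.

    n_predictors        -- number of explanatory variables (k)
    p_least_frequent    -- proportion of the rarer outcome, in (0, 0.5]
    events_per_variable -- the constant 10 from the formula in the question
    """
    if n_predictors < 1:
        raise ValueError("n_predictors must be at least 1")
    if not 0 < p_least_frequent <= 0.5:
        raise ValueError("p_least_frequent must be in (0, 0.5]")
    # Round up, since a fractional observation is not possible.
    return math.ceil(events_per_variable * n_predictors / p_least_frequent)


# Hypothetical example: 4 explanatory variables, rarer outcome occurs 25% of the time.
n = min_sample_size(n_predictors=4, p_least_frequent=0.25)
print(n)         # 160
print(n * 0.25)  # 40.0 expected rarer-outcome cases, i.e. 10 per variable
```

The sketch only reproduces the arithmetic of the rule; it says nothing about whether a sample of that size makes the model "accurate" in any particular sense.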

  • 3
    You have to state precisely what you mean by ‘accurate’. – utobi Dec 21 '22 at 05:54
  • 1
    I don't think I have ever seen this rule of thumb characterized as "ideal"; usually it is a minimum. One interpretation is that if you have a smaller sample than this, be even more wary of overfitting and lack of power than you might otherwise be. It is usually employed to help you decide how many explanatory variables to use for model fitting in circumstances where you cannot increase the sample size. – whuber Dec 21 '22 at 14:59
  • 1
    The minimum sample size is 96 if there are no covariates, as argued in my course notes at hbiostat.org/rmsc/lrm.html – Frank Harrell Dec 21 '22 at 15:57
  • @Frank Yes, that's very good to know and alerts us to be careful. But that argument is based on a worst case and an arbitrary precision target. There are realistic logistic regression applications that need far fewer data to achieve acceptable power, depending on the purpose. Well-designed dose-response studies come to mind. – whuber Dec 22 '22 at 17:17
  • You can only have acceptable power in lower sample sizes if the effect you are trying not to miss is huge. This effect size is almost always overstated. And the 96 is not a worst case. Most analysts would seek a margin of error of less than 0.1 in estimating probabilities and most analyses would involve covariates. The sample size needed for a model with covariates is greater than the sample size needed when only estimating the intercept (which is a best case analysis in this sense). – Frank Harrell Dec 23 '22 at 11:02
  • The most comprehensive approach to sample size estimation for logistic regression modeling IMHO is https://onlinelibrary.wiley.com/doi/10.1002/sim.7992 which includes the "96" type of consideration. – Frank Harrell Dec 23 '22 at 17:11
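
For readers following the comments, the "96" figure can be reproduced with the usual normal-approximation margin of error for a single proportion. The sketch below assumes a 0.95 confidence level (z = 1.96) and a true probability of 0.5; only the 0.1 margin is stated in the thread, so treat the other two values, and the function name `n_for_margin`, as assumptions made for illustration.

```python
def n_for_margin(margin, p=0.5, z=1.96):
    """Sample size at which z * sqrt(p * (1 - p) / n) equals `margin`.

    margin -- target margin of error for the estimated probability
    p      -- assumed true probability (0.5 maximizes p * (1 - p))
    z      -- normal quantile; 1.96 corresponds to 0.95 confidence (assumed)
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    # Solve margin = z * sqrt(p * (1 - p) / n) for n.
    return p * (1 - p) * (z / margin) ** 2


# Intercept-only case (no covariates), margin of error 0.1.
print(round(n_for_margin(0.1), 2))  # 96.04, i.e. roughly 96
```

Under these assumptions, halving the margin to 0.05 quadruples the required sample size, which shows how strongly the figure depends on the chosen precision target.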

0 Answers