6

I am running a generalised linear model in R. I have a single response variable and a maximum of 4 possible explanatory variables. I am adding each explanatory variable to the model sequentially, based on whether the coefficient is statistically significant.

If the coefficient for an explanatory variable is statistically significant at 0.05, the explanatory variable remains in the model. If the coefficient for an explanatory variable is NOT statistically significant at 0.05, the explanatory variable is removed from the model.

I am wondering if instead of using 0.05, I should be using a Bonferroni corrected P value? Should I use a threshold of 0.05/4 = 0.0125?

luciano
  • 14,269

1 Answer

7

When you test multiple hypotheses, the chance of making at least one type I error (the family-wise error rate) increases. The Bonferroni correction is a conservative way to control this: you divide the significance level by the number of tests. For 4 tests, you’d use 0.05/4 = 0.0125 as your threshold instead of 0.05.
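To make the arithmetic concrete, here is a minimal Python sketch of the correction described above. The function names and the example p-values are hypothetical, chosen only for illustration. The family-wise error rate formula assumes the 4 tests are independent, which coefficient tests in a single model generally are not, so treat that number as a rough guide rather than an exact figure.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests tests at level alpha.

    Assumes the tests are independent (an assumption made for illustration).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Equivalent view: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


alpha, n_tests = 0.05, 4
threshold = bonferroni_threshold(alpha, n_tests)

# Hypothetical p-values for the 4 candidate explanatory variables.
p_values = [0.004, 0.03, 0.012, 0.20]
keep = [p <= threshold for p in p_values]

print("Uncorrected family-wise error rate:", familywise_error_rate(alpha, n_tests))
print("Bonferroni threshold:", threshold)
print("Family-wise error rate at that threshold:", familywise_error_rate(threshold, n_tests))
print("Significant after correction:", keep)
print("Adjusted p-values:", bonferroni_adjust(p_values))
```

With 4 independent tests at 0.05, the chance of at least one false positive is about 0.185; at the corrected threshold of 0.0125 it drops to about 0.049. Comparing raw p-values against 0.0125 and comparing adjusted p-values against 0.05 give the same decisions.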

Regarding (forward) stepwise selection, this is a very bad idea: it can lead to models that overfit the data, biased estimates, and unstable variable selection. It also doesn't account for the possibility that the best model might include variables that are not individually significant. See these threads on our site for further details:

Algorithms for automatic model selection

Understanding why stepwise selection based on p-values is bad

Main Drawbacks of stepwise regression

(Why) Are stepwise regression coefficients biased?

this post from Andrew Gelman's site:

Why we hate stepwise regression

and this article by Peter Flom:

Stopping stepwise: Why stepwise selection is bad and what you should use instead

Robert Long
  • 60,630