I have run a stepwise regression in R. However, the summary of the final model includes some factors that are not significant. Why have these factors not been removed? Should I remove them from my model? The VIFs of these factors are all under 5.
1 Answer
Blair,
The final model includes terms with p-values above the customary threshold because the function you used, step, does not select on p-values at all. It uses a different criterion, the Akaike information criterion (AIC). AIC is a summary evaluation of the entire model at each stage, and one model may have a smaller (i.e., better) AIC even though it contains terms with higher p-values.
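To put a rough number on this, here is a small sketch in Python (my own illustration, not what step does internally). It assumes the standard AIC penalty of 2 per estimated parameter, a term that adds a single parameter, and the usual chi-square approximation to the likelihood-ratio test. Under those assumptions, keeping the term lowers AIC whenever the likelihood-ratio statistic exceeds 2, which corresponds to a p-value of roughly 0.157 rather than 0.05. The function names are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def aic_keeps_term(lr_stat, penalty=2.0):
    """True if keeping a one-parameter term gives a lower AIC.

    Dropping the term raises -2*logLik by lr_stat and lowers the
    penalty by `penalty`, so the term stays when lr_stat > penalty.
    """
    return lr_stat > penalty


# p-value at which AIC is indifferent about a one-parameter term
implied_p_threshold = chi2_sf_1df(2.0)
print(round(implied_p_threshold, 4))  # 0.1573

# A term with a likelihood-ratio statistic of 2.5 is kept by AIC,
# although its approximate p-value is above 0.05.
lr = 2.5
print(aic_keeps_term(lr), chi2_sf_1df(lr) > 0.05)  # True True
```

So a term whose p-value falls between about 0.05 and 0.157 can survive AIC-based selection while looking "not significant" in the summary. For factors with several levels the threshold differs, because the term adds more than one parameter.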
If you want to learn about AIC, there are a few ways to approach it. One is to see it as a penalized likelihood. Another is to view it through the lens of information theory.
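For reference, the penalized-likelihood form is usually written as follows, where $\ell(\hat\theta)$ is the maximized log-likelihood of the model and $k$ is the number of estimated parameters (this notation is mine, not from the question):

$$\mathrm{AIC} = -2\,\ell(\hat\theta) + 2k$$

The first term rewards fit and the second penalizes complexity, so adding a term lowers AIC only if it improves $-2\,\ell(\hat\theta)$ by more than 2 per extra parameter. As far as I know, the value R reports for a linear model may differ from this by an additive constant, which does not affect comparisons between models fitted to the same data.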
The AIC-based sequential method is a competing (or perhaps complementary) algorithm to the p-value-based one you were probably thinking of. Each has merits depending on the context. By the way, sequential selection is still an area of active research.
summary(model)? Stepwise methods should rightly work on the amount of variance (expressed in one of a number of ways) explained by an entire term, i.e. over all levels of a factor. Some levels may not be significant, but one or more levels will be. However, what you can infer from the $t$ statistics and their p-values in that summary output is limited, owing to multiple testing (one test per $t$) and, more importantly, the inherent problems of stepwise procedures, which render the $p$ values largely uninformative. – Gavin Simpson Jul 29 '14 at 04:26