When fitting linear models and trying to settle on a plausible one, AIC and VIF are often used. However, I've noticed that the order in which the two methods are applied affects the final model.
Should VIF via car::vif() be used before AIC via step(), or the other way around?
Details
I have a dataset with 13 categorical variables and one numeric variable. However, a single observation of the numeric variable is missing, so I'd like to impute its value using regression imputation (say, via the mice package). To do this, I need a plausible regression model. It stands to reason that many of the categorical predictors in my data exhibit collinearity, so variance inflation factors (VIFs) could be calculated to see which variables have VIF > 5, suggesting they should be excluded. Further, one could augment this with a stepwise regression using AIC. I tried both orders, car::vif() followed by step() and step() followed by car::vif(), and (unsurprisingly) got different answers. Is there a generally recommended strategy?
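To make the comparison concrete, here is a minimal sketch of the two orders on simulated data. Everything in it is an assumption for illustration: the data frame dat, the factors f1, f2, f3, and the choice to drop only the single worst term. It is not my real dataset. Because the predictors are factors, car::vif() reports generalized VIFs; as far as I understand, squaring the GVIF^(1/(2*Df)) column puts them on a scale comparable to the usual VIF > 5 cutoff.

```r
# Hypothetical simulated data: three factors, two of them strongly associated.
set.seed(1)
n  <- 200
f1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
# f2 copies f1 about 90% of the time, so the two are nearly collinear
f2 <- factor(ifelse(runif(n) < 0.9, as.character(f1),
                    sample(c("a", "b", "c"), n, replace = TRUE)))
f3 <- factor(sample(c("x", "y"), n, replace = TRUE))
y  <- 1 + 2 * (f1 == "b") - 1 * (f3 == "y") + rnorm(n)
dat <- data.frame(y, f1, f2, f3)

full <- lm(y ~ f1 + f2 + f3, data = dat)

# Order 1: AIC-based stepwise selection first, VIF check afterwards
aic_first <- step(full, trace = 0)
print(formula(aic_first))

if (requireNamespace("car", quietly = TRUE)) {
  # Return VIFs on the ordinary VIF scale for any fitted lm
  vif_scale <- function(fit) {
    g <- car::vif(fit)
    # With multi-level factors car::vif() returns a matrix whose third
    # column is GVIF^(1/(2*Df)); squaring it is comparable to a plain VIF
    if (is.matrix(g)) g[, 3]^2 else g
  }

  # car::vif() needs at least two terms, so guard the post-step check
  if (length(attr(terms(aic_first), "term.labels")) >= 2) {
    print(vif_scale(aic_first))
  }

  # Order 2: VIF screening first, then stepwise selection
  v <- vif_scale(full)
  print(v)
  reduced <- full
  if (max(v) > 5) {
    worst   <- names(v)[which.max(v)]   # drop only the single worst term
    reduced <- update(full, as.formula(paste(". ~ . -", worst)))
  }
  vif_first <- step(reduced, trace = 0)
  print(formula(vif_first))
}
```

Comparing formula(aic_first) with formula(vif_first) shows whether the two orders agree for a given dataset; on my real data they do not.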
step is a stepwise method and probably shouldn't be used at all. There are many posts here about its dangers. Using it with AIC is slightly better, but still not great.
– Peter Flom Aug 17 '23 at 21:38