
When fitting linear models, AIC and variance inflation factors (VIFs) are often used to arrive at a plausible model. However, I notice that the order in which the two methods are applied makes a difference to the final model.

Should VIF via car::vif() be used before AIC via step(), or the other way around?

Details

I have a dataset with 13 categorical variables and one numeric variable. A single observation of the numeric variable is missing, so I'd like to impute its value using regression imputation (say, via the mice package). To do this, I need a plausible regression model. It stands to reason that many of the categorical predictors in my data exhibit collinearity, so variance inflation factors (VIFs) could be calculated to see which variables have VIF > 5, suggesting they should be excluded. One could augment this with a stepwise regression using AIC. I tried car::vif() followed by step(), and then step() followed by car::vif(), but (unsurprisingly) got different answers. I'm wondering if anyone has a general strategy.
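A minimal sketch of the two orderings (all object names here are hypothetical: a data frame dat whose numeric column y carries the missing value, with the remaining columns factors; assuming at least one factor has more than two levels, so car::vif() returns a GVIF matrix):

    library(car)

    ## Fit on the rows where y is observed.
    complete_rows <- dat[!is.na(dat$y), ]
    fit_full <- lm(y ~ ., data = complete_rows)

    ## Order 1: VIF screening first, then stepwise AIC.
    ## With factor predictors, car::vif() reports generalized VIFs (GVIFs);
    ## the usual rule of thumb compares GVIF^(1/(2*Df)) against sqrt(5)
    ## rather than the raw GVIF against 5.
    gvifs <- vif(fit_full)
    keep <- rownames(gvifs)[gvifs[, "GVIF^(1/(2*Df))"] < sqrt(5)]
    fit_order1 <- step(lm(reformulate(keep, response = "y"),
                          data = complete_rows),
                       direction = "both", trace = FALSE)

    ## Order 2: stepwise AIC first, then diagnose the survivors with VIF.
    fit_order2 <- step(fit_full, direction = "both", trace = FALSE)
    vif(fit_order2)

The two orders need not agree: step() may retain a collinear pair that screening would have dropped, and screening may remove a variable that step() would have kept on AIC grounds. For the imputation itself, once a predictor set is settled, mice's built-in "norm.predict" method performs deterministic regression imputation:

    library(mice)

    ## Only y has a missing value, so the single method string applies to it;
    ## columns without missing data are skipped automatically. This assumes
    ## mice's default predictor matrix (every other column predicts y).
    imp <- mice(dat, method = "norm.predict", m = 1, maxit = 1, seed = 1)
    dat_complete <- complete(imp)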

asked by compbiostats

  • car::vif() produces variance inflation factors, which are not used to fit a model but to diagnose collinearity.

    step is a stepwise method and probably shouldn't be used at all. There are many posts here about its dangers. Using it with AIC is slightly better, but still not great.

    – Peter Flom Aug 17 '23 at 21:38
  • Thanks. I know VIF and AIC are for distinct tasks, but, as you state, there are better methods, like regularization via the LASSO, that can account for collinearity among predictors (a sketch follows these comments). – compbiostats Aug 18 '23 at 00:55
  • What do you want to learn from your regression, and how does running either function work toward that goal? – Dave Aug 18 '23 at 02:56
  • @Dave I have added more detail to my post. – compbiostats Aug 18 '23 at 18:27
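Following up the LASSO suggestion in the comments, a rough sketch via the glmnet package (dat and y as assumed in the sketch in the question; not the poster's code):

    library(glmnet)

    ## The LASSO shrinks coefficients of collinear predictors instead of
    ## requiring manual VIF screening; factors are expanded to dummy columns.
    complete_rows <- dat[!is.na(dat$y), ]
    X <- model.matrix(y ~ ., data = complete_rows)[, -1]  # drop the intercept
    cv_fit <- cv.glmnet(X, complete_rows$y, alpha = 1)    # alpha = 1 is LASSO
    coef(cv_fit, s = "lambda.1se")  # sparse coefficients at a conservative lambda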

0 Answers