I've been trying to read up on multicollinearity, and I think I have a decent grasp of it and of what VIF tells me. But one piece of advice seems close to universal, and it makes me worry that I've misunderstood something. I suspect my thinking is too first-order.

So, as I understand it, the VIF represents the inflation in the variance of the parameter caused by multicollinearity. My guess (though I haven't seen this written anywhere) was that that would mean, for a population with given slope and spread around the line, we would need VIF-times as many participants (on average) to counteract the multicollinearity and restore the t-statistic to where it would otherwise have been, and so to distinguish it from a zero-slope null...

But if that is the case, then the standard guideline of "worry when VIF goes above 10" seems crazy to me (do we only worry when we'd need an order-of-magnitude-larger sample size?). The most likely cause of this mismatch is that I have mis-thought this through. But what am I missing?

Thank you!

justme

1 Answer


Generally, your understanding of variance inflation factors seems to be quite correct (except that a parameter is not the same as a parameter estimate).
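As a rough numerical check of that reading, here is a minimal Python sketch (not part of the original answer). It assumes ordinary least squares with an intercept and two simulated predictors; the function names `vif` and `slope_variance_factor`, the correlation `rho = 0.8`, and the sample size are all illustrative choices. It relies on the standard identity var(beta_hat_j) = sigma^2 * VIF_j / sum_i (x_ij - mean_j)^2. Because the sum of squares in the denominator grows roughly in proportion to n, multiplying the sample size by the VIF approximately cancels the inflation.

```python
import numpy as np

def vif(X):
    """VIF of each column of X (n rows, p >= 2 predictor columns, no intercept column).

    Uses the fact that the VIFs are the diagonal of the inverse correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def slope_variance_factor(X, j):
    """Diagonal entry of (Z'Z)^-1 for predictor j, where Z = [1, X].

    Under the usual OLS assumptions, var(beta_hat_j) = sigma^2 * this value.
    """
    Z = np.column_stack([np.ones(len(X)), X])
    return np.linalg.inv(Z.T @ Z)[j + 1, j + 1]

# Simulated, illustrative data: two predictors with population correlation rho.
rng = np.random.default_rng(0)
rho, n = 0.8, 2000
X = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

vif_x1 = vif(X)[0]

# Variance factor for the first slope with the collinear partner in the model...
with_partner = slope_variance_factor(X, 0)
# ...and the factor the same column would have as the only predictor.
alone = 1.0 / np.sum((X[:, 0] - X[:, 0].mean()) ** 2)

# The ratio of the two is the VIF.
inflation = with_partner / alone
print(vif_x1, inflation)
```

With these settings the sample VIF should land near the population value 1 / (1 - rho^2), and the two printed numbers should agree up to floating-point error.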

There is no reason why a limit of 10 is better than any other limit.

E.g. if in a large, randomized clinical trial with two treatments, the treatment effect has a VIF of 3, I would really be worried about the randomization procedure.

Whether and where you set a cut-off depends on the purpose of your analysis. If, e.g., the purpose is prediction, then VIFs are irrelevant as long as the matrix inversions remain stable.
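One hedged way to probe whether "the matrix inversions remain stable" is to look at the condition number of the standardized design matrix. The sketch below is an illustration only: the helper name `design_condition_number`, the simulated data, and the choice of the condition number as the stability measure are assumptions, not something the answer prescribes.

```python
import numpy as np

def design_condition_number(X):
    """2-norm condition number of the column-standardized predictor matrix."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.linalg.cond(Xs)

# Simulated, illustrative predictors.
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
noise = rng.normal(size=500)

# Clearly correlated columns, but nowhere near singular.
X_correlated = np.column_stack([x1, 0.9 * x1 + noise])
# Second column is almost an exact copy of the first.
X_near_singular = np.column_stack([x1, x1 + 1e-8 * noise])

cond_correlated = design_condition_number(X_correlated)
cond_near_singular = design_condition_number(X_near_singular)
print(cond_correlated, cond_near_singular)
```

The first design has a noticeable VIF yet a small condition number, so solving the normal equations is numerically unproblematic. The second has a condition number many orders of magnitude larger, which is the regime where the inversion itself becomes unreliable.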

Michael M
    That helps a lot, thank you! -- and yes I should have said the variance of the parameter estimate! – justme Apr 21 '16 at 13:05