Date, age, mrt and shops are all predictors in a dataset of 414 observations. Pearson's product-moment correlation shows a sizeable negative correlation between mrt and shops (r = -0.6, so its magnitude is well above the minimum benchmark of 2/sqrt(n)). Yet the VIF for both is quite low. Does this mean there is multicollinearity or not? And why is the VIF so low if Pearson's r is -0.6?

P.S.: I have found a similar question here, but there Pearson's r is not negative, and that might make a difference. Any help would be much appreciated.

1 Answer

This is largely covered elsewhere, e.g., in my answer to "When can we speak of collinearity". Whether Pearson's $r$ is positive or negative makes no difference.

I have never heard of your "minimum benchmark", and it doesn't make any sense to me. Consider that if you only had $4$ data points, I gather your minimum benchmark would say that a pairwise correlation between variables equal to $r = 1.0$ would be fine (i.e., $2/\sqrt{4} = 1$), whereas if you had $1600$ data points, any $|r| > .05$ would be problematic (i.e., $2/\sqrt{1600} = .05$). I may be misunderstanding it, but that's nonsensical. Consider that, unless you have perfect multicollinearity, the primary impact is a reduction of power, and that can still be overcome with sufficient $N$ (cf. my answer to "What is the effect of having correlated predictors in a multiple regression model?").
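To make the arithmetic concrete, here is a minimal Python sketch of how the $2/\sqrt{n}$ cutoff moves with sample size. The function name `significance_benchmark` is made up for illustration; only the formula and the sample sizes ($4$, $414$, $1600$) come from this thread.

```python
import math


def significance_benchmark(n: int) -> float:
    """Rule-of-thumb cutoff 2/sqrt(n) for |r| (hypothetical helper name)."""
    return 2.0 / math.sqrt(n)


# Sample sizes mentioned in the question and in this answer.
for n in (4, 414, 1600):
    print(f"n = {n:4d}  ->  cutoff for |r| = {significance_benchmark(n):.3f}")
```

The cutoff shrinks as $n$ grows, so it cannot serve as a fixed yardstick for how strongly two predictors overlap in a given sample.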

By (arbitrary) rule of thumb, you have a 'problem with multicollinearity' when you have a ${\rm VIF} \ge 10$. With respect to pairwise correlations alone, that would imply $|r| \gtrapprox .95$.
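As a rough check on those numbers, the sketch below assumes the simplest case of exactly two predictors, where ${\rm VIF} = 1/(1 - r^2)$. The function names are hypothetical. With more predictors in the model, the VIF uses the $R^2$ from regressing one predictor on all the others, which can only be at least as large as any single squared pairwise correlation, so the pairwise figure is a floor rather than the exact value your software reports.

```python
import math


def vif_from_pairwise_r(r: float) -> float:
    """VIF implied by one pairwise correlation (two-predictor case only)."""
    return 1.0 / (1.0 - r ** 2)


def pairwise_r_for_vif(vif: float) -> float:
    """|r| needed to reach a given VIF in the two-predictor case."""
    return math.sqrt(1.0 - 1.0 / vif)


# The sign of r drops out because r is squared.
print(round(vif_from_pairwise_r(-0.6), 4))
print(round(vif_from_pairwise_r(0.6), 4))

# |r| that corresponds to the rule-of-thumb VIF of 10.
print(round(pairwise_r_for_vif(10.0), 3))
```

Under that assumption, $r = -0.6$ gives a VIF of about $1.56$, far below $10$, which is consistent with a low VIF alongside a correlation of that size.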

Dave
  • 62,186
  • Thank you very much for giving me links to other answers and still taking the trouble to answer my question - much appreciated! As regards the "minimum benchmark" it is from Newbold-Carson-Thorne: Statistics for Business and Economics (ISBN 9780273767060), eighth edition, page 84, bottom of the page. "A useful rule to remember is that a relationship exists if |r| >= 2/sqrt(n)" – Reader 123 Sep 02 '21 at 17:32
  • @Reader123, Oh, I see. They're giving a rule of thumb for eyeballing the statistical significance of a correlation. That doesn't matter. With respect to potential multicollinearity, it doesn't matter whether the correlation is 'real' in the population or not. Multicollinearity is about the correlations in your sample being 'too high'. – gung - Reinstate Monica Sep 02 '21 at 20:32
  • Thank you, @gung, I completely missed the aspect of one being about the population and the other about the sample only... :-| Thanks for your help. – Reader 123 Sep 03 '21 at 09:34
  • You're welcome, @Reader123. – gung - Reinstate Monica Sep 03 '21 at 13:21