I'm playing with some multiple linear regression models in R. After I run a regression, I use vif() to check for multicollinearity among my predictors. For the model with fixed effects for countries (factor(countryname)), vif() gives incredibly high values for some of the predictors. Why does this happen?
2 Answers
In my opinion, you shouldn't concern yourself with the variance inflation factors associated with the country fixed effects. The country-specific effects usually aren't of substantive interest; they're nuisance parameters. In practice, we often have little hope of obtaining precise estimates of the country dummies, and the dummy estimates themselves will differ depending on which country is the referent (reference category).
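To see the point about the referent, here is a minimal sketch on simulated data (the data frame panel, the countries "A" to "E" and their effects are all hypothetical). The same model is fitted twice with different reference countries: the slope on x and the fitted values do not change, but every country dummy does, because each one is a contrast against whichever country was dropped.

```r
# Simulated panel: 5 hypothetical countries observed for 10 years each.
set.seed(1)
panel <- data.frame(countryname = rep(c("A", "B", "C", "D", "E"), each = 10))
panel$x <- rnorm(nrow(panel))
country_shift <- c(A = 0, B = 2, C = -1, D = 3, E = 4)  # hypothetical country effects
panel$y <- 1 + 0.5 * panel$x + unname(country_shift[panel$countryname]) +
  rnorm(nrow(panel))

# Same model twice; only the reference country changes (A, then E).
fit_a <- lm(y ~ x + factor(countryname), data = panel)
panel$country_e <- relevel(factor(panel$countryname), ref = "E")
fit_e <- lm(y ~ x + country_e, data = panel)

coef(fit_a)  # dummies are contrasts against country A
coef(fit_e)  # dummies are contrasts against country E
```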
Technically, the vif() function in the car package estimates generalized variance inflation factors (GVIFs). Instead of treating each of the $N - 1$ country effects separately, it computes a single combined measure of collinearity for the whole set of dummies, and it also reports GVIF^(1/(2*Df)), which adjusts for the number of columns in the term. In my experience, it is not uncommon to see wildly inflated GVIFs in settings with 150 countries. I wouldn't even calculate the GVIF for the country dummies, since they're treated as one "group" of predictors rather than as separate country-specific intercepts.
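A minimal base-R sketch of the generalized VIF (Fox and Monette's determinant formula) may make the "combined measure" concrete. The helper gvif_by_term() and the simulated panel with gdp and log_pop are hypothetical illustrations, not the car package's code; if car is installed, car::vif() on the same fit should give matching numbers. For a one-column term the formula reduces to the ordinary VIF, 1 / (1 - R^2).

```r
# Generalized VIF for term j:  GVIF_j = det(R_jj) * det(R_rest) / det(R)
# R is the correlation matrix of the non-intercept model-matrix columns,
# R_jj the block for term j's columns (e.g. all country dummies), R_rest the rest.
gvif_by_term <- function(fit) {
  X <- model.matrix(fit)
  term_of_col <- attr(X, "assign")
  keep <- term_of_col != 0                  # drop the intercept column
  R <- cor(X[, keep, drop = FALSE])
  term_of_col <- term_of_col[keep]
  labels <- attr(terms(fit), "term.labels")
  out <- t(sapply(seq_along(labels), function(j) {
    idx <- which(term_of_col == j)
    gvif <- det(R[idx, idx, drop = FALSE]) *
      det(R[-idx, -idx, drop = FALSE]) / det(R)
    n_cols <- length(idx)
    c(GVIF = gvif, Df = n_cols, "GVIF^(1/(2*Df))" = gvif^(1 / (2 * n_cols)))
  }))
  rownames(out) <- labels
  out
}

# Simulated panel: 20 hypothetical countries x 10 years.
set.seed(42)
countries <- sprintf("C%02d", 1:20)
panel <- data.frame(countryname = rep(countries, each = 10),
                    year = rep(1:10, times = 20))
idx_c <- match(panel$countryname, countries)
# Hypothetical covariates: gdp varies freely, log_pop is mostly between-country.
panel$gdp <- rnorm(nrow(panel))
panel$log_pop <- rnorm(20, mean = 16, sd = 1.5)[idx_c] + 0.01 * panel$year +
  rnorm(nrow(panel), sd = 0.005)
panel$y <- 0.5 * panel$gdp + rnorm(nrow(panel))

fit <- lm(y ~ gdp + log_pop + factor(countryname), data = panel)
g <- gvif_by_term(fit)
print(g)

# If the car package is installed, its vif() should agree.
if (requireNamespace("car", quietly = TRUE)) print(car::vif(fit))
```

In this simulation log_pop should get a very large GVIF while gdp stays near 1, which mirrors the pattern described in the question.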
The inflated GVIFs appear to be associated with your vector of covariates (e.g., GDP per capita, logged population), some of which do not appear to be of principal interest. If your controls aren't themselves collinear with the primary variable(s) of interest, then I wouldn't worry about their GVIFs. In some scenarios, you may find one or more covariates to be perfectly collinear with the country fixed effects. For instance, in shorter panels, or panels measured over small time units, some of your socio-demographic measures may not vary over time at all. I would imagine within-country population growth is a sluggish (i.e., slow-moving) variable, though controlling for it is still a sensible adjustment in my opinion.
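Both situations can be sketched with simulated data; region_code, log_pop and the helper within_share() are hypothetical names. A covariate that never changes within a country lies exactly in the span of the intercept and the country dummies, so lm() reports its coefficient as NA (aliased). A slow-moving covariate is not aliased, but once country means are absorbed only a tiny share of its variance is left, and its VIF against the dummies is exactly the reciprocal of that share.

```r
# Simulated panel: 20 hypothetical countries x 10 years.
set.seed(2)
countries <- sprintf("C%02d", 1:20)
panel <- data.frame(countryname = rep(countries, each = 10),
                    year = rep(1:10, times = 20))
idx_c <- match(panel$countryname, countries)

# Hypothetical covariates:
#   region_code: constant within each country (time-invariant)
#   log_pop:     mostly between-country, with a slow within-country trend
panel$region_code <- idx_c %% 4
panel$log_pop <- rnorm(20, mean = 16, sd = 1.5)[idx_c] + 0.01 * panel$year +
  rnorm(nrow(panel), sd = 0.005)
panel$x <- rnorm(nrow(panel))
panel$y <- 0.5 * panel$x + rnorm(nrow(panel))

# region_code is listed last, so lm() reports it as the aliased (NA) coefficient.
fit <- lm(y ~ x + log_pop + factor(countryname) + region_code, data = panel)
coef(fit)[c("x", "log_pop", "region_code")]

# Share of a variable's variance left after removing country means.
within_share <- function(v, g) var(v - ave(v, g)) / var(v)
within_share(panel$log_pop, panel$countryname)
# VIF of log_pop regressed on the country dummies alone.
1 / within_share(panel$log_pop, panel$countryname)
```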
I would also examine the standard errors on the key variable(s) of interest. Does the estimated uncertainty seem sensible even in the presence of the country fixed effects? As long as the GVIFs on the principal variable(s) of interest remain low, I would be less concerned about predictors with high GVIFs, especially when the predictor list includes a full series of country effects.
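One way to run that check, sketched on simulated data with hypothetical names (x as the variable of interest, log_pop as a slow-moving control): compare the standard error on x with and without the high-GVIF control. The last lines use the Frisch-Waugh-Lovell result, which says the standard error of x depends only on the part of x not explained by the other predictors, so a control that is collinear with the country dummies but not with x barely moves it.

```r
# Simulated panel: 20 hypothetical countries x 10 years.
set.seed(3)
countries <- sprintf("C%02d", 1:20)
panel <- data.frame(countryname = rep(countries, each = 10),
                    year = rep(1:10, times = 20))
idx_c <- match(panel$countryname, countries)
panel$x <- rnorm(nrow(panel))     # hypothetical variable of interest
panel$log_pop <- rnorm(20, mean = 16, sd = 1.5)[idx_c] + 0.01 * panel$year +
  rnorm(nrow(panel), sd = 0.005)  # hypothetical slow-moving control
panel$y <- 0.5 * panel$x + rnorm(nrow(panel))

se_of <- function(fit, term) summary(fit)$coefficients[term, "Std. Error"]
fit_without <- lm(y ~ x + factor(countryname), data = panel)
fit_with <- lm(y ~ x + log_pop + factor(countryname), data = panel)
c(without_log_pop = se_of(fit_without, "x"), with_log_pop = se_of(fit_with, "x"))

# SE of x = sigma / sqrt(sum of squared residuals of x on the other predictors).
x_resid <- resid(lm(x ~ log_pop + factor(countryname), data = panel))
sigma(fit_with) / sqrt(sum(x_resid^2))
```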
I would add that if you include country fixed effects and you also have country-level variables in the data, it is expected that multicollinearity will increase. That said, I think this article, https://statisticalhorizons.com/multicollinearity, sums up well when multicollinearity is an issue and when it isn't.
factor(countryname) 101363392.344484 58 1.172239. factor(countryname) is the fixed effects for countries. This is what I get when I use vif() for a model with fixed effects. – Ken Lee May 24 '21 at 18:39

vif() from the car package. The answer here should help. – Thomas Bilach May 24 '21 at 20:39

vif() shows extremely high values for population when I include it, as you can see in my example. Hence, I'm afraid that I can't include population due to multicollinearity, even though I've seen papers in political science/economics that include both a population variable and country fixed effects. – Ken Lee May 25 '21 at 21:08
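The third number in the vif() output quoted in the first comment can be reproduced from the first two: it is GVIF^(1/(2*Df)) for the 58 country dummies. For a one-column term this quantity is simply the square root of the ordinary VIF, so squaring it is one rough way to put the dummies' value on a VIF-like scale. The variable names below are just for illustration.

```r
# Numbers copied from the vif() output quoted in the first comment.
gvif <- 101363392.344484
n_dummies <- 58
adjusted <- gvif^(1 / (2 * n_dummies))
round(adjusted, 6)  # matches the 1.172239 reported by vif()
adjusted^2          # about 1.37 on a rough VIF-like scale
```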