Questions tagged [multicollinearity]

Situation in which there is a strong linear relationship among the predictor variables, so that their correlation matrix becomes (almost) singular. This "ill conditioning" makes it hard to determine the unique role each predictor plays: estimation problems arise and standard errors are inflated. A pair of very highly correlated predictors is one example of multicollinearity.

Multicollinearity refers to a situation in which predictor variables are (linearly) correlated with each other. Although the term is sometimes reserved for perfect correlation (i.e., $r=1$), it is more often used simply to mean strong correlation. Multicollinearity need not show up in bivariate correlations: a variable can be strongly correlated with a linear combination of several other variables even though all of its bivariate correlations are low.
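The last point can be made concrete with a small simulation (a hypothetical numpy sketch, not drawn from any particular package): a variable that is nearly a linear combination of five others has only moderate pairwise correlations with each of them, yet is almost perfectly predicted by all of them together.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal((n, 5))            # five mutually independent predictors
# w is (almost) a linear combination of all five, plus a little noise
w = X.sum(axis=1) / np.sqrt(5) + 0.1 * rng.standard_normal(n)

# pairwise correlations of w with each x_j are only moderate (~0.45)
pairwise = [abs(np.corrcoef(w, X[:, j])[0, 1]) for j in range(5)]

# ...but regressing w on all five predictors jointly gives R^2 near 1
Z = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Z, w, rcond=None)
r2 = 1 - np.var(w - Z @ beta) / np.var(w)
```

Here no single bivariate correlation looks alarming, yet the multiple $R^2$ (and hence the VIF, $1/(1-R^2)$) reveals severe collinearity.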

Conceptually, the existence of multicollinearity means that it is difficult to determine the role each of the correlated variables is playing. Mathematically, it manifests as larger standard errors. Thus, collinearity reduces statistical power.

Multicollinearity can produce counter-intuitive phenomena. For example, when a collinear variable is added to or dropped from a model, other variables can switch between significance and non-significance, and/or the sign of their estimated relationship with the response can flip between positive and negative.

Additionally, when there is multicollinearity, small changes in the data can lead to large changes in the parameter estimates, even reversals of sign.
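This instability can be demonstrated with a seeded simulation (a minimal numpy sketch under assumed toy data, not a reference implementation): refitting the same model under tiny re-draws of the noise leaves the coefficients nearly unchanged when the predictors are unrelated, but swings them wildly when they are nearly collinear.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.standard_normal(n)
x2_collinear = x1 + 0.01 * rng.standard_normal(n)   # nearly identical to x1
x2_orthogonal = rng.standard_normal(n)              # unrelated to x1

def coef_spread(x2, reps=300):
    """Refit y = 1 + x1 + x2 + noise under small re-draws of the noise
    and return the standard deviation of the estimated x1 coefficient."""
    X = np.column_stack([np.ones(n), x1, x2])
    b1 = []
    for _ in range(reps):
        y = 1 + x1 + x2 + 0.1 * rng.standard_normal(n)  # true coefficients are 1
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        b1.append(beta[1])
    return float(np.std(b1))

spread_collinear = coef_spread(x2_collinear)
spread_orthogonal = coef_spread(x2_orthogonal)
```

In the collinear case the spread of the x1 coefficient is large enough that individual fits can even come out with the wrong sign, while the orthogonal case stays tightly around the true value of 1.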

Detecting and addressing multicollinearity is an important topic in multivariable statistical modeling. Two common methods of detecting multicollinearity are variance inflation factors (VIFs) and condition indexes; the latter are preferred by Belsley (see references) in his seminal book on the subject.
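Both diagnostics can be computed from scratch. The sketch below (numpy only, on hypothetical simulated data) computes VIFs by regressing each column on the others, and condition indexes in a simplified Belsley style, from the singular values of the column-scaled design; dedicated tools such as `car::vif()` in R report the same quantities.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)    # strongly collinear with x1
x3 = rng.standard_normal(n)               # independent of the others
X = np.column_stack([x1, x2, x3])

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on the remaining columns (with an intercept)."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r2 = 1 - np.var(y - Z @ beta) / np.var(y)
        out.append(1 / (1 - r2))
    return np.array(out)

def condition_indexes(X):
    """Condition indexes in the spirit of Belsley (1991): scale each
    column to unit length, take the singular values, and divide the
    largest singular value by each of them."""
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)
    return s.max() / s

vifs = vif(X)
ci = condition_indexes(X)
```

On this toy design the VIFs for x1 and x2 are large while x3's stays near 1, and the largest condition index is well above the common rule-of-thumb thresholds. (Belsley's full procedure also includes the intercept column and variance-decomposition proportions, which this sketch omits.)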

References

Belsley, D. A. (1991). Conditioning Diagnostics: Collinearity and Weak Data in Regression. Wiley.

1208 questions
25
votes
1 answer

Is there a reason to prefer a specific measure of multicollinearity?

When working with many input variables, we are often concerned about multicollinearity. There are a number of measures of multicollinearity that are used to detect, think about, and / or communicate multicollinearity. Some common recommendations…
13
votes
2 answers

Dealing with multicollinearity

I have learnt that using the vif() function of the car package, we can compute the degree of multicollinearity of inputs in a model. From Wikipedia, if the vif value is greater than 5 then we can consider that the input is suffering from multicollinearity…
samarasa
  • 1,467
12
votes
1 answer

Standardization of variables and collinearity

Collinearity can pose certain problems in various kinds of regression problem. In particular, it can make the parameter estimates have high variance and be unstable. Various methods have been proposed to deal with this including ridge regression,…
Peter Flom
  • 119,535
  • 36
  • 175
  • 383
4
votes
1 answer

Collinearity testing between predictors

I would like to test collinearity between possible "predictors (risk factors)" for a binary outcome (death). Possible "predictors" are categorical (always binary) and continuous... For two continuous ones (age, weight, etc.), I can use a bivariate…
Juraj
  • 113
  • 3
  • 9
4
votes
1 answer

Why are VIFs below ten not still considered very worrying?

I've been trying to read up on multicollinearity, and I think I have a decent grasp of it, and of what VIF tells me. But there is one aspect of the advice that seems quite universal, but makes me worry that I've misunderstood something. I think I…
justme
  • 775
3
votes
1 answer

high variance proportion for intercept

I'm using the ols_eigen_cindex function to assess multicollinearity. With these variance proportions:
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_eigen_cindex(model)
Eigenvalue Condition Index intercept disp hp…
locus
  • 1,593
3
votes
1 answer

How does multicollinearity between some variables affect non-collinear ones?

In a multiple regression model, i.e. $y \sim x_1 + x_2 + x_3$, where $x_1$ and $x_2$ are collinear (e.g. present high correlation around 0.8), it is well known that many problems arise regarding parameter estimation or hypothesis testing (on $x_1$…
3
votes
1 answer

How to Deal with Multicollinearity?

I use cross-sectional macroeconomic variables with OLS. I found that my data suffers from multicollinearity and I am looking for solutions. I read about first differences of the variables and I tried to do it. However, using first differences the…
Ant
  • 337
3
votes
1 answer

perfect collinearity among multiple continuous variables

When there is a perfect collinearity among more than two continuous variables, how do you deal with it and how are the regression results interpreted? I have three independent variables which represent the percentage of different races within…
2
votes
0 answers

About multicollinearity in interaction terms

Dears, In my panel data analysis, I am using an interaction term. The model looks something like Y = b0 + b1*X + b2*Z + b3*XZ + e. Now the interaction term is strongly correlated with X or Z. In order to get around this problem, I read in Azman et al. (2010)…
2
votes
1 answer

Multicollinearity diagnostics: how are the eigenvalues calculated?

As a measure of multicollinearity, some statistical packages, like SPSS and SAS, give you eigenvalues. See the image for an example output of SPSS (simulated data, two predictors). What I would like to know is how these eigenvalues are calculated.…
2
votes
0 answers

multicollinearity question, X and X change score

Consider survey data from surgeries. $Y$ represents observed surgical quality and is measured post-surgery; $X$ represents perceived surgical difficulty level and is measured pre and post surgery. It is desired to assess the relationship between…
1
vote
5 answers

A question about regression with highly correlated variables

I have three independent continuous variables which are highly correlated. The dependent variable is a continuous one. In this case, should I apply three separate regression between each independent variable and the dependent variable since the…
joe
  • 87
1
vote
2 answers

Testing for multicollinearity with squared variables

I'm having some problems understanding when multicollinearity is acceptable and when it becomes a problem. Can someone give me some insight on what should I focus on? Here's an example of what I'm struggling with.
kaba100
  • 11
1
vote
0 answers

How to deal with multicollinearity in a Translog Production Function

What are the different methods for dealing with multicollinearity in a translog production function? I have seen several methods, such as: checking the variance inflation factor (VIF) and ensuring that it is less than 10; therefore, if VIF > 10,…