
We know that, due to multicollinearity, the standard errors of the beta estimates get inflated. But what is the mathematical basis for this?

I am looking for a mathematical relationship or derivation that explains it.

I understand that if the standard errors of the betas go up, then the t-statistics go down, and we might fail to reject the null, i.e. the variables would appear non-significant.

But what is the mathematical relationship between multicollinearity and the inflation in the variance of the coefficients?

Baktaawar
    When some linear combination of the predictors (including the constant) is almost zero, the coefficient for that combination will have an extremely large variance (there's little-to-no information in the data about it). That "projects" onto the coefficients in your model ... every variable that appears in that poorly-determined linear combination will 'inherit' some of that indeterminacy (i.e. get a large variance for its estimated coefficient). – Glen_b May 03 '15 at 03:16

1 Answer


Take a look at "The Analysis of Market Demand" (JSTOR) by Richard Stone, *Journal of the Royal Statistical Society*, Vol. 108, No. 3/4 (1945), pp. 286-391. I can't find an ungated link, so here's the main result.

He gives a formula for the estimated variance of the OLS coefficient on regressor $x_k$ in a regression of $y$ on $K$ variables: $$ \widehat{\operatorname{Var}}(\hat\beta_k) = \frac{1}{N-K}\cdot\frac{\sigma^2_y}{\sigma^2_k}\cdot\frac{1-R^2}{1-R^2_k}, $$ where $\sigma^2_y$ is the estimated variance of $y$, $\sigma^2_k$ is the estimated variance of $x_k$, $R^2$ is from the main regression, $R^2_k$ is from the regression of $x_k$ on the $K-1$ remaining independent variables, and $N$ is the sample size. The count $K$ already includes the constant.
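Here is a minimal numerical check of that formula (my own sketch, not from Stone's paper): on simulated data with two correlated regressors, the expression above should reproduce the squared standard error that `statsmodels` reports for the same coefficient. The data-generating process and all variable names are assumptions for illustration.

```python
# Verify Stone's formula: (1/(N-K)) * (var_y/var_k) * (1-R^2)/(1-R_k^2)
# should equal the squared OLS standard error of beta_k.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
N = 500
x1 = rng.normal(size=N)
x2 = 0.9 * x1 + rng.normal(scale=0.5, size=N)   # deliberately correlated with x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=N)

X = sm.add_constant(np.column_stack([x1, x2]))  # K = 3 columns, incl. the constant
fit = sm.OLS(y, X).fit()
K = X.shape[1]

# R^2 from the main regression; R_k^2 from regressing x1 on the other regressors
R2 = fit.rsquared
R2_k = sm.OLS(x1, sm.add_constant(x2)).fit().rsquared

# Stone's expression, using divide-by-N sample variances of y and x_k
stone = (1.0 / (N - K)) * (y.var() / x1.var()) * (1.0 - R2) / (1.0 - R2_k)

print(stone, fit.bse[1] ** 2)  # the two numbers agree up to floating point
```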

As the independent variables become more collinear, $R^2_k$ approaches one, the denominator $1-R^2_k$ approaches zero, and the variance blows up.
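A small simulation makes the blow-up visible (again my own sketch, with an assumed data-generating process): as the correlation $\rho$ between the two regressors approaches one, $R^2_k \to 1$ and the reported standard error of the first coefficient explodes, even though the model and sample size stay fixed.

```python
# Watch se(beta_1) grow as the correlation between x1 and x2 approaches 1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
N = 500
for rho in [0.0, 0.5, 0.9, 0.99, 0.999]:
    x1 = rng.normal(size=N)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=N)  # corr(x1, x2) ~ rho
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=N)
    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    R2_k = sm.OLS(x1, sm.add_constant(x2)).fit().rsquared
    print(f"rho={rho:<6} R_k^2={R2_k:.4f}  se(beta_1)={fit.bse[1]:.3f}")
```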

dimitriy
    The specific value $1/(1-R^2_k)$ is called the variance inflation factor. It's 1 only when $x_k$ is orthogonal to all other predictors in the model. – AdamO Nov 21 '18 at 16:01
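To illustrate the comment above (a sketch on made-up data, not AdamO's code): `statsmodels` exposes this quantity as `variance_inflation_factor`, which is close to 1 for a predictor orthogonal to the rest and grows as collinearity increases.

```python
# VIF = 1/(1 - R_k^2): ~1 for an orthogonal predictor, large for a collinear one.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
N = 1000
x1 = rng.normal(size=N)
x_orth = rng.normal(size=N)                     # independent of x1
x_coll = 0.95 * x1 + 0.1 * rng.normal(size=N)   # nearly a copy of x1

X = sm.add_constant(np.column_stack([x1, x_orth, x_coll]))
for i, name in enumerate(["x1", "x_orth", "x_coll"], start=1):
    print(name, variance_inflation_factor(X, i))
# x_orth's VIF is near 1; x1 and x_coll get large, similar VIFs
```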