
Problem

This is more of a theoretical question than a practical one...

Let's say I have three Random Variables: $Y$, $X_1$, and $X_2$. With the following properties:

  • Cor($Y$,$X_1$) > 0
  • Cor($Y$,$X_2$) = 0
  • Cor($X_1$, $X_2$) > 0

You run two regressions:

$$ Y = \alpha * X_1 + \epsilon $$

and:

$$ Y = \beta_1 * X_1 + \beta_2 * X_2 + \epsilon $$

Which is larger: $\alpha$ or $\beta_1$?
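As a quick numerical sketch of the setup (my addition, not part of the problem statement): take $Z$ independent of $Y$ and use the construction $X_1 = Y + Z$, $X_2 = Z$, which satisfies the three correlation conditions in population, then fit both regressions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed construction (not from the problem statement) satisfying
# Cor(Y, X1) > 0, Cor(Y, X2) = 0, Cor(X1, X2) > 0 in population:
Y = rng.standard_normal(n)
Z = rng.standard_normal(n)
X1 = Y + Z   # correlated with Y
X2 = Z       # uncorrelated with Y, correlated with X1

# Simple regression Y ~ X1 (no intercept; all variables are mean zero)
alpha = np.linalg.lstsq(X1.reshape(-1, 1), Y, rcond=None)[0][0]

# Multiple regression Y ~ X1 + X2
beta = np.linalg.lstsq(np.column_stack([X1, X2]), Y, rcond=None)[0]

print(alpha)  # ~0.5
print(beta)   # ~[1.0, -1.0]  -> here beta_1 > alpha
```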


Approach 1: Dumb One

My gut instinct says the following. The formula for the coefficient of a simple OLS regression, $\alpha$, is:

$$\tag{1} \alpha = \frac{Cov(X_1,Y)}{Var(X_1)}$$

When we include $X_2$, I figured the multiple-regression beta becomes:

$$\tag{2} \hat{\beta}_{X_1+X_2} = \frac{Cov(X_1+X_2,Y)}{Var(X_1+X_2)}$$

Then the numerator doesn't change: $Cov(X_1+X_2,Y) = Cov(X_1,Y) + Cov(X_2,Y) = Cov(X_1, Y)$

The denominator changes because the variance now becomes: $Var(X_1) + Var(X_2) + 2 Cov(X_1,X_2)$

Because $Cov(X_1, X_2) > 0$, this gives the inequality:

$$Var(X_1) + Var(X_2) + 2 Cov(X_1,X_2) > Var(X_1)$$

Intuitively, the combined regressor is noisier, so its variance (the denominator) is larger, and by this reasoning $\hat{\beta}_{X_1+X_2}$ comes out smaller than $\alpha$.
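As a quick check of whether Equation (2) really equals the multiple-regression coefficient on $X_1$ (a numpy sketch, reusing the assumed $X_1 = Y + Z$, $X_2 = Z$ construction from above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
Y = rng.standard_normal(n)
Z = rng.standard_normal(n)
X1, X2 = Y + Z, Z   # same assumed construction as above

# Equation (2): treat the sum X1 + X2 as a single regressor
S = X1 + X2
eq2 = np.cov(S, Y)[0, 1] / np.var(S, ddof=1)

# Actual multiple-regression coefficient on X1
beta1 = np.linalg.lstsq(np.column_stack([X1, X2]), Y, rcond=None)[0][0]

print(eq2)    # ~0.2 : the slope of Y regressed on the sum X1 + X2
print(beta1)  # ~1.0 : Equation (2) is not the multiple-regression beta_1
```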


Approach 2: Google It.

I thought I'd check my answer and found a formula for $\beta_i$ along the lines of Equation (1) that takes the correlation between the regressors into account. It's from this website (written once in their notation, and again in mine):

$$ \tag{3} \beta_1 = \frac{r_{1y} - r_{2y}r_{12}}{1 - r_{12}^2} $$

$$ \beta_1 = \frac{Cor(X_1,Y) - Cor(X_2,Y) * Cor(X_1,X_2)}{1-Cor(X_1,X_2)^2} $$

Their notation is cleaner, so I'll stick with it: $Cov(X_1,Y) = s_{1y}$, $Cor(X_1,Y)=r_{1y}$, $SD(X) = s_x$, and $Var(X) = s_x^2$.

Now I rewrite Equation (1) in terms of correlations, in their notation:

$$ \tag{1b} \alpha = \frac{s_{1y}}{s_x^2} = r_{1y} \frac{s_y}{s_x}$$

So comparing Equation (3) with Equation (1b), we have:

$$ \alpha > \beta_1 $$

$$ r_{1y} \frac{s_y}{s_x} > \frac{r_{1y} - r_{2y}r_{12}}{1 - r_{12}^2} $$

which, using $r_{2y} = 0$ and dividing both sides by $r_{1y} > 0$, holds if and only if:

$$ \frac{s_y}{s_x} > \frac{1}{1-r_{12}^2} $$

$$ \tag{4} 1 - r_{12}^2 > \frac{s_x}{s_y} $$
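To sanity-check Equation (3) numerically (my addition, same assumed construction as above): since the formula is written purely in correlations, I compare it against a two-regressor fit on standardized variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
Y = rng.standard_normal(n)
Z = rng.standard_normal(n)
X1, X2 = Y + Z, Z   # same assumed construction as above

r1y = np.corrcoef(X1, Y)[0, 1]
r2y = np.corrcoef(X2, Y)[0, 1]
r12 = np.corrcoef(X1, X2)[0, 1]
eq3 = (r1y - r2y * r12) / (1 - r12**2)

# Two-regressor fit on standardized variables
std = lambda v: (v - v.mean()) / v.std()
y, x1, x2 = std(Y), std(X1), std(X2)
beta1_std = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)[0][0]

print(eq3, beta1_std)  # agree, ~1.41 here
```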


Questions

1 - Are Approach 2 and Equation (4) correct? Is the answer really "it depends"? I kinda thought it was cut and dried and that the answer would be that one is always larger or always smaller.

2 - I'm not sure that Equation (2) makes sense... what does the coefficient of a sum of two random variables even mean? Am I writing that correctly?

3 - most importantly: how do I derive something like Equation (3)? I don't want to use Google every time I'm stuck on problems like these (this is for personal practice)



Edit: Approach 3 - Frisch-Waugh-Lovell Theorem

Can't I also use the FWL Theorem to do:

$$ \begin{aligned} resid_{Y \sim X_2} &= Y - \alpha_2 * X_2 \\ &= Y - \frac{Cov(X_2,Y)}{Var(X_2)} * X_2 \\ &= Y - 0 \\ &= Y \end{aligned} $$

And then run the regression of:

$$ resid_{Y \sim X_2} \sim X_1 = Y \sim X_1 $$

...which just means that $\beta_1 = \alpha$?
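Here is a quick numerical check of this Approach 3 (same assumed $X_1 = Y + Z$, $X_2 = Z$ construction): residualize $Y$ on $X_2$, regress the residual on $X_1$, and compare with the $\beta_1$ from the full regression.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
Y = rng.standard_normal(n)
Z = rng.standard_normal(n)
X1, X2 = Y + Z, Z   # same assumed construction as above

slope = lambda x, y: (x @ y) / (x @ x)   # no-intercept OLS slope

# Approach 3: residualize only Y on X2, then regress on the raw X1
resid_Y = Y - slope(X2, Y) * X2          # ~ Y, since Cov(X2, Y) ~ 0
approach3 = slope(X1, resid_Y)

# Coefficient from the full regression Y ~ X1 + X2
beta1 = np.linalg.lstsq(np.column_stack([X1, X2]), Y, rcond=None)[0][0]

print(approach3)  # ~0.5, i.e. the same as alpha
print(beta1)      # ~1.0, so this check does not reproduce beta_1
```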

I'm so lost now, I feel like I've solved the problem in three different ways and they all sound reasonable enough...

JoeVictor
  • In your second approach you involve the variances $s$ when computing $\alpha$ but not when computing $\beta$; you should end up with $$\alpha = r_{1y} < \frac{r_{1y}}{1 - r_{12}^2} =\beta$$ when you use $s_y = s_{x_1} = 1$ and $r_{2y} = 0$. – Sextus Empiricus Jun 22 '23 at 07:24
  • Also useful can be Intuition behind $(X^TX)^{-1}$ in closed form of w in Linear Regression. This you may also relate to FWL theorem which is about the way that the coefficients transform due to correlations between the different $X_i$. (Your application of it in your third approach is not very clear so I can not comment much more on it) – Sextus Empiricus Jun 22 '23 at 07:34
  • In your first approach it is not true that the numerator doesn't change. You can have $Cov(X_1+X_2,Y) \neq Cov(X_2, Y)$ – Sextus Empiricus Jun 22 '23 at 07:39
  • I had a typo with my first approach, it's fixed now – JoeVictor Jun 22 '23 at 13:45
  • My first criticism on your first approach was not correct. Indeed we have $$Cov(X_1+X_2,Y) = Cov(X_1,Y) + Cov(X_2,Y)$$ – Sextus Empiricus Jun 22 '23 at 15:31
  • In your first approach you are using $$Var(X_1) + Var(X_2) + 2 Cov(X_1,X_2)$$ however it may not need to be a linear sum like that. For example, in the example from my question you will use a negative coefficient in the linear sum leading to $$Var(X_1-X_2) =Var(X_1) + Var(X_2) - 2 Cov(X_1,X_2)$$ and the denominator will be smaller (making the coefficient larger) – Sextus Empiricus Jun 22 '23 at 15:43
  • Isn't the premise flawed because if cor(y,x1)>0 and cor(x1,x2)>0 there must be some correlation between y and x2 as well? Yes maybe it isn't "statistically significant" but there must be a nonzero correlation there, right? – qdread Jun 22 '23 at 19:05
  • @qdread consider $$X = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ -1 & 0 \\ -1 & -1 \end{bmatrix} \qquad Y = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0\end{bmatrix}$$ – Sextus Empiricus Jun 22 '23 at 19:18

1 Answer


Consider the matrix equation for finding the coefficients

$$\hat\beta = (X^TX)^{-1} X^T Y$$

If you assume the variables to have variance equal to one and to be centered at zero, then you can express this in terms of the correlations:

$$\begin{bmatrix}\hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = \begin{bmatrix} 1 & \rho_{X_1,X_2} \\ \rho_{X_1,X_2} & 1\end{bmatrix}^{-1} \cdot \begin{bmatrix} \rho_{X_1,Y}\\ \rho_{X_2,Y}\end{bmatrix} = \frac{1}{1-\rho_{X_1,X_2}^2}\begin{bmatrix} 1 & -\rho_{X_1,X_2} \\ -\rho_{X_1,X_2} & 1\end{bmatrix} \cdot \begin{bmatrix} \rho_{X_1,Y}\\ \rho_{X_2,Y}\end{bmatrix} $$

And for the coefficient $\beta_1$ you get

$$\hat\beta_1 = \frac{\rho_{X_1,Y}}{1-\rho_{X_1,X_2}^2} > \rho_{X_1,Y} = \hat\alpha$$
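A short numpy sketch of this (with standardized variables and an assumed construction $X_1 = Y + Z$, $X_2 = Z$ where $Z$ is independent noise):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
Y = rng.standard_normal(n)
Z = rng.standard_normal(n)
std = lambda v: (v - v.mean()) / v.std()
y, x1, x2 = std(Y), std(Y + Z), std(Z)   # standardized, rho_{X2,Y} ~ 0

X = np.column_stack([x1, x2])
beta = np.linalg.solve(X.T @ X, X.T @ y)   # (X^T X)^{-1} X^T Y

rho_1y = np.corrcoef(x1, y)[0, 1]
rho_12 = np.corrcoef(x1, x2)[0, 1]
print(beta[0], rho_1y / (1 - rho_12**2))   # both ~1.41
print(rho_1y)                              # ~0.71 = alpha-hat (standardized)
```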

Example

$$X = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ -1 & 0 \\ -1 & -1 \\ \end{bmatrix} \qquad Y = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0\end{bmatrix} $$

This will give $\alpha = 0.5$ and $\beta_1 = 1$.
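A few lines of numpy reproduce these numbers (a sketch; both fits are without an intercept since the columns are already centered):

```python
import numpy as np

X = np.array([[ 1,  1],
              [ 1,  0],
              [-1,  0],
              [-1, -1]], dtype=float)
Y = np.array([0., 1., -1., 0.])

alpha = (X[:, 0] @ Y) / (X[:, 0] @ X[:, 0])   # simple regression on X1
beta = np.linalg.solve(X.T @ X, X.T @ Y)      # multiple regression

print(alpha)  # 0.5
print(beta)   # [ 1. -1.]
```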


Intuitively: the $X_1$ variable correlates with $Y$ but also 'brings in some noise'. The $X_2$ variable, which doesn't directly correlate with $Y$ but does correlate with $X_1$, can cancel some of this noise, allowing the $X_1$ variable to fit better (and that increases the coefficient).

Say that we have two uncorrelated variables $Z$, $Y$.

  • Then $X_1 = Y$ would fit the model with a coefficient equal to one.
  • But $X_1 = Y + Z$ will be fitted with a coefficient less than one.

By adding the variable $X_2 = Z$ to the regression we can cancel some of the noise that 'prevents' $X_1$ from being fitted with a coefficient equal to one.
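A tiny simulation of this noise-cancelling argument (the concrete construction $X_1 = Y + Z$, $X_2 = Z$ is assumed here):

```python
import numpy as np

rng = np.random.default_rng(5)
Y = rng.standard_normal(10_000)
Z = rng.standard_normal(10_000)
X1, X2 = Y + Z, Z

beta = np.linalg.lstsq(np.column_stack([X1, X2]), Y, rcond=None)[0]
print(beta)                                      # ~[ 1., -1.]
# The fitted combination 1*X1 - 1*X2 removes Z exactly and reproduces Y:
print(np.allclose(beta[0]*X1 + beta[1]*X2, Y))   # True
```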

  • How do you get the matrices: $\begin{bmatrix} 1 & \rho_{X_1,X_2} \\ \rho_{X_1,X_2} & 1\end{bmatrix}^{-1} \cdot \begin{bmatrix} \rho_{X_1,Y}\\ \rho_{X_2,Y}\end{bmatrix}$ in your solution? You lost me at that jump from the beta vector to the two matrices – JoeVictor Jun 22 '23 at 15:52
  • @JoeVictor you get those matrices for standardized $X$ and $Y$, such that $X^TX$ is the correlation matrix. In my example the variables are not standardized. – Sextus Empiricus Jun 22 '23 at 15:55
  • if the variables are not standardized, wouldn't that change the answer? – JoeVictor Jun 22 '23 at 15:58
  • @JoeVictor not when they remain centered, in that case the correlation and covariance are just a difference in the scale of the variables. But for a shift the result is different. – Sextus Empiricus Jun 22 '23 at 16:07
  • got it, thanks! – JoeVictor Jun 22 '23 at 16:14
  • @JoeVictor an example is when we add to both the vectors $X_2$ and $Y$ some fixed value, say 10. Then the covariance/correlation is not influenced, but the product $X_2 \cdot Y$ changes a lot. – Sextus Empiricus Jun 22 '23 at 16:31