
Given the estimator for $\hat{\beta}_1$ in the Frisch-Waugh-Lovell Theorem:

$$\hat{\beta}_1 = \left(X_1^\prime{} M_2 X_1\right)^{-1} X_1' M_2 y$$

In the auxiliary (second-stage) regression I can use either $y$ itself or the residuals from a regression of $y$ on $X_2$; both give the same estimate of $\beta_1$.
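For concreteness, here is a minimal R sketch (simulated data; the names n, x1, x2, y are mine, single regressors and no intercept assumed) checking that the matrix formula reproduces the coefficient on $X_1$ from the full regression:

set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 - x2 + rnorm(n)

M2 <- diag(n) - x2 %*% solve(crossprod(x2)) %*% t(x2)  # annihilator (residual maker) of x2

beta1.fwl  <- solve(t(x1) %*% M2 %*% x1) %*% t(x1) %*% M2 %*% y
beta1.full <- coef(lm(y ~ x1 + x2 - 1))["x1"]

all.equal(as.numeric(beta1.fwl), as.numeric(beta1.full))  # TRUE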

This can be seen as follows (using only the fact that $M_2$ is symmetric and idempotent):

(1) $$\hat{\beta}_1 = \left(X_1^\prime{} M_2^\prime{} M_2 X_1\right)^{-1} X_1' M_2' y$$

which is equal to:

$$\hat{\beta}_1 = \left(\tilde{X}^\prime{} \tilde{X}\right)^{-1} \tilde{X}^\prime{} y$$

where $\tilde{X} = M_2 X_1$ contains the first-stage residuals from regressing $X_1$ on $X_2$, which are then used as the regressor in the second stage.

And equally:

(2) $$\hat{\beta}_1= \left(X_1^\prime{} M_2^\prime{} M_2 X_1\right)^{-1} X_1^\prime{} M_2^\prime{} M_2 y$$

which is equal to:

$$\hat\beta_1 = \left(\tilde{X}^\prime{} \tilde{X}\right)^{-1} \tilde{X}^\prime{} \tilde{y}$$

where $\tilde{X} = M_2 X_1$ is as before, but now the second-stage dependent variable is $\tilde{y} = M_2 y$, the residuals from regressing $y$ on $X_2$.
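To see both versions at work numerically, here is a small sketch (again with simulated data and made-up names; both second-stage regressions recover the same $\hat{\beta}_1$ as the full regression, only their residuals differ, which is the point of the question below):

set.seed(123)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 - x2 + rnorm(n)

x1.tilde <- resid(lm(x1 ~ x2 - 1))  # first stage: residualize x1 on x2
y.tilde  <- resid(lm(y  ~ x2 - 1))  # residualize y on x2

b1.full <- coef(lm(y ~ x1 + x2 - 1))["x1"]
b1.eq1  <- coef(lm(y ~ x1.tilde - 1))        # version (1): y itself
b1.eq2  <- coef(lm(y.tilde ~ x1.tilde - 1))  # version (2): residualized y

all.equal(as.numeric(b1.full), as.numeric(b1.eq1))  # TRUE
all.equal(as.numeric(b1.full), as.numeric(b1.eq2))  # TRUE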

So far so good. Now the theorem states that if we use (2), the residuals are identical to those of the unpartitioned regression, whereas if we use (1) they are not. How do we prove that?

1 Answer


That the residuals from (2) and the full regression are identical can be seen as follows. Let $e$ denote the residuals of a regression of $y$ on $X=(X_1, X_2)$, i.e.
$$y=X_1b_1+X_2b_2+e.$$
Multiplying through by $M_{X_2}$ yields the regression of the residualized variables on each other,
$$M_{X_2}y=M_{X_2}X_1b_1+M_{X_2}e,$$
where we have used $M_{X_2}X_2=0$. By FWL, $b_1$ is the estimated coefficient vector of this regression, so that $M_{X_2}e$ are its residuals. Now, since OLS residuals and regressors are orthogonal, $X_2'e=0$, and hence
$$M_{X_2}e=e-X_2(X_2'X_2)^{-1}X_2'e=e.$$
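A quick numerical check of that last step, $M_{X_2}e=e$ (a sketch with simulated data; the variable names are not from the answer):

set.seed(42)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rnorm(n)

e  <- resid(lm(y ~ x1 + x2 - 1))                       # full-regression residuals
M2 <- diag(n) - x2 %*% solve(crossprod(x2)) %*% t(x2)  # residual maker of x2

all.equal(as.numeric(M2 %*% e), as.numeric(e))  # TRUE: e is already orthogonal to x2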

Now, the alternative procedure (1), in which $y$ is not residualized, yields residuals
$$y-M_{X_2}X_1b_1$$
rather than the residuals
$$e=M_{X_2}y-M_{X_2}X_1b_1$$
of procedure (2). Using $M+P=I$ for the residual-maker and projection matrices, write the residuals of (1) as
$$M_{X_2}y+P_{X_2}y-M_{X_2}X_1b_1=e+P_{X_2}y,$$
with $P_{X_2}$ the projection matrix onto $X_2$. Since there is generally no reason to suppose that $P_{X_2}y=0$ (that would require $X_2'y=0$, in which case we might have omitted $X_2$ anyhow), the residuals will differ.

Illustration:

n <- 10
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rnorm(n)

# first stage: residualize x1 on x2
resid.firststage.x1 <- resid(lm(x1 ~ x2 - 1))

# residuals of the full (unpartitioned) regression
e <- resid(lm(y ~ x1 + x2 - 1))

# procedure (2): second stage with residualized y
e.fwl <- resid(lm(resid(lm(y ~ x2 - 1)) ~ resid.firststage.x1 - 1))

# procedure (1): second stage with y itself
e.fwlpartial <- resid(lm(y ~ resid.firststage.x1 - 1))

all.equal(e, e.fwl, check.attributes = FALSE)
all.equal(e.fwlpartial,
          as.numeric(e + x2 %*% solve(crossprod(x2)) %*% t(x2) %*% y),
          check.attributes = FALSE)

  • Why would the alternative procedure (1) amount to residuals $ y - M_{X_2} X_1 b_1 $? Don't you multiply it with $M_{X_2}$? – Marlon Brando Dec 22 '23 at 13:47
  • Just from the definition of residuals, defined as the difference between the dependent variable and the regressors times estimated coefficients. If you do not residualize $y$, then $y$ is also your dependent variable in the final FWL step. If you did multiply it by $M_{X_2}$, there would be no difference from (2) left. – Christoph Hanck Dec 22 '23 at 13:57