In another question of mine, an answerer used the following derivation of the OLS coefficient:

We have a model: $$ Y = X_1 \beta_1 + X_2 \beta_2 + Z \gamma + \varepsilon, $$ where $Z$ is unobserved. Then we have: $$\text{plim}\, \hat \beta_{1} = \beta_1 + \gamma \frac{Cov(X_1^*, Z)}{Var(X_1^*)} = \beta_1, $$ where $X_1^* = M_2 X_1$ and $M_2 = [I - X_2(X_2'X_2)^{-1}X_2']$.

This looks different from the usual $\hat\beta = (X'X)^{-1}X'Y$ that I've seen in econometrics. Is there a more explicit exposition of this derivation? Is there a name for the $M_2$ matrix?

Heisenberg
  • I'm pretty sure it's described in Hansen's lecture notes, but I don't have them at hand right now. – FooBar Jan 28 '15 at 20:13

1 Answer

The $\mathbf M = \mathbf I-\mathbf X(\mathbf X'\mathbf X)^{-1}\mathbf X'$ matrix is the "annihilator" or "residual maker" matrix associated with the matrix $\mathbf X$. It is called "annihilator" because $\mathbf M\mathbf X =\mathbf 0$ (for its own $\mathbf X$ matrix, of course). It is called "residual maker" because $\mathbf M \mathbf y =\mathbf {\hat e}$, the vector of OLS residuals in the regression $\mathbf y = \mathbf X \beta + \mathbf e$.

It is a symmetric and idempotent matrix. It is used in the proof of the Gauss-Markov theorem.
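For concreteness, here is a minimal numerical check of these properties (a sketch with simulated data and an arbitrary design matrix, nothing specific to the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))                      # arbitrary design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Annihilator / residual maker matrix associated with X
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

# "Annihilator": M X = 0 (for its own X)
assert np.allclose(M @ X, 0.0)

# "Residual maker": M y equals the OLS residuals from regressing y on X
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(M @ y, y - X @ beta_hat)

# Symmetric and idempotent
assert np.allclose(M, M.T)
assert np.allclose(M @ M, M)
```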

The $\mathbf M$ matrix also appears in the Frisch–Waugh–Lovell theorem on "partitioned regression", which says that in the model (in matrix form)

$$\mathbf y = \mathbf X_1\beta_1 + \mathbf X_2\beta_2 + \mathbf u$$

we have that

$$\hat \beta_1 = (\mathbf X_1'\mathbf M_2\mathbf X_1)^{-1}(\mathbf X_1'\mathbf M_2)\mathbf y, $$

where $\mathbf M_2 = \mathbf I - \mathbf X_2(\mathbf X_2'\mathbf X_2)^{-1}\mathbf X_2'$ is the annihilator matrix associated with $\mathbf X_2$.

Since $\mathbf M_2$ is idempotent, we can rewrite the above as

$$\hat \beta_1 = (\mathbf X_1'\mathbf M_2\mathbf M_2\mathbf X_1)^{-1}(\mathbf X_1'\mathbf M_2\mathbf M_2)\mathbf y$$

and since $\mathbf M_2$ is also symmetric we have

$$\hat \beta_1 = ([\mathbf M_2\mathbf X_1]'[\mathbf M_2\mathbf X_1])^{-1}([\mathbf M_2\mathbf X_1]'[\mathbf M_2\mathbf y])$$

But this is the least-squares estimator from the model

$$[\mathbf M_2\mathbf y] = [\mathbf M_2\mathbf X_1]\beta_1 + \mathbf M_2\mathbf u$$

and also, $\mathbf M_2\mathbf y$ is the vector of residuals from regressing $\mathbf y$ on the matrix $\mathbf X_2$ only.

In other words: if we regress $\mathbf y$ on the matrix $\mathbf X_2$ only, and then regress the residuals from that estimation on the matrix $\mathbf M_2\mathbf X_1$ only, the $\hat \beta_1$ estimates we obtain are mathematically identical to the estimates we obtain by regressing $\mathbf y$ on both $\mathbf X_1$ and $\mathbf X_2$ together, as a usual multiple regression.
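A quick numerical check of this equivalence, again with simulated data (variable names and dimensions are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X1 = rng.normal(size=(n, 2))
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # includes a constant
y = X1 @ np.array([0.7, -1.2]) + X2 @ np.array([2.0, 0.3, 0.9]) + rng.normal(size=n)

# (1) Full multiple regression of y on [X1, X2]; keep the coefficients on X1
X = np.column_stack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)[:X1.shape[1]]

# (2) Two-step (partialled-out) regression: regress M2 y on M2 X1
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
X1_star, y_star = M2 @ X1, M2 @ y
beta_fwl = np.linalg.solve(X1_star.T @ X1_star, X1_star.T @ y_star)

assert np.allclose(beta_full, beta_fwl)   # identical, as the FWL theorem says
```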

Now, assume that $\mathbf X_1$ is not a matrix but just one regressor, say $\mathbf x_1$. Then $\mathbf M_2 \mathbf x_1$ is the vector of residuals from regressing the variable $\mathbf x_1$ on the regressor matrix $\mathbf X_2$. And this provides the intuition here: $\hat \beta_1$ gives us the effect that "the part of $\mathbf x_1$ that is unexplained by $\mathbf X_2$" has on "the part of $\mathbf y$ that is left unexplained by $\mathbf X_2$".
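To connect this back to the $Cov/Var$ expression in the question (a sketch, under the extra assumption that $\mathbf X_2$ contains a constant): writing $\mathbf x_1^* \equiv \mathbf M_2\mathbf x_1$, the single-regressor case of the formula above reduces to

$$\hat \beta_1 = (\mathbf x_1'\mathbf M_2\mathbf x_1)^{-1}\mathbf x_1'\mathbf M_2\mathbf y = \frac{\mathbf x_1^{*\prime}\mathbf y}{\mathbf x_1^{*\prime}\mathbf x_1^{*}},$$

and because $\mathbf M_2$ annihilates the constant, the residuals $\mathbf x_1^*$ have zero sample mean, so the numerator and denominator are $n$ times the sample covariance of $x_1^*$ with $y$ and the sample variance of $x_1^*$, respectively. This is why covariance/variance ratios involving $X_1^* = M_2 X_1$ show up in the plim expression quoted in the question.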

This is an emblematic part of classic Least-Squares Algebra.

Alecos Papadopoulos
  • Started answering, but I had a lot of overlap with this answer. You can find a lot of this information in Chapter 3.2.4 of the 7th edition of "Econometric Analysis" by Bill Greene. – cc7768 Jan 28 '15 at 20:58
  • @cc7768 Yes, that's a good source for least-squares algebra. But don't hesitate to post additional material. For example, essentially my answer covers only the second question of the OP. – Alecos Papadopoulos Jan 28 '15 at 21:01
  • @AlecosPapadopoulos you say that if we regress $\mathbf M_2y$ on $\mathbf X_1$, we also get $\hat \beta_1$. But isn't the equation saying, regress $\mathbf M_2y$ on $\mathbf M_2\mathbf X_1$ instead? – Heisenberg Jan 30 '15 at 06:29
  • @Heisenberg Correct. Typo. Fixed it, and added a bit more. – Alecos Papadopoulos Jan 30 '15 at 07:02