Measurement Error - Multivariate Case

Question

I have a linear regression model in which the usual assumptions hold: $E[xu] = 0$ and the rank condition.

$y_i = \alpha_0 + \alpha_1x_{1i} + \alpha_2x_{2i} + u_i$

I observe $\bar{x}_{2i}$, where:

$\bar{x}_{2i} = x_{2i} + e_i$

My estimated model is:

$y_i = \tilde{\alpha_0} + \tilde{\alpha_1}x_{1i} + \tilde{\alpha_2}\bar{x}_{2i} + \tilde{u_i}$

I want to derive the plim of $\tilde{\alpha_1}$ and $\tilde{\alpha_2}$.

My approach:

I substituted for my observation, and evaluated the following:

plim $\tilde{\alpha_2} = \frac{cov(\bar{x}_{2i}, y)}{var(\bar{x}_{2i})} = \frac{\alpha_{2}*var(x_{2i})}{var(x_{2i} + e_i)} = \frac{\alpha_{2}*var(x_{2i})}{var(x_{2i}) + var(e_i)}$

plim $\tilde{\alpha_1} = \frac{cov({x}_{1i}, y)}{var({x}_{1i})} = \frac{\alpha_{1}*var(x_{1i})}{var(x_{1i})} = \alpha_1$

Is this correct? I am a little worried about $\tilde{\alpha_1}$, because shouldn't this be biased?

score 3 · Accepted Answer · answered Sep 22 '22 at 15:30

To derive this you'll want to use the Frisch-Waugh-Lovell theorem.

Using the true variable, $x_2$, let $\widetilde{x_2}$ be the residual from a regression of $x_2$ on $x_1$,

$$x_2 = \delta_0 +\delta_1 x_1 +\widetilde{x_2}$$

We thus have,

$$\bar{x_2} = \delta_0 +\delta_1 x_1 +e +\widetilde{x_2}$$

Provided $Cov(x_1, e) = 0$, the residual from a regression of $\bar{x_2}$ on $x_1$ is $(e +\widetilde{x_2})$.

By the Frisch-Waugh-Lovell theorem, the OLS estimate of the coefficient on $\bar{x_2}$ in your estimated model will be the same as the OLS estimate from

$$y_i = \alpha_0 +\alpha_2 (e_i +\widetilde{x_{2i}}) + u_i$$

So we have $$\hat{\alpha_2} = \frac{Cov(y_i, (e_i +\widetilde{x_{2i}}))}{Var(e_i +\widetilde{x_{2i}})}$$
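The Frisch-Waugh-Lovell step can be checked numerically. The following is a minimal sketch in Python with NumPy; the data-generating process, the parameter values, and the helper `ols` are hypothetical and chosen only for illustration. It compares the coefficient on $\bar{x_2}$ from the full regression with the coefficient from regressing $y$ on the residualized $\bar{x_2}$, and with the sample covariance-over-variance ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data-generating process, chosen only for illustration.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)     # true regressor, correlated with x1
e = rng.normal(scale=0.8, size=n)      # measurement error
x2_bar = x2 + e                        # observed, mismeasured regressor
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n)

# Full regression: y on (1, x1, x2_bar).
beta_full = ols(np.column_stack([ones, x1, x2_bar]), y)

# FWL: residualize x2_bar on (1, x1), then regress y on that residual.
Z = np.column_stack([ones, x1])
resid = x2_bar - Z @ ols(Z, x2_bar)
beta_fwl = ols(np.column_stack([ones, resid]), y)[1]

# The same coefficient written as sample covariance over sample variance.
beta_ratio = np.cov(y, resid)[0, 1] / np.var(resid, ddof=1)

print(beta_full[2], beta_fwl, beta_ratio)
```

The three printed numbers should agree up to floating-point error, since the equivalence is an algebraic identity in the sample rather than an asymptotic approximation.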

You will plug in for $y_i = \alpha_0 +\alpha_1 x_{1i} +\alpha_2 x_{2i}+u_i$, and note that $Cov(x_{1i}, \widetilde{x_{2i}})=0$ because $\widetilde{x_{2i}}$ is the residual from an OLS regression with $x_1$ as a regressor. To proceed, you will need to make an assumption regarding $Cov(x_{1i}, e_i)$ and $Cov(u_{i}, e_i)$.
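As an illustration, suppose the classical measurement error assumptions hold, $Cov(x_{1i}, e_i) = Cov(x_{2i}, e_i) = Cov(u_i, e_i) = 0$. This is an assumption added here for the sketch, not something stated in the question. The plug-in step would then work out as:

$$
\begin{aligned}
Cov(y_i, e_i + \widetilde{x_{2i}}) &= \alpha_1 Cov(x_{1i}, e_i + \widetilde{x_{2i}}) + \alpha_2 Cov(x_{2i}, e_i + \widetilde{x_{2i}}) + Cov(u_i, e_i + \widetilde{x_{2i}}) \\
&= \alpha_2 Var(\widetilde{x_{2i}}) \\
Var(e_i + \widetilde{x_{2i}}) &= Var(e_i) + Var(\widetilde{x_{2i}})
\end{aligned}
$$

Here the first term vanishes because $x_1$ is uncorrelated with both $e$ and $\widetilde{x_2}$, the last term vanishes because $u$ is uncorrelated with $e$ and with the regressors, and $Cov(x_{2i}, \widetilde{x_{2i}}) = Var(\widetilde{x_{2i}})$ follows from $x_2 = \delta_0 + \delta_1 x_1 + \widetilde{x_2}$. Under these assumptions the probability limit is

$$\text{plim } \hat{\alpha_2} = \alpha_2 \frac{Var(\widetilde{x_{2i}})}{Var(\widetilde{x_{2i}}) + Var(e_i)}$$

The attenuation factor uses the variance of $x_2$ that is not explained by $x_1$ rather than $Var(x_{2i})$, so the attenuation is stronger than in the single-regressor case whenever $x_1$ and $x_2$ are correlated.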

To estimate the effect of $x_1$ on $y$, we need to consider a regression of $x_1$ on $x_2$.

$$x_1 = a_0 +a_1 x_2 + \widetilde{x_1}$$

Using the mismeasured version of $x_2$,

$$x_1 = a_0 +a_1 \bar{x_2} - a_1e + \widetilde{x_1} $$

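The term $(- a_1e + \widetilde{x_1})$ is the error in this equation, but it is not the OLS residual from a regression of $x_1$ on $\bar{x_2}$: if $e$ is uncorrelated with $x_1$ and $x_2$, then $Cov(\bar{x_2}, - a_1e + \widetilde{x_1}) = -a_1 Var(e)$, which is nonzero whenever $a_1 \neq 0$ and there is measurement error. The residual that the Frisch-Waugh-Lovell theorem calls for is

$$r_{1i} = x_{1i} - b_0 - b_1 \bar{x}_{2i}, \qquad b_1 = \frac{Cov(x_{1i}, \bar{x}_{2i})}{Var(\bar{x}_{2i})}$$

We apply the Frisch-Waugh-Lovell theorem to know the estimate for the coefficient of $x_1$ is the same as the estimate from,

$$y_i = \alpha_0 +\alpha_1 r_{1i} + u_i$$

This is $$\hat{\alpha_1} = \frac{Cov(y_i, r_{1i})}{Var(r_{1i})}$$

Analogous to before, you will plug in for $y_i = \alpha_0 +\alpha_1 x_{1i} +\alpha_2 x_{2i}+u_i$, substitute $x_{1i} = a_0 +a_1 \bar{x}_{2i} - a_1e_i + \widetilde{x_{1i}}$ into $r_{1i}$, and note that $Cov(x_{2i}, \widetilde{x_{1i}})=0$ because $\widetilde{x_{1i}}$ is the residual from an OLS regression with $x_2$ as a regressor. To proceed, you will need to make an assumption regarding $Cov(x_{1i}, e_i)$ and $Cov(u_{i}, e_i)$.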
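As a numerical cross-check, the sketch below simulates the model under classical measurement error ($e$ independent of $x_1$, $x_2$ and $u$), which is an assumption added here and not stated in the question. Under that assumption the standard closed forms are $\text{plim } \tilde{\alpha_2} = \alpha_2 \lambda$ and $\text{plim } \tilde{\alpha_1} = \alpha_1 + \alpha_2 \delta_1 (1 - \lambda)$, with $\lambda = Var(\widetilde{x_2}) / (Var(\widetilde{x_2}) + Var(e))$ and $\delta_1 = Cov(x_1, x_2)/Var(x_1)$. All parameter values and variable names in the code are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 400_000

# Hypothetical parameter values, chosen only for illustration.
alpha0, alpha1, alpha2 = 1.0, 2.0, 3.0
delta1 = 0.5         # slope of x2 on x1 (x1 has unit variance below)
sd_x2_tilde = 1.0    # sd of the part of x2 not explained by x1
sd_e = 0.8           # sd of the measurement error

x1 = rng.normal(size=n)
x2 = delta1 * x1 + rng.normal(scale=sd_x2_tilde, size=n)
e = rng.normal(scale=sd_e, size=n)   # assumed independent of x1, x2 and u
u = rng.normal(size=n)
y = alpha0 + alpha1 * x1 + alpha2 * x2 + u
x2_bar = x2 + e                      # observed, mismeasured regressor

# OLS of y on (1, x1, x2_bar), i.e. the estimated model.
X = np.column_stack([np.ones(n), x1, x2_bar])
est = np.linalg.lstsq(X, y, rcond=None)[0]

# Closed-form probability limits under classical measurement error.
lam = sd_x2_tilde**2 / (sd_x2_tilde**2 + sd_e**2)
plim_alpha2 = alpha2 * lam
plim_alpha1 = alpha1 + alpha2 * delta1 * (1.0 - lam)

print(f"alpha1: estimate {est[1]:.3f}, closed form {plim_alpha1:.3f}")
print(f"alpha2: estimate {est[2]:.3f}, closed form {plim_alpha2:.3f}")
```

With a large sample the estimates should sit close to the closed forms. In this setup the coefficient on $x_1$ is inconsistent as well, because $x_1$ picks up part of the effect of $x_2$ that the mismeasured $\bar{x_2}$ fails to capture; the bias disappears only if $\alpha_2 = 0$, $\delta_1 = 0$, or $Var(e) = 0$.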

Thanks for your answer. For some reason I am unable to comment. What about: $Cov(x_{1i}, \widetilde{x_{1i}})$? — Wooldridge, Sep 25 '22 at 08:57
Plug in that $x_1 = a_0 +a_1 \bar{x_2} -a_1e +\widetilde{x_1}$, then derive. — Michael Gmeiner, Sep 26 '22 at 07:29
