
Suppose the population regression function is as follows:

$$y=\beta_{0}+\beta_{1}x_{1}+\epsilon$$

In this case, the linear-model assumption needed to obtain unbiased and consistent estimates is satisfied, namely $E[\epsilon|x_{1}]=0$.

However, we mistakenly estimate instead

$$y=\gamma_{0}+\gamma_{1}x_{1}+\gamma_{2}x_{2}+\epsilon$$

where $x_{2}$ is causally irrelevant to $y$ and is itself an outcome of $x_{1}$ (the so-called bad control problem). Is it possible that conditional mean independence now fails? That is, can $E[\epsilon|x_{1},x_{2}]\neq 0$ even though $E[\epsilon|x_{1}]=0$? Put differently, have we introduced endogeneity by including an irrelevant variable? If so, why? Wouldn't that correlation with the error term already exist?

ChinG

1 Answer


We can find an example such that $E(\epsilon|x_1) = 0$ but $E(\epsilon|x_1,x_2) \ne 0$.

Suppose that $(\epsilon,x_1,x_2)$ takes each of the following four values with probability $1/4$:

$$ (-1,0,0), \; (1,0,1), \; (-1,1,0), \; (1,1,1) $$

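We have $E(\epsilon|x_1) = 0$, since $$E(\epsilon | x_1 = 1) = E(\epsilon | x_1 = 0) = -1 \cdot \frac{1}{2} + 1\cdot\frac{1}{2} = 0,$$ but $E(\epsilon|x_1,x_2) \ne 0$ because, for example, $$ E(\epsilon|x_1 = 0,x_2 = 0) = -1. $$ Within each value of $x_1$, knowing $x_2$ pins down $\epsilon$ exactly, so conditioning on $x_2$ as well makes the error predictable.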
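The conditional means in this example can be checked by direct enumeration. The following is only a minimal sketch: the helper name `cond_mean_eps` and the dictionary names are made up for illustration, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# The four equally likely outcomes (epsilon, x1, x2) from the example.
outcomes = [(-1, 0, 0), (1, 0, 1), (-1, 1, 0), (1, 1, 1)]
prob = Fraction(1, 4)  # probability of each outcome


def cond_mean_eps(condition):
    """Return E[epsilon | condition], where condition is a predicate on (x1, x2).

    Hypothetical helper for illustration; assumes the conditioning event
    has positive probability.
    """
    selected = [eps for eps, x1, x2 in outcomes if condition(x1, x2)]
    mass = prob * len(selected)  # probability of the conditioning event
    return sum(prob * eps for eps in selected) / mass


# E[epsilon | x1 = a] for a in {0, 1}
by_x1 = {a: cond_mean_eps(lambda x1, x2, a=a: x1 == a) for a in (0, 1)}

# E[epsilon | x1 = a, x2 = b] for every (a, b) cell
by_x1_x2 = {
    (a, b): cond_mean_eps(lambda x1, x2, a=a, b=b: x1 == a and x2 == b)
    for a in (0, 1)
    for b in (0, 1)
}

print(by_x1)
print(by_x1_x2)
```

Conditioning on $x_1$ alone gives a mean of zero for both values of $x_1$, while every $(x_1, x_2)$ cell has a conditional mean of $-1$ or $1$.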

pdb