
In Hayashi's Econometrics, one of the assumptions of the classical OLS model is stated as: $$\mathbb{E}(\epsilon_i \mid \mathbf{x_1}, \mathbf{x_2}, \ldots, \mathbf{x_n}) = 0 \text{, for } i=1, \ldots, n. \tag{1}$$ I know that this implies $\mathbb{E}(\epsilon_i) = 0$ for all $i = 1, \ldots,n$ and that the error term is uncorrelated with the regressors.

But what does equation (1) itself actually mean? A pedagogical example would be helpful.

dwolfeu
hans-t

1 Answer


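In words, it means that once you condition on the regressors of every observation, $\mathbf{x}_1, \ldots, \mathbf{x}_n$, the expected value of each error term $\epsilon_i$ is still zero: knowing the regressors tells you nothing about the average value of the error.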
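Equation (1) is stronger than the two implications mentioned in the question; both follow from it by the law of iterated expectations. A short derivation, writing $x_{jk}$ for the $k$-th element of $\mathbf{x}_j$ (notation introduced here for convenience):

$$\begin{align*} \mathbb{E}(\epsilon_i) &= \mathbb{E}\big[\mathbb{E}(\epsilon_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_n)\big] = \mathbb{E}[0] = 0, \\ \mathbb{E}(x_{jk}\,\epsilon_i) &= \mathbb{E}\big[x_{jk}\,\mathbb{E}(\epsilon_i \mid \mathbf{x}_1, \ldots, \mathbf{x}_n)\big] = 0 \quad \text{for all } i, j, k, \\ \operatorname{Cov}(x_{jk}, \epsilon_i) &= \mathbb{E}(x_{jk}\,\epsilon_i) - \mathbb{E}(x_{jk})\,\mathbb{E}(\epsilon_i) = 0. \end{align*}$$

The converse fails: with a single scalar regressor $x_i \sim \mathcal{N}(0,1)$ and $\epsilon_i = x_i^2 - 1$, the error has mean zero and is uncorrelated with $x_i$ (because $\mathbb{E}(x_i^3) = 0$), yet $\mathbb{E}(\epsilon_i \mid x_i) = x_i^2 - 1 \neq 0$. Equation (1) restricts the whole conditional mean function of the error, not just a few of its moments.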

How might this be violated?

Example: omitted variable correlated with $x$

Imagine the true model is: $$ y_i = \alpha + \beta x_i + \gamma z_i + u_i$$

Now suppose that we instead run the regression: $$ y_i = \alpha + \beta x_i + \underbrace{\epsilon_i}_{\gamma z_i + u_i} \tag{2}$$

Then: $$\begin{align*} E[\epsilon_i \mid x_i ] &= E[\gamma z_i + u_i \mid x_i] \\ &=\gamma E[ z_i\mid x_i] \quad \text{assuming } E[u_i \mid x_i] = 0 \end{align*}$$

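If $\gamma \neq 0$ and $E[z_i \mid x_i] \neq 0$, then $E[\epsilon_i \mid x_i] \neq 0$ and strict exogeneity (equation (1)) is violated. Conditioning on $x_i$ alone is enough to show this, because (1) implies $E[\epsilon_i \mid x_i] = 0$ by the law of iterated expectations. (A nonzero but constant $E[z_i \mid x_i]$ would only shift the intercept; the case that distorts $\beta$ is when $E[z_i \mid x_i]$ varies with $x_i$.)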
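A small simulation can make this concrete. The sketch below is only an illustration under assumed values, not part of the original argument: it draws $x_i \sim \mathcal{N}(0,1)$, sets the omitted variable to $z_i = \delta x_i + v_i$ (so $E[z_i \mid x_i] = \delta x_i$), takes $u_i$ independent of everything, and uses hypothetical parameters $\gamma = 2$ and $\delta = 0.5$. Averaging $\epsilon_i = \gamma z_i + u_i$ within bins of $x$ shows that the overall mean of the error is essentially zero while its conditional mean moves with $x$, roughly like $\gamma \delta x$.

```python
import random

# Hypothetical parameters, chosen only for illustration
GAMMA = 2.0   # effect of the omitted variable z on y
DELTA = 0.5   # assumed E[z | x] = DELTA * x
N = 100_000

rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(N)]
zs = [DELTA * x + rng.gauss(0.0, 1.0) for x in xs]   # omitted variable, correlated with x
eps = [GAMMA * z + rng.gauss(0.0, 1.0) for z in zs]  # eps_i = gamma * z_i + u_i


def mean(values):
    return sum(values) / len(values)


def binned_means(xs, ys, edges):
    """Average ys within each bin [lo, hi) of xs: a crude estimate of E[y | x]."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        out.append(mean([y for x, y in zip(xs, ys) if lo <= x < hi]))
    return out


edges = [-2.0, -1.0, 0.0, 1.0, 2.0]
overall = mean(eps)
by_bin = binned_means(xs, eps, edges)
# What the derivation predicts for each bin: gamma * delta * (average x in the bin)
predicted = [GAMMA * DELTA * m for m in binned_means(xs, xs, edges)]

print(f"overall mean of eps: {overall:+.3f}")
for lo, hi, got, want in zip(edges[:-1], edges[1:], by_bin, predicted):
    print(f"x in [{lo:+.0f}, {hi:+.0f}): mean eps {got:+.3f} vs gamma*delta*mean(x) {want:+.3f}")
```

The unconditional mean $E[\epsilon_i] = 0$ holds here, so checking only that moment would miss the problem; the binned means are what reveal that $E[\epsilon_i \mid x_i]$ is not zero.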

For example, imagine $y$ is wages, $x$ is an indicator for a college degree, and $z$ is some measure of ability. Suppose wages depend on both education and ability (the true data-generating process is the first equation), and college graduates are expected to have higher ability than non-graduates ($E[z_i \mid x_i = 1] > E[z_i \mid x_i = 0]$) because colleges tend to attract and admit higher-ability students. Then in a simple regression of wages on education, the strict exogeneity assumption is violated. This is a classic confounding variable: ability affects both education and wages, so the expected value of the error in equation (2) given education is not zero.
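The wage story can be simulated the same way. Every number below is a made-up assumption rather than an estimate from data: ability $z_i \sim \mathcal{N}(0,1)$; a degree indicator $x_i = 1$ when $z_i$ plus independent noise exceeds zero, so higher-ability people are more likely to graduate; and wages $y_i = \alpha + \beta x_i + \gamma z_i + u_i$ with hypothetical $\alpha = 2$, $\beta = 0.3$, $\gamma = 0.2$. The sketch averages the short-regression error $\epsilon_i = \gamma z_i + u_i$ separately for graduates and non-graduates, and also computes the naive slope of wages on the degree indicator.

```python
import random

# Hypothetical data-generating process; every number is an illustrative assumption
ALPHA, BETA, GAMMA = 2.0, 0.3, 0.2   # intercept, education effect, ability effect
N = 100_000

rng = random.Random(7)
ability = [rng.gauss(0.0, 1.0) for _ in range(N)]                      # z_i
# Higher ability raises the chance of a degree: x_i = 1 if z_i + noise > 0
college = [1 if z + rng.gauss(0.0, 1.0) > 0 else 0 for z in ability]   # x_i
u = [rng.gauss(0.0, 0.5) for _ in range(N)]                             # u_i, independent of z and x
wage = [ALPHA + BETA * x + GAMMA * z + e for x, z, e in zip(college, ability, u)]  # y_i
eps = [GAMMA * z + e for z, e in zip(ability, u)]  # error of the short regression of wage on college


def group_mean(values, groups, g):
    """Mean of values over the observations whose indicator equals g."""
    selected = [v for v, k in zip(values, groups) if k == g]
    return sum(selected) / len(selected)


eps_grad = group_mean(eps, college, 1)      # estimates E[eps | x = 1]
eps_nongrad = group_mean(eps, college, 0)   # estimates E[eps | x = 0]
# With an intercept and a 0/1 regressor, the OLS slope equals the difference in group means
naive_slope = group_mean(wage, college, 1) - group_mean(wage, college, 0)

print(f"mean eps, graduates:     {eps_grad:+.3f}")
print(f"mean eps, non-graduates: {eps_nongrad:+.3f}")
print(f"slope from regressing wage on college: {naive_slope:.3f} (true BETA = {BETA})")
```

Under these assumptions the error averages above zero for graduates and below zero for non-graduates, so $E[\epsilon_i \mid x_i]$ depends on $x_i$, and the naive slope lands well above the assumed $\beta$.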

What would happen if we did run the regression? The education coefficient would pick up both the education effect and the ability effect. In this simple linear example, the estimated coefficient $b$ would pick up the effect of $x$ on $y$ plus the association of $x$ and $z$ (the slope from regressing $z$ on $x$) times the effect of $z$ on $y$.
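This is the usual omitted-variable-bias algebra. Writing $b$ for the OLS slope from regressing $y$ on $x$ with an intercept, substituting the true model $y_i = \alpha + \beta x_i + \gamma z_i + u_i$, and using $\sum_i (x_i - \bar{x}) = 0$:

$$\begin{align*} b &= \frac{\sum_i (x_i - \bar{x})\, y_i}{\sum_i (x_i - \bar{x})^2} = \beta + \gamma \underbrace{\frac{\sum_i (x_i - \bar{x})\, z_i}{\sum_i (x_i - \bar{x})^2}}_{\hat{\delta}} + \frac{\sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})^2}, \\ \operatorname{plim} b &= \beta + \gamma\, \frac{\operatorname{Cov}(x_i, z_i)}{\operatorname{Var}(x_i)} \quad \text{assuming i.i.d. observations, } \operatorname{Var}(x_i) > 0, \ \operatorname{Cov}(x_i, u_i) = 0. \end{align*}$$

Here $\hat{\delta}$ is the slope from regressing $z$ on $x$, so the bias term $\gamma\hat{\delta}$ vanishes only when $\gamma = 0$ or $x$ and $z$ are uncorrelated in the sample. For a binary $x_i$ such as the college indicator, $\operatorname{Cov}(x_i, z_i)/\operatorname{Var}(x_i) = E[z_i \mid x_i = 1] - E[z_i \mid x_i = 0]$, the ability gap between graduates and non-graduates.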

Matthew Gunn
  • Hello Matthew, I think in the second-to-last paragraph "more able individuals are more likely to obtain a college degree" should be replaced by something along the lines of "college graduates are expected to be more able persons", as we are considering $E(z_i|x_i)$ and not $E(x_i|z_i)$. – Mitch Baker Apr 15 '18 at 14:58
  • @MitchBaker Thanks for the comment. Indeed $E[z|x]$ is most directly at issue. I've tried to clarify somewhat. – Matthew Gunn Apr 15 '18 at 16:10
  • On the one hand, $E(Y|X=x,Z=z)=\alpha+\beta x+\gamma z$. On the other hand, $E(Y|X=x)=\delta+\theta x$ with $\delta\neq\alpha$ and $\theta\neq\beta$ except for special cases. These are conditional mean models that have nothing to say about causality. In this context, the regression $y_i=\alpha+\beta x_i+\epsilon_i$ has misleading parameter names. The following analysis of the error term suffers from the same problem. Perhaps some $\text{do}(\cdot)$ calculus notation is in order to disentangle the conditional mean parameters from the causal ones. – Richard Hardy Mar 14 '23 at 18:26
  • @RichardHardy Yeah, to be more precise my hypothetical example should have said something like $u$ is independent and distributed $\mathcal{N}(0, \sigma^2)$. – Matthew Gunn Mar 14 '23 at 21:19
  • I am not sure that would help. The problem is that defining exogeneity only in terms of probability distributions and their derived features – without $\text{do}(\cdot)$ calculus or potential outcomes – gets hard to follow quite fast and probably does not get very far in the end. (I am reading Pearl "Exogeneity and Superexogeneity: A No-tear Perspective" (2000) where this case is made in just a few pages.) – Richard Hardy Mar 15 '23 at 09:00