Omitted variable problem

Question

I'm studying the cases in which the endogeneity problem arises in OLS regression.

Suppose we have the following population equation:

$y=\beta_0 +\beta_1 x_1 + ... + \beta_k x_k + \gamma q + \epsilon$

and say $E(\epsilon | x,q)=0$, such that: $E(y|x,q)=\beta_0 +\beta_1 x_1 + ... + \beta_k x_k + \gamma q$

Suppose $q$ is unobserved and so it goes into the error term, thus your population equation reads as

$y=\beta_0 +\beta_1 x_1 + ... + \beta_k x_k + \nu$ , where $\nu=\gamma q + \epsilon$

Then, the slides say, nothing is lost by assuming that $E(q)=0$, because an intercept is included in the basic equation, so that $E(\nu)=0$.

Why is it fine to assume that $E(q)=0$ just because an intercept is included in the basic equation?

EdM · Accepted Answer · 2023-03-24T15:45:17.423

Start with

$$y=\beta_0 +\beta_1 x_1 + ... + \beta_k x_k + \gamma q +\epsilon.$$

Say that the mean value of $q$ is $\bar q$. Then centering $q$ around its mean gives $q_c=q-\bar q$, so that $q=q_c+\bar q$ and $E(q_c)=0$. Substitute into the above and collect constant terms:

$$y=(\beta_0 + \gamma \bar q)+\beta_1 x_1 + ... + \beta_k x_k + \gamma q_c + \epsilon.$$

Any offset of the unobserved $q$ in this situation will be included in the intercept of a model that's based on the observed predictors. It won't affect the estimates of the coefficients for the observed predictors $x_i$, or the bias in the coefficient for any $x_i$ correlated with the unobserved $q$.
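A quick simulation can make this concrete. The sketch below is illustrative only: the parameter values, the sample size, the correlation between $x$ and $q$, and the helper name `fit_without_q` are all assumptions chosen for the demonstration, not anything from the slides. It fits OLS with an intercept but without $q$, once with $E(q)=0$ and once with $E(q)=5$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1, gamma = 1.0, 2.0, 3.0  # assumed "true" values, for illustration

x = rng.normal(size=n)
q_c = 0.5 * x + rng.normal(size=n)   # mean-zero part of q, correlated with x
eps = rng.normal(size=n)

def fit_without_q(q_mean):
    """Regress y on an intercept and x only, leaving q in the error term."""
    q = q_c + q_mean                          # shift the mean of the unobserved q
    y = beta0 + beta1 * x + gamma * q + eps
    X = np.column_stack([np.ones(n), x])      # design matrix with intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                               # [intercept, slope]

b_zero = fit_without_q(0.0)
b_shift = fit_without_q(5.0)

print("E(q)=0:", b_zero)
print("E(q)=5:", b_shift)
print("intercept difference:", b_shift[0] - b_zero[0])  # close to gamma * 5
print("slope difference:", b_shift[1] - b_zero[1])      # close to 0
```

Under these assumptions the slope is the same in both fits. It is biased away from $\beta_1$ because $x$ and $q$ are correlated, but the bias does not depend on $E(q)$. Only the intercept moves, by $\gamma \bar q$.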

Two warnings. First, omitting an intercept in such a model will lead to problems, because there is then no constant term to absorb $\gamma \bar q$. Second, omitted-variable bias can be more of a problem in other types of models, as explained here for a probit model. In OLS there is no bias in the coefficient for an observed predictor that is uncorrelated with the unobserved predictor. In models that lack an error term like the $\epsilon$ of OLS to capture the excess heterogeneity resulting from $q$, an unobserved/unmodeled predictor can lead to bias in the coefficients for all included predictors.
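The first warning can also be checked numerically. The following is a minimal sketch under assumed values (the means of $x$ and $q$, the coefficients, and the sample size are all made up for illustration). Here $q$ is independent of $x$, so OLS with an intercept recovers $\beta_1$, while dropping the intercept pushes the constant $\beta_0 + \gamma \bar q$ into the slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta0, beta1, gamma = 1.0, 2.0, 3.0   # assumed "true" values, for illustration

x = 2.0 + rng.normal(size=n)          # observed predictor with nonzero mean
q = 5.0 + rng.normal(size=n)          # unobserved, independent of x, E(q) = 5
eps = rng.normal(size=n)
y = beta0 + beta1 * x + gamma * q + eps

# OLS with an intercept, q omitted: the intercept absorbs beta0 + gamma * E(q).
X_with = np.column_stack([np.ones(n), x])
coef_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)

# OLS through the origin, q omitted: nothing absorbs the constant term.
X_without = x[:, None]
coef_without, *_ = np.linalg.lstsq(X_without, y, rcond=None)

slope_with = coef_with[1]
slope_without = coef_without[0]
print("slope with intercept:   ", slope_with)     # near beta1
print("slope without intercept:", slope_without)  # far from beta1
```

In this setup the no-intercept slope converges to $\beta_1 + (\beta_0 + \gamma \bar q)\,E(x)/E(x^2)$, which is about 8.4 for the values assumed here, rather than to $\beta_1 = 2$.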
