
Consider the linear regression model: $$y_{i}=x_{i}'\beta+\epsilon_{i}$$ where the notation is conventional. For OLS to be unbiased, we need the conditional exogeneity assumption, i.e., that $\text{E} [\epsilon \mid x]=0$.

I understand that if the conditional mean of the error term is a function of $x$, then we run into endogeneity problems. However, what if it is a constant? For instance, what if it is

$$ \text{E} [\epsilon \mid x] = c . $$

I don't see a problem with this because the error term is not systematically varying with $x$, but is fixed. Therefore, when we use variation in $x$ to compute parameter estimates, this should get 'differenced' away. Is this correct?
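For concreteness, in the simple one-regressor case the OLS slope is
$$\hat\beta_1 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2},$$
which depends on $y$ (and hence on $\epsilon$) only through deviations from the sample mean, so adding the same constant $c$ to every $\epsilon_i$ would seem to leave it unchanged.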

ChinG
Perhaps, but it's not very useful to think this way. We are free to define errors as deviations from means, and if $\text{E}(\epsilon) \neq 0$ then $\epsilon$ isn't really an error term anymore because its mean becomes absorbed into $y$. If we introduce an extra mean parameter then technically our model is no longer identifiable. – dsaxton Mar 02 '16 at 14:40

1 Answer


Yes! This is perfectly fine; in fact, this is one major reason why we always include a constant. It turns out that if $E(\varepsilon \mid x) = E(\varepsilon) \neq 0$, the nonzero mean simply gets absorbed into the constant. To see this, suppose that $E(\varepsilon) = c$ and add and subtract $c$ on the right-hand side of the model:
\begin{align*}
y_i &= \beta_0 + x_i \beta_1 + \varepsilon_i + c - c \\
&= (\beta_0 + c) + x_i \beta_1 + (\varepsilon_i - c) \\
&= \beta_0^* + x_i \beta_1 + \varepsilon_i^*
\end{align*}
Now $E(\varepsilon^* \mid x) = 0$, and the OLS estimators $\hat \beta_0^*$ and $\hat \beta_1$ are unbiased for $\beta_0^*$ and $\beta_1$.
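Below is a minimal simulation sketch (my own illustration using numpy; the particular parameter values are arbitrary) of this point: with a constant included in the regression, the OLS slope stays unbiased while the fitted intercept picks up $\beta_0 + c$.

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1, c = 1.0, 2.0, 0.5   # true intercept, slope, and error mean (hypothetical values)
n, reps = 500, 2000

intercept_hats, slope_hats = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    eps = rng.normal(loc=c, size=n)           # errors with mean c, independent of x
    y = beta0 + beta1 * x + eps
    X = np.column_stack([np.ones(n), x])      # design matrix with a constant
    b = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS estimates
    intercept_hats.append(b[0])
    slope_hats.append(b[1])

print("mean slope estimate:", np.mean(slope_hats))          # close to 2.0  (= beta1)
print("mean intercept estimate:", np.mean(intercept_hats))  # close to 1.5  (= beta0 + c)
```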

Michael