
I have found that although the Gauss–Markov theorem is very widely used, it comes in many different versions. I would appreciate it if anyone could help me clarify this specific question.

Given the OLS estimators, their expectations can be written as $$ \begin{align*} \mathbb{E}(\hat{\vec{\beta}})&= \vec{\beta} + \mathbb{E}[(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot\vec{\epsilon}] \\ \mathbb{E}(\hat{\vec{\epsilon}}) &= \mathbb{E}(\vec{\epsilon}) - \mathbb{E}[\mathbf{X} \cdot(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot \vec{\epsilon}]\\ \end{align*} $$
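These expressions follow from substituting the model $\vec{y} = \mathbf{X} \cdot \vec{\beta} + \vec{\epsilon}$ into $\hat{\vec{\beta}} = (\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot \vec{y}$ and $\hat{\vec{\epsilon}} = \vec{y} - \mathbf{X} \cdot \hat{\vec{\beta}}$, which gives

$$ \begin{align*} \hat{\vec{\beta}} &= \vec{\beta} + (\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot\vec{\epsilon} \\ \hat{\vec{\epsilon}} &= \vec{\epsilon} - \mathbf{X} \cdot(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot \vec{\epsilon} \end{align*} $$

before taking expectations on both sides.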

In the most classical model, where $\mathbf{X}$ is treated as fixed, we can prove that $\mathbb{E}(\vec{\epsilon}) = \vec{0}$ guarantees that the estimators are unbiased, since the terms involving $\mathbf{X}$ are merely fixed linear operators.
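Concretely, with $\mathbf{X}$ fixed the expectation passes straight through the non-random matrices:

$$ \mathbb{E}(\hat{\vec{\beta}}) = \vec{\beta} + (\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot \mathbb{E}(\vec{\epsilon}) = \vec{\beta}, \qquad \mathbb{E}(\hat{\vec{\epsilon}}) = \mathbb{E}(\vec{\epsilon}) - \mathbf{X} \cdot (\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot \mathbb{E}(\vec{\epsilon}) = \vec{0}. $$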

However, when treating both $\mathbf{X}$ and $\vec{\epsilon}$ as random, what condition still implies that the estimators are unbiased?

A common one is $\mathbb{E}_{\vec{\epsilon}|X}(\vec{\epsilon}|\mathbf{X}) = \vec{0}$, since by the law of iterated expectations:

$$ \begin{align*} \mathbb{E}(\hat{\vec{\beta}})&=\mathbb{E}(\vec{\beta}) + \mathbb{E}[(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot\vec{\epsilon}] \\ &= \vec{\beta} + \mathbb{E}_{X}[(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot\mathbb{E}_{\vec{\epsilon}|X}(\vec{\epsilon}|\mathbf{X})] \end{align*} $$

$$\begin{align*} \mathbb{E}(\hat{\vec{\epsilon}}) &= \mathbb{E}(\vec{\epsilon}) - \mathbb{E}[\mathbf{X} \cdot(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot \vec{\epsilon}]\\ &= \mathbb{E}(\vec{\epsilon}) - \mathbb{E}_{X}[\mathbf{X} \cdot(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot \mathbb{E}_{\vec{\epsilon}|X}(\vec{\epsilon}|\mathbf{X})] \end{align*} $$

Clearly, $\mathbb{E}_{\vec{\epsilon}|X}(\vec{\epsilon}|\mathbf{X}) = \vec{0}$ results in unbiased estimators (it also implies $\mathbb{E}(\vec{\epsilon}) = \vec{0}$ by iterated expectations). But this condition is sufficient, not necessary: the inner expectation could be non-zero and still cancel out in the outer expectation.
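As a sanity check on the sufficiency direction, here is a minimal simulation sketch (the lognormal design, heteroskedastic errors, and sample sizes are arbitrary choices of mine, just to make $\mathbf{X}$ random while keeping $\mathbb{E}_{\vec{\epsilon}|X}(\vec{\epsilon}|\mathbf{X}) = \vec{0}$):

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0])
n, reps = 50, 20_000

beta_hats = np.empty((reps, 2))
for r in range(reps):
    # random design: an intercept plus one lognormal regressor
    X = np.column_stack([np.ones(n), rng.lognormal(size=n)])
    # errors with E(eps | X) = 0, but variance depending on X (heteroskedastic on purpose)
    eps = rng.normal(scale=1.0 + X[:, 1], size=n)
    y = X @ beta + eps
    beta_hats[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(beta_hats.mean(axis=0))  # stays close to [1, 2] even though X is random
```

The Monte Carlo average of $\hat{\vec{\beta}}$ stays at $\vec{\beta}$, which is consistent with the iterated-expectation argument above.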

So I am trying to figure out what necessary and sufficient assumption we need for the estimators to be unbiased. I have also seen the condition $\mathbb{E}(\mathbf{X}^T \cdot \vec{\epsilon}) = \vec{0}$, but since both $\vec{\epsilon}$ and $\mathbf{X}$ are random and we do not know their joint distribution, we cannot pull $(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T$ or $\mathbf{X} \cdot(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T$ out of the expectation, nor split the expectation into two factors.

In this context, is $\mathbb{E}(\mathbf{X}^T \cdot \vec{\epsilon}) = \vec{0}$ a sufficient condition for unbiasedness? If so, how can we derive that from the expectations above? Further, is it a necessary condition, and can we prove it?

My intuition, from a linear algebra perspective, is that $(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot\vec{\epsilon}$ represents the projection of the random vector $\vec{\epsilon}$ onto the hyperplane spanned by the random columns of $\mathbf{X}$, expressed in the coordinates of those columns. For its expectation to be the zero vector, the random vector $\vec{\epsilon}$ would, on average, have to be orthogonal to the random hyperplane spanned by the columns of $\mathbf{X}$, or be the origin $\vec{0}$ itself.
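To make that picture concrete, here is a tiny numerical sketch (arbitrary random numbers) of the two objects: $(\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot \vec{\epsilon}$ gives the coordinates of the projection in the basis formed by the columns of $\mathbf{X}$, while $\mathbf{X} \cdot (\mathbf{X}^T \cdot \mathbf{X})^{-1} \cdot \mathbf{X}^T \cdot \vec{\epsilon}$ is the projection itself, and what is left over is orthogonal to the column space:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))    # 6 observations, 2 columns spanning a plane in R^6
eps = rng.normal(size=6)

coords = np.linalg.solve(X.T @ X, X.T @ eps)  # (X'X)^{-1} X' eps: coordinates in the X basis
proj = X @ coords                             # X (X'X)^{-1} X' eps: projection onto col(X)
leftover = eps - proj                         # this is the residual vector eps_hat

print(X.T @ leftover)  # ~ [0, 0]: the leftover part is orthogonal to every column of X
```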

But this intuitive answer does not come from the laws of expectation, so I am feeling a little unsettled and hope someone can clarify.

THX!

  • $E(X'\epsilon)=0$ is not sufficient for unbiasedness, but for consistency, see e.g. https://stats.stackexchange.com/questions/240383/why-is-ols-estimator-of-ar1-coefficient-biased – Christoph Hanck Sep 14 '23 at 06:34
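To illustrate the point in the comment above with a rough sketch of my own (a toy AR(1) simulation with arbitrary $\rho$ and sample size): in $y_t = \rho \, y_{t-1} + \epsilon_t$ the regressor $y_{t-1}$ is uncorrelated with $\epsilon_t$, so $\mathbb{E}(y_{t-1}\epsilon_t)=0$, yet $\vec{\epsilon}$ is not mean-independent of the full design (later rows of $\mathbf{X}$ depend on earlier errors), and the OLS estimate of $\rho$ is biased downward in finite samples:

```python
import numpy as np

rng = np.random.default_rng(2)
rho, T, reps = 0.9, 30, 20_000

rho_hats = np.empty(reps)
for r in range(reps):
    eps = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + eps[t]    # AR(1): y[t-1] is uncorrelated with eps[t]
    x, y_next = y[:-1], y[1:]
    rho_hats[r] = (x @ y_next) / (x @ x)  # OLS slope without intercept

print(rho_hats.mean())  # clearly below 0.9: biased in finite samples, though consistent as T grows
```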
