5

I see different answers everywhere. Intuitively, I would think if residuals are autocorrelated then there is some information that you are not incorporating into your model and is a sign of a biased fit. However, I see sources saying it is not biased, but not the "best" estimate.

What is the answer and why?

mkt
  • 18,245
  • 2
    See https://stats.stackexchange.com/search?q=ols+bias++-omitted+-lag+-time, especially the answer at https://stats.stackexchange.com/a/559280/919. "Gauss Markov Theorem" is also a good search key. – whuber Mar 18 '24 at 19:03
  • 1
    As to why general least square is more efficient than OLS, see stats.stackexchange.com/a/496131/77222 – Jarle Tufto Mar 18 '24 at 22:09
  • 1
    Is "residual" in econometrics the same as "error" in statistics? I have to say I hate to use "residual" to mean the $\varepsilon$ in $y = X\beta + \varepsilon$ because it means something else (specifically $\hat{\varepsilon} = y - \hat{y}$) in the stat community. – Zhanxiong Mar 19 '24 at 04:37

3 Answers

11

Residual autocorrelation does not cause bias, but it monkeys with the variance

Bias concerns only the expected value of an estimator, so residual autocorrelation does not cause bias, but it does monkey with the variance of the estimator. It can lead to under- or over-estimation of the estimator variance, which in turn affects confidence and prediction intervals. To see this, note that for the linear model $\mathbf{Y} = \mathbf{x} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$ the deviation of the OLS estimator from the true coefficient vector can be written as a linear transformation of the error vector:

$$\begin{align} \hat{\boldsymbol{\beta}} &= (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \mathbf{Y} \\[6pt] &= (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} (\mathbf{x} \boldsymbol{\beta} + \boldsymbol{\varepsilon}) \\[6pt] &= \boldsymbol{\beta} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \boldsymbol{\varepsilon}. \\[6pt] \end{align}$$
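As a quick sanity check, here is a minimal numpy sketch (the dimensions and coefficient values are made up for illustration) confirming that the OLS estimate equals the true coefficient vector plus this linear transformation of the error vector:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
x = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # toy design matrix with intercept
beta = np.array([1.0, 2.0, -0.5])                               # true coefficients (assumed)
eps = rng.normal(size=n)                                        # one draw of the error vector
y = x @ beta + eps

xtx_inv = np.linalg.inv(x.T @ x)
beta_hat = xtx_inv @ x.T @ y                 # OLS estimate
beta_via_error = beta + xtx_inv @ x.T @ eps  # beta plus the linear map of the errors

print(np.allclose(beta_hat, beta_via_error))  # True: the two expressions agree
```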

If we denote the error mean as $\boldsymbol{\mu}_\mathbf{x} \equiv \mathbb{E}(\boldsymbol{\varepsilon} | \mathbf{x})$ and the error variance as $\boldsymbol{\Sigma}_\mathbf{x} \equiv \mathbb{V}(\boldsymbol{\varepsilon} | \mathbf{x})$ then we get the corresponding estimator moments:

$$\begin{align} \mathbb{E}(\hat{\boldsymbol{\beta}} | \mathbf{x}) &= \boldsymbol{\beta} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \boldsymbol{\mu}_\mathbf{x}, \\[6pt] \mathbb{V}(\hat{\boldsymbol{\beta}} | \mathbf{x}) &= (\mathbf{x}^\text{T} \mathbf{x})^{-1} (\mathbf{x}^\text{T} \boldsymbol{\Sigma}_\mathbf{x} \mathbf{x}) (\mathbf{x}^\text{T} \mathbf{x})^{-1}. \\[6pt] \end{align}$$

The linearity assumption in the model is that $\mathbb{E}(\boldsymbol{\varepsilon} | \mathbf{x}) = 0$. So long as this assumption holds, you have $\mathbb{E}(\hat{\boldsymbol{\beta}} | \mathbf{x}) = \boldsymbol{\beta}$, so the OLS estimator is unbiased. However, if the error terms have constant variance but non-zero autocorrelation then we get a non-diagonal variance matrix of the form:

$$\begin{align} \boldsymbol{\Sigma}_\mathbf{x} &= \sigma^2 \begin{bmatrix} 1 & \rho_{1} & \rho_{2} & \cdots & \rho_{n-3} & \rho_{n-2} & \rho_{n-1} \\ \rho_{1} & 1 & \rho_{1} & \cdots & \rho_{n-4} & \rho_{n-3} & \rho_{n-2} \\ \rho_{2} & \rho_{1} & 1 & \cdots & \rho_{n-5} & \rho_{n-4} & \rho_{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ \rho_{n-3} & \rho_{n-4} & \rho_{n-5} & \cdots & 1 & \rho_{1} & \rho_{2} \\ \rho_{n-2} & \rho_{n-3} & \rho_{n-4} & \cdots & \rho_{1} & 1 & \rho_{1} \\ \rho_{n-1} & \rho_{n-2} & \rho_{n-3} & \cdots & \rho_{2} & \rho_{1} & 1 \\ \end{bmatrix}. \end{align}$$

This leads to a complicated leading term $(\mathbf{x}^\text{T} \mathbf{x})^{-1} (\mathbf{x}^\text{T} \boldsymbol{\Sigma}_\mathbf{x} \mathbf{x})$ in the variance expression. If there is no autocorrelation then $\boldsymbol{\Sigma}_\mathbf{x} = \sigma^2 \mathbf{I}$, this term reduces to the constant matrix $\sigma^2 \mathbf{I}$, and we recover the standard expression $\sigma^2 (\mathbf{x}^\text{T} \mathbf{x})^{-1}$ for the variance of the OLS estimator.
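To make this concrete, here is a small numpy/scipy sketch that builds a covariance matrix of this form and compares the resulting sandwich variance with the standard no-autocorrelation expression. The AR(1)-style pattern $\rho_k = \rho^k$ and all the numbers are assumptions chosen for illustration:

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
n, rho, sigma2 = 30, 0.7, 1.0
x = np.column_stack([np.ones(n), rng.normal(size=n)])  # toy design matrix
xtx_inv = np.linalg.inv(x.T @ x)

Sigma = sigma2 * toeplitz(rho ** np.arange(n))        # non-diagonal error variance matrix
v_sandwich = xtx_inv @ (x.T @ Sigma @ x) @ xtx_inv    # general variance expression above
v_naive = sigma2 * xtx_inv                            # standard OLS expression

print(np.diag(v_sandwich))  # coefficient variances under autocorrelation...
print(np.diag(v_naive))     # ...differ from the naive OLS variances
```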

The result of all this is that the presence of autocorrelation in the error terms affects the variance of the OLS estimator (but not its expected value, so it is still unbiased under the linearity assumption). This has flow-on effects if we want to use the OLS estimator to obtain confidence intervals for the true coefficients in the regression, or to obtain prediction intervals for the response variable for one or more new observations. Generally speaking, to do these things we would have to estimate the autocorrelation under some stipulated form and then use GLS estimation.
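A brief simulation illustrates both points: under autocorrelated errors the OLS slope estimate averages to the true value (unbiased), while the naive variance formula misestimates the true sampling variance. This is only a sketch; the AR(1) error process, sample size, and $\rho = 0.8$ are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, sigma, n_sims = 100, 0.8, 1.0, 5000
x = np.column_stack([np.ones(n), np.linspace(0, 1, n)])  # fixed toy design
beta = np.array([1.0, 2.0])                              # true coefficients (assumed)
xtx_inv = np.linalg.inv(x.T @ x)

slopes, naive_vars = [], []
for _ in range(n_sims):
    # generate AR(1) errors: eps_t = rho * eps_{t-1} + innovation
    innov = rng.normal(scale=sigma, size=n)
    eps = np.empty(n)
    eps[0] = innov[0] / np.sqrt(1 - rho**2)  # stationary starting value
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + innov[t]
    y = x @ beta + eps
    b = xtx_inv @ x.T @ y                    # OLS fit
    resid = y - x @ b
    s2 = resid @ resid / (n - 2)             # naive error variance estimate
    slopes.append(b[1])
    naive_vars.append(s2 * xtx_inv[1, 1])    # naive variance of the slope

print(np.mean(slopes))      # close to 2.0 -> the estimator is unbiased
print(np.var(slopes))       # true sampling variance of the slope
print(np.mean(naive_vars))  # naive formula: noticeably different
```

With a smooth regressor and positive autocorrelation, as here, the naive formula typically understates the true sampling variance, which is why naive confidence intervals come out too narrow.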

Ben
  • 124,856
3

No, OLS is not biased, even asymptotically (i.e., for large enough sample sizes). This is because minimizing the mean squared error gives you an estimate of a conditional mean (just following the standard argument).

However, with correlated measurements you might be able to come up with an estimator that exploits this correlation structure to achieve lower variance, i.e. one that is more sample-efficient and tends to give tighter confidence intervals. You might be able to do this by adding more distributional assumptions when deriving the maximum likelihood "loss function"; compare the feasible generalized least squares (FGLS) estimator.
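As a rough illustration of that idea, here is a two-step FGLS sketch in numpy/scipy, assuming an AR(1) error structure (the function name and the AR(1) assumption are mine for illustration, not a standard API):

```python
import numpy as np
from scipy.linalg import toeplitz

def fgls_ar1(x, y):
    """Two-step FGLS sketch assuming AR(1) errors (illustrative, not robust)."""
    # step 1: OLS fit, then estimate rho from the lag-1 residual autocorrelation
    b_ols = np.linalg.lstsq(x, y, rcond=None)[0]
    r = y - x @ b_ols
    rho_hat = (r[1:] @ r[:-1]) / (r @ r)
    # step 2: plug the estimated AR(1) correlation matrix into the GLS formula
    n = len(y)
    sigma = toeplitz(rho_hat ** np.arange(n))   # correlation matrix rho^|i-j|
    sigma_inv = np.linalg.inv(sigma)
    return np.linalg.solve(x.T @ sigma_inv @ x, x.T @ sigma_inv @ y)
```

In practice one would often iterate this (re-estimating $\rho$ from the new residuals, as in Cochrane–Orcutt) or use a packaged routine, but the sketch shows the core idea: plugging an estimated correlation structure into the GLS formula can reduce variance relative to OLS.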

Ggjj11
  • 1,237
1

You risk a spurious regression: the t-statistics will be inflated because the standard errors don't account for the autocorrelation, and the model is misspecified. But the estimator is not biased, because $X$ is not correlated with $\epsilon$.

  • 2
    As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Community Mar 18 '24 at 20:29