What is the distribution of the residuals $\hat{e}_i$ in linear regression?

Question

Suppose we perform linear regression on some data and the model is correct ($Y$ is a linear combination of the columns of $X$ plus an iid normal error term). By assumption, $e_i\sim\mathcal N(0,\sigma^2)$. But what about the residuals $\hat{e}_i=Y_i-X_i\hat b$? What is their distribution, and why?

For much more about this subject, search our site for posts about residual distributions – whuber Mar 17 '22 at 15:41

score 1 · Accepted Answer · answered Mar 16 '22 at 22:28


The 'hat' matrix $H=X(X^TX)^{-1}X^T$ is so named because $\hat Y=HY$. This means $Y-\hat Y = (I-H)Y$, where $I$ is an identity matrix.
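As a quick numerical check of $\hat Y=HY$ and $Y-\hat Y=(I-H)Y$, here is a minimal NumPy sketch. The design matrix, coefficients, sample size and seed are made up for illustration and are not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed (assumption)
n, p = 20, 3                             # hypothetical sample size and number of columns
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Hat matrix H = X (X^T X)^{-1} X^T, using solve() rather than an explicit inverse
H = X @ np.linalg.solve(X.T @ X, X.T)

# Least-squares fit computed independently of H
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
fitted = X @ b_hat
residuals = Y - fitted

# H "puts the hat on Y", and I - H maps Y straight to the residuals
hat_ok = np.allclose(H @ Y, fitted)
resid_ok = np.allclose((np.eye(n) - H) @ Y, residuals)
print(hat_ok, resid_ok)
```

Forming $H$ explicitly is fine for a small illustration like this; for large $n$ it is an $n\times n$ matrix, so one would normally avoid building it.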

If you multiply an $N(\mu,V)$ vector by a matrix $A$ you get an $N(A\mu,AVA^T)$ vector. Applying this with $A=I-H$, $\mu=Xb$ and $V=\sigma^2I$, and noting that $(I-H)X=0$ so the mean vanishes, we have $$\hat e\sim N\left(0, \sigma^2(I-H)(I-H)^T\right)$$ So the residuals are multivariate Normal, but they aren't independent and they don't all have the same variance.
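The covariance claim can be checked by simulation. The sketch below is only an illustration under assumed values (a small design with an intercept and one regressor, $\sigma=2$, arbitrary true coefficients and seed): it draws many data sets from the true model, computes the residual vector for each, and compares the empirical covariance of the residuals with $\sigma^2(I-H)$, which equals $\sigma^2(I-H)(I-H)^T$ because $I-H$ is symmetric and idempotent.

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed (assumption)
n, sigma, reps = 6, 2.0, 200_000          # hypothetical settings
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
b = np.array([1.0, 0.5])                  # made-up true coefficients

H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H                         # I - H, maps Y to the residuals

# Each row of Y is one simulated data set from the true model
E = rng.normal(scale=sigma, size=(reps, n))
Y = X @ b + E
resid = Y @ M.T                           # row r holds (I - H) applied to data set r

emp_cov = np.cov(resid, rowvar=False)     # empirical covariance of the residual vector
theory_cov = sigma**2 * M                 # sigma^2 (I - H)

max_abs_err = np.max(np.abs(emp_cov - theory_cov))
print(np.round(np.diag(theory_cov), 3))   # unequal variances, sigma^2 (1 - h_ii)
print(np.round(theory_cov[0, 1], 3))      # a nonzero off-diagonal covariance
print(max_abs_err)                        # should be small relative to sigma^2
```

The diagonal of `theory_cov` shows that high-leverage points have smaller residual variance, and the nonzero off-diagonal entries show the dependence between residuals.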


Thomas Lumley


FWIW $(I-H)^T(I-H)=I-H$ – user551504 Mar 17 '22 at 02:22
Will $\sigma^2$ actually be $s^2$ here? – Uk rain troll Mar 02 '24 at 17:02
No, the quantity on the right-hand side of $\sim$ is the true distribution, not an estimator – Thomas Lumley Mar 03 '24 at 18:22
