Suppose we perform linear regression on some data and the model is correct ($Y$ is a linear combination of the columns of $X$ plus a normal iid error term). We know by assumption that $e_i\sim\mathcal N(0,\sigma^2)$. But what about the residuals $\hat{e}_i=Y_i-X_i\hat b$? What is their distribution, and why?
For much more about this subject, search our site for posts about residual distributions. – whuber Mar 17 '22 at 15:41
1 Answer
The 'hat' matrix $H=X(X^TX)^{-1}X^T$ is so named because $\hat Y=HY$. This means $Y-\hat Y = (I-H)Y$, where $I$ is an identity matrix.
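If you multiply an $N(\mu,V)$ vector by a matrix $A$ you get an $N(A\mu, AVA^T)$ vector. Applying this to $Y\sim N(Xb,\sigma^2I)$ with $A=I-H$, the mean is $(I-H)Xb=0$ because $HX=X$, and so $$\hat e\sim N\left(0, \sigma^2(I-H)(I-H)^T\right)$$ Since $I-H$ is symmetric and idempotent, the covariance matrix simplifies to $\sigma^2(I-H)$. So the residuals are multivariate Normally distributed, but they aren't independent and they don't all have the same variance: the variance of $\hat e_i$ is $\sigma^2(1-h_{ii})$, where $h_{ii}$ is the $i$th diagonal element of $H$.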
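If you want to check this numerically, here is a minimal simulation sketch using NumPy. The design matrix, coefficients, $\sigma$, seed, and number of replications are all made-up values chosen for illustration; nothing here comes from the question's data. It builds $H$, simulates many response vectors from the model, and compares the empirical covariance of the residuals with $\sigma^2(I-H)$.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Hypothetical setup: intercept plus one predictor, n = 6 observations.
n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
b = np.array([1.0, 2.0])   # assumed "true" coefficients
sigma = 1.5                # assumed error standard deviation

# Hat matrix H = X (X^T X)^{-1} X^T and residual-maker M = I - H.
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H

# Covariance of the residuals according to the result above.
cov_theory = sigma**2 * M

# Simulate many data sets from the model; each row of Y is one response vector.
reps = 200_000
E = rng.normal(0.0, sigma, size=(reps, n))
Y = X @ b + E

# Residuals for every data set: row-wise (I - H) y (M is symmetric, so M.T == M).
resid = Y @ M.T

# Empirical mean and covariance of the residual vectors.
mean_sim = resid.mean(axis=0)
cov_sim = np.cov(resid, rowvar=False)

print("largest |mean|:           ", np.abs(mean_sim).max())
print("largest |cov_sim - theory|:", np.abs(cov_sim - cov_theory).max())
print("residual variances (theory):", np.diag(cov_theory))
```

The off-diagonal entries of `cov_theory` are nonzero and its diagonal entries differ, which is the lack of independence and of equal variance described above.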
Thomas Lumley
No, the quantity on the right-hand side of $\sim$ is the true distribution, not an estimator – Thomas Lumley Mar 03 '24 at 18:22