
Many resources conflate residuals with errors: they use the terms interchangeably, write "residual errors", or never acknowledge that errors exist at all. (One example here.) In this post on Cross Validated, one comment under the accepted answer says:

After all, normality tests are performed on residuals to gauge whether the assumption of normally distributed errors is reasonable; normality of errors will lead to normality of residuals.

My questions are:

  1. Is this so, and why do we assume this?
  2. Since, as I understand it, the point of the errors is that they are random and unknown (the "noise"), how can we assume anything about them?
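To make the distinction concrete, here is a minimal simulation sketch. It assumes a simple linear regression with made-up true coefficients, which is the only setting where the errors are actually known; the function names (`fit_ols`, `simulate`) and all parameter values are hypothetical. It shows that the residuals are a linear function of the errors, $r_i = e_i - \bar{e} - \frac{S_{xe}}{S_{xx}}(x_i - \bar{x})$, and that they satisfy constraints the errors do not.

```python
import random


def fit_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def simulate(n=50, beta0=1.0, beta1=2.0, sigma=1.0, seed=0):
    """Generate data with KNOWN normal errors, then fit and return residuals."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    errors = [rng.gauss(0, sigma) for _ in range(n)]  # unobservable in practice
    y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, errors)]
    b0, b1 = fit_ols(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # observable
    return x, errors, residuals


x, errors, residuals = simulate()

# Residuals written directly as a linear combination of the errors.
n = len(x)
x_bar = sum(x) / n
e_bar = sum(errors) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxe = sum((xi - x_bar) * ei for xi, ei in zip(x, errors))
implied = [ei - e_bar - (sxe / sxx) * (xi - x_bar) for xi, ei in zip(x, errors)]

print("sum of errors:   ", sum(errors))      # generally not zero
print("sum of residuals:", sum(residuals))   # zero up to rounding
print("max |residual - implied|:",
      max(abs(r - m) for r, m in zip(residuals, implied)))
```

Because each residual is a linear combination of the errors, jointly normal errors give normal residuals. The residuals are not independent, though, and they do not all have the same variance, so they are related to the errors without being the same thing.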
  • Some people argue that normality testing is essentially useless; at least keep in mind nothing is truly normally distributed, and methods based on the normality assumption will work well for many other distributions (though not all). https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless https://stats.stackexchange.com/questions/579728/why-is-my-data-not-normally-distributed-while-i-have-an-almost-perfect-qq-plot-a/579745#579745 https://stats.stackexchange.com/questions/538561/relevance-of-assumption-of-normality-ways-to-check-and-reading-recommendations – Christian Hennig Dec 14 '23 at 11:24
  • @user2974951 Not quite. If the errors are actually normal, we don't need to perform asymptotic inference - the distribution of $\hat{\beta}$ is exactly normal at any sample size, and the F tests are exact. On the other hand, your errors could follow nearly any distribution (Bernoulli, exponential, ...), but with a large enough $n$ the inference is approximately correct because of the CLT. – AdamO Dec 14 '23 at 13:20
  • OLS can do many things - inference on a parameter, prediction, forecasting, .... Each of these has different considerations for how the error term is distributed, and its impact on estimation and inference. What's your application? – AdamO Dec 14 '23 at 13:24
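AdamO's point about the CLT can be checked with a small simulation. The sketch below is only an illustration under assumed settings: a fixed design, centered Exp(1) errors (so the true error variance is 1), and arbitrary values for `n`, `reps`, and the coefficients. The function name `slope_z_scores` is hypothetical. It standardizes the slope estimate by its true standard error and reports how often it falls within $\pm 1.96$.

```python
import math
import random


def slope_z_scores(n=100, reps=2000, beta0=1.0, beta1=2.0, seed=1):
    """Standardized OLS slope estimates under skewed (exponential) errors."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]  # design held fixed across replications
    x_bar = sum(x) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    se_true = 1.0 / math.sqrt(sxx)  # sd of the slope when Var(error) = 1

    z = []
    for _ in range(reps):
        # Centered Exp(1) errors: mean 0, variance 1, clearly non-normal.
        errors = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, errors)]
        # sum(dx) is zero, so this equals Sxy / Sxx.
        slope = sum(d * yi for d, yi in zip(dx, y)) / sxx
        z.append((slope - beta1) / se_true)
    return z


z = slope_z_scores()
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)
# If the normal approximation is good, this share should be close to 0.95.
print(f"share of |z| <= 1.96: {coverage:.3f}")
```

Rerunning with a much smaller `n` is one way to see where the approximation starts to degrade for a given error distribution.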