
I often hear that when the residuals depart from normality, the central limit theorem can be used to fix things. I do not quite understand how this works, since the central limit theorem is a statement about scaled sums of random variables. How exactly is the CLT used to make the data normal?

Richard Hardy
user321627
    You appear to have a common misconception about the central limit theorem: https://stats.stackexchange.com/questions/473455/debunking-wrong-clt-statement. – Dave Jun 01 '21 at 15:31

1 Answer


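The CLT does not make the data normal. For OLS, the CLT is a result about the estimated regression parameters: each estimate can be expressed as a weighted sum of random variables, so its sampling distribution is approximately normal in large samples even when the errors are not.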
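To make the distinction concrete: with a fixed design matrix $X$, the OLS estimate is $\hat{\beta} = (X^\top X)^{-1} X^\top Y = \beta + (X^\top X)^{-1} X^\top \varepsilon$, a weighted sum of the errors, and that sum is what the CLT acts on. The simulation below is only a rough sketch under assumptions of my own choosing (centered exponential errors, a uniform regressor, true coefficients 1 and 2, and a hypothetical helper named `simulate`). It compares the skewness of the slope estimates across many replications with the skewness of the pooled residuals.

```python
import numpy as np


def skewness(a):
    """Sample skewness: mean of the cubed standardized values."""
    a = np.asarray(a, dtype=float)
    z = (a - a.mean()) / a.std()
    return float(np.mean(z ** 3))


def simulate(n, reps=2000, seed=0):
    """Fit OLS on `reps` simulated data sets of size `n`.

    Returns (skewness of the slope estimates across replications,
             skewness of the residuals pooled over all fits).
    All settings here are illustrative assumptions, not from the thread.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)          # fixed design, reused in every replication
    X = np.column_stack([np.ones(n), x])       # intercept + one regressor
    beta = np.array([1.0, 2.0])                # assumed true coefficients

    # Centered exponential errors: mean 0, strongly right-skewed (skewness 2).
    eps = rng.exponential(1.0, size=(reps, n)) - 1.0
    Y = X @ beta + eps                         # shape (reps, n)

    # Solve all regressions at once: one column of Y.T per replication.
    coef, *_ = np.linalg.lstsq(X, Y.T, rcond=None)   # shape (2, reps)
    resid = Y.T - X @ coef                            # shape (n, reps)

    return skewness(coef[1]), skewness(resid.ravel())


if __name__ == "__main__":
    for n in (20, 200, 2000):
        slope_skew, resid_skew = simulate(n)
        print(f"n={n:5d}  slope skewness={slope_skew:+.3f}  "
              f"residual skewness={resid_skew:+.3f}")
```

If the reasoning above is right, the slope skewness should sit near zero while the residual skewness stays near that of the error distribution however large `n` gets. A larger sample helps inference on the coefficients; it does not make the residuals normal.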

AdamO
  • I read that the residuals can be thought of as a sum of independent errors. Does that make sense? – user321627 Jun 01 '21 at 07:16
  • Not really. The residuals are $Y-\hat{Y}$, so you can think of them as a sum, but their distribution doesn't tend to normal as $n \rightarrow \infty$. – AdamO Jun 01 '21 at 12:59
  • I read somewhere that Gauss (the Gauss) had an interesting (but incorrect) argument as to why regression residuals should be normal. Even the giants make mistakes. – BigBendRegion Jun 01 '21 at 13:07
  • @BigBendRegion better provide a source! – AdamO Jun 01 '21 at 13:33
  • See here, bottom of p. 64. http://pzs.dstu.dp.ua/DataMining/mls/bibl/Gauss2Kalman.pdf – BigBendRegion Jun 01 '21 at 14:11
  • @BigBendRegion Thank you! Gauss wasn't wrong here. This touches on a different issue. Gauss's argument for the normality of errors was an assumption in his astronomical predictions paper: the error was the result of possibly hundreds of unobserved variables contributing to a (seemingly) random error. Gauss didn't believe in intrinsic randomness (nor did Einstein, hence the schism between mechanical and quantum physics). OP's question seems to be an eggcorn of this, where non-normal (observed) residuals are somehow remedied by big data. – AdamO Jun 01 '21 at 15:05
  • Here is another excellent answer from the past: https://stats.stackexchange.com/questions/29731/regression-when-the-ols-residuals-are-not-normally-distributed – Ariel Jun 01 '21 at 15:50
  • @AdamO I agree that Gauss was not wrong in the way the OP suggests, namely that a larger sample size of the existing data set somehow justifies normality. On the other hand, and maybe this is too narrow a point, if Gauss really argued that the density was precisely, mathematically normal, then he was indeed wrong, even given the astronomical context. Yes, there were caveats about extremes, but I did not see any such caveats about the rest of the distribution. Maybe they are in the original paper. All I would need is the word "approximation" as regards the central portion. – BigBendRegion Jun 01 '21 at 17:22