13

The examples on this page show that simple regression is markedly affected by outliers, and that this can be overcome by robust regression techniques: http://www.alastairsanderson.com/R/tutorials/robust-regression-in-R/ . I believe lmrob and ltsReg are other robust regression functions.
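For instance, a minimal sketch along these lines (simulated data of my own, not taken from that page) shows the effect of a single outlier:

    # Compare ordinary least squares (lm) with a Huber M-estimator (MASS::rlm)
    # on simulated data containing one gross outlier.
    library(MASS)

    set.seed(1)
    x <- 1:20
    y <- 2 + 0.5 * x + rnorm(20, sd = 0.5)
    y[20] <- 30                      # inject a single gross outlier

    coef(lm(y ~ x))                  # OLS slope is pulled toward the outlier
    coef(rlm(y ~ x))                 # robust slope stays close to the true 0.5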

Why should one not do robust regression (like rlm or rq) every time, instead of performing simple regression (lm)? Are there any drawbacks to these robust regression techniques? Thanks for your insight.

rnso
  • 10,009

1 Answer

7

The Gauss-Markov theorem:

In a linear model with spherical errors (which, along the way, includes an assumption of no outliers via a finite error variance), OLS is efficient in the class of linear unbiased estimators: there are (restrictive, to be sure) conditions under which "you can't do better than OLS".
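As a minimal illustration (a hypothetical simulation of my own, not part of the original answer), with well-behaved spherical normal errors the OLS slope is no more variable than a robust alternative such as the Huber M-estimator:

    # Sampling variability of the slope under ideal (normal, homoskedastic) errors:
    # OLS should be at least as efficient as the robust Huber fit here.
    library(MASS)

    set.seed(42)
    slopes <- replicate(2000, {
      x <- rnorm(50)
      y <- 1 + 2 * x + rnorm(50)       # spherical normal errors, no outliers
      c(ols   = coef(lm(y ~ x))[2],
        huber = coef(rlm(y ~ x))[2])
    })

    apply(slopes, 1, var)              # OLS variance is typically the smaller one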

  • So if there are no outliers, linear regression would be best. But if there are, or if other assumptions are violated, should one perform robust regression only then? Is that correct? – rnso Apr 13 '15 at 12:29
  • 2
    If there are outliers, other techniques are better, yes. I would not jump to the conclusion that "if other assumptions are being violated, then [...] one should perform robust regressions" - it is not a cure-all for all violations. E.g., when errors are correlated with regressors and you are after causal effects, instrumental variables techniques are called for. – Christoph Hanck Apr 13 '15 at 13:52
  • The Gauss-Markov theorem does not make a normality assumption. We still get the BLUE if the error is heavy-tailed. – Dave Oct 14 '20 at 14:21
  • Correct about normality; nor do I say it does. It does make an assumption of finite error variances, though, so does that square with "heavy tails" to you? – Christoph Hanck Oct 14 '20 at 15:10
  • However, Gauss-Markov considers only linear estimators. Robust estimators are nonlinear. – dave fournier Dec 04 '23 at 21:36
  • Right, but see e.g. https://stats.stackexchange.com/questions/153348/other-unbiased-estimators-than-the-blue-ols-solution-for-linear-models/153356#153356, which also suggests, at least in that setting, superiority over nonlinear estimators in certain scenarios. – Christoph Hanck Dec 05 '23 at 06:26
  • For some estimation schemes the a posteriori variances of the observations are not all equal, so that argument would not hold. – dave fournier Dec 05 '23 at 20:40
  • Perhaps the discussion in the link could be used to argue that one should use estimators for which the a posteriori variances of the observations are not constant, rather than for showing that the standard estimates are BLUE. – dave fournier Dec 06 '23 at 17:55
  • I must admit that I am not sure I understand what the concept of a posteriori variances of observations precisely entails and how that relates to different estimators. – Christoph Hanck Dec 07 '23 at 05:48
  • Well, consider the trimmed mean as a thought experiment. After the samples have been taken (a posteriori), one decides to remove the biggest outliers. That is like assuming that the variance of these observations is infinite, or just so large that they can be ignored. Now one wants to make this idea more precise. This can be done by assuming that the residuals are a mixture of a normal distribution and some fat-tailed distribution. Then it turns out that one has to play around a bit with the estimators for the variances of the normal and the fat-tailed distribution, but it can be done. – dave fournier Dec 07 '23 at 16:45
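A rough sketch of the trimmed-mean thought experiment in the last comment (an illustrative simulation of my own, assuming a normal/fat-tailed mixture for the data):

    # Data from a normal/fat-tailed mixture; trimming effectively treats the
    # trimmed observations as if their variance were unbounded.
    set.seed(7)
    n <- 100
    contaminated <- rbinom(n, 1, 0.05) == 1             # ~5% contamination
    x <- ifelse(contaminated, rcauchy(n, scale = 10), rnorm(n))

    mean(x)                  # ordinary mean, dragged around by the fat tail
    mean(x, trim = 0.1)      # 10% trimmed mean largely ignores the outliers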