
It seems that minimizing the sum of squared residuals (SSR) in linear regression is equivalent to minimizing the MSE (both are based on the residual, true value minus prediction), and that OLS is the best estimator for minimizing the SSR.

I also read that least squares can sometimes produce estimators with large variance under multicollinearity, in which case a biased estimator might produce a better MSE.

I am a bit confused about why OLS is the best for SSR but sometimes is not the best on MSE, since these two metrics are seemingly proportional to each other.

Thanks.

user34829

2 Answers


You solve for the OLS coefficients by finding the coefficients that minimize the in-sample square loss (expressed as SSR or (R)MSE), so in-sample, nothing beats OLS.

When people say that biased estimators like the ridge regression estimator can produce better MSE, they mean on out-of-sample data. Fitting well in-sample might translate to fitting well out-of-sample, but it does not have to.

For instance, you could just play connect-the-dots with a scatterplot. Such a model perfectly predicts the connected dots. However, it has fit to the noise and will be highly dependent on coincidences in the data. By regression to the mean, there is a sense in which, on new data, high points are likely to be lower and low points are likely to be higher. Your connect-the-dots model with zero in-sample MSE will perform poorly if that happens.
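The connect-the-dots idea can be made concrete: an interpolating polynomial through all $n$ points achieves essentially zero in-sample MSE while chasing the noise. A minimal sketch with NumPy (the data-generating process and degree are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = np.linspace(0.0, 1.0, n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)  # noisy linear truth

# A degree n-1 polynomial passes through every point: "connect the dots".
coefs = np.polyfit(x, y, deg=n - 1)
in_sample_mse = np.mean((np.polyval(coefs, x) - y) ** 2)
print(in_sample_mse)  # essentially zero, up to floating-point error

# On fresh noise at the same x values, the same curve no longer fits.
y_new = 2.0 * x + rng.normal(scale=0.5, size=n)
out_of_sample_mse = np.mean((np.polyval(coefs, x) - y_new) ** 2)
print(out_of_sample_mse)  # typically far from zero
```

The first number is machine-precision noise; the second reflects how much the interpolant memorized the original noise draw.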

A biased estimator combats this by sacrificing some in-sample fit in exchange for the possibility (but not assurance) of getting a better out-of-sample fit.
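To see the in-sample guarantee numerically, here is a sketch (the collinear design and penalty value are made up for illustration) comparing OLS with a closed-form ridge fit, $\hat\beta = (X^\top X + \lambda I)^{-1} X^\top y$, where $\lambda = 0$ recovers OLS. Whatever ridge does out-of-sample, it cannot beat OLS on the training data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
# Highly collinear design: third column is nearly a copy of the first.
X = rng.normal(size=(n, p))
X[:, 2] = X[:, 0] + rng.normal(scale=0.01, size=n)
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

def fit(X, y, lam):
    """Closed-form (X'X + lam*I)^{-1} X'y; lam = 0 gives OLS."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def in_sample_mse(b):
    return np.mean((y - X @ b) ** 2)

ols, ridge = fit(X, y, 0.0), fit(X, y, 5.0)
# In-sample, OLS can never lose: it minimizes exactly this quantity.
assert in_sample_mse(ols) <= in_sample_mse(ridge) + 1e-12
```

The ridge coefficients are shrunk away from the least-squares solution, so they necessarily pay a (hopefully small) in-sample price; the hope is that the reduced variance pays off on new data.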

(Somewhat unrelated, you are correct to notice that SSR, MSE, and RMSE are equivalent in a sense. In a model comparison, a model outperforming another model on one will outperform the competing model on the other two.)
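A small numeric check of that equivalence: MSE is SSR/$n$ and RMSE is $\sqrt{\text{MSE}}$, both monotone transforms of SSR, so a model that wins on one wins on all three (the predictions here are made up for illustration):

```python
import numpy as np

y = np.array([3.0, -1.0, 2.0, 0.5])
pred_a = np.array([2.5, -0.5, 2.2, 0.0])  # hypothetical model A
pred_b = np.array([3.5, -2.0, 1.0, 1.5])  # hypothetical model B

def metrics(y, pred):
    ssr = np.sum((y - pred) ** 2)
    mse = ssr / len(y)        # MSE = SSR / n
    rmse = np.sqrt(mse)       # RMSE = sqrt(MSE)
    return ssr, mse, rmse

a, b = metrics(y, pred_a), metrics(y, pred_b)
# Model A beats model B on SSR, hence on MSE and RMSE as well.
assert all(ai < bi for ai, bi in zip(a, b))
```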

Dave

This is because OLS only gives the best linear unbiased estimator (BLUE) under the assumptions of the Gauss-Markov theorem. It is only the best among linear, unbiased estimators. It doesn't mean that there can't be biased estimators that perform better than OLS.

  • Keep in mind that, in-sample, nothing beats OLS, and that is a consequence of (multivariable) calculus, not Gauss-Markov. In fact, OLS is unbeatable in-sample, even if the Gauss-Markov assumptions are grossly violated. – Dave Dec 10 '21 at 13:35