
First time asking a question on Stack Exchange after being a long-time lurker.

I am trying to analyze some simple data using R. I found the best lambda for ridge regression using cross-validation, then I fit a ridge regression model with that lambda on the training dataset and calculate the mean squared error (MSE) on the test dataset.

In my particular case, I found that ridge regression sometimes yields a higher MSE than ordinary multiple linear regression (OLS) for some of the random seeds used to generate the train-test split; however, the difference is not large.

I am wondering: is it possible for OLS to yield a lower MSE than ridge regression? I thought cross-validation would produce a lambda that minimizes the MSE.
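For reference, here is a minimal, self-contained sketch of what I am doing (the data is simulated, since my real dataset is not shown; I use glmnet for the ridge fit):

    library(glmnet)

    set.seed(1)                              # the train-test split depends on this seed
    n <- 100; p <- 10
    X <- matrix(rnorm(n * p), n, p)
    beta <- c(rep(1, 3), rep(0, p - 3))
    y <- as.vector(X %*% beta + rnorm(n))

    train <- sample(n, 0.7 * n)
    x_tr <- X[train, ];  y_tr <- y[train]
    x_te <- X[-train, ]; y_te <- y[-train]

    # OLS: fit on the training set, evaluate on the test set
    ols <- lm(y_tr ~ x_tr)
    mse_ols <- mean((y_te - cbind(1, x_te) %*% coef(ols))^2)

    # Ridge: choose lambda by cross-validation on the training set only (alpha = 0)
    cv <- cv.glmnet(x_tr, y_tr, alpha = 0)
    mse_ridge <- mean((y_te - predict(cv, newx = x_te, s = "lambda.min"))^2)

    c(ols = mse_ols, ridge = mse_ridge)      # either one can come out smaller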

Ferdi
  • What is your sample size and the number of dimensions? – amoeba Feb 04 '19 at 08:22
  • 7
    Because the test dataset is supposed to be independent of the training dataset (albeit similar in characteristics), there are no guarantees. – whuber Jun 07 '22 at 20:55
  • This can help: https://stats.stackexchange.com/questions/487593/mean-squared-error-of-ols-smaller-than-ridge/487607#487607 – markowitz Jun 07 '22 at 21:23

1 Answer


There always exists a lambda such that the expected MSE of ridge is smaller than that of OLS. Note that this is a statement about expectations: on any single train-test split, the observed test MSE can still favor OLS.

The problem is how to find that lambda. Cross-validation is too blunt an instrument for finding an optimal lambda; in fact, no method exists that guarantees an optimal lambda for ridge. If the design matrix is not collinear (see this VIF check: https://www.mathworks.com/matlabcentral/fileexchange/60551-vif-x?s_tid=prof_contriblnk), then OLS is just as good. However, if the matrix is collinear, then you should use ridge, even if your lambda is suboptimal.
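The linked file is a MATLAB implementation; a rough hand-rolled equivalent in R (illustrative only, and the cutoff of 10 is just a rule of thumb) could be:

    # Variance inflation factors: regress each column on all the others
    vif_manual <- function(X) {
      X <- as.data.frame(X)
      sapply(seq_along(X), function(j) {
        r2 <- summary(lm(X[[j]] ~ ., data = X[-j]))$r.squared
        1 / (1 - r2)                 # VIF_j; values above ~10 signal collinearity
      })
    }

    vif_manual(mtcars[, c("disp", "hp", "wt")])   # example with a built-in dataset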
Note that the elastic net uses both L1 and L2 penalties. The L1 (LASSO) penalty drops collinear variables unpredictably, even on the same data. This is in direct contradiction to the advice in Belsley, Kuh, and Welsch, Regression Diagnostics.
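For illustration, here is a toy sketch of that behaviour using glmnet, where alpha mixes the two penalties (alpha = 0 is pure ridge, alpha = 1 is pure LASSO; the data and seed below are made up):

    library(glmnet)
    set.seed(2)
    X <- matrix(rnorm(100 * 5), 100, 5)
    X[, 2] <- X[, 1] + rnorm(100, sd = 0.01)   # two nearly collinear columns
    y <- X[, 1] + X[, 3] + rnorm(100)
    enet <- cv.glmnet(X, y, alpha = 0.5)       # elastic net, 50/50 L1/L2 mix
    coef(enet, s = "lambda.min")   # the L1 part may zero out one of the collinear pair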

  • Allen's PRESS statistic (i.e. leave-one-out cross-validation) seems a pretty good way of choosing the ridge parameter, especially if you perform it in "canonical form", which makes it even cheaper to compute. – Dikran Marsupial Jun 07 '22 at 21:22
  • 4
    Re "should use ridge:" When the design matrix is collinear, you should first investigate the nature of the collinearity and make variable choices based on your subject-matter knowledge and objectives. This position is described in Belsley, Kuh, and Welsch, Regression Diagnostics. When this kind of investigation is impossible (perhaps because you have no understanding of the variables or there are far too many of them), then ridge regression becomes a useful option. But then why not go all the way and use Elastic Net? – whuber Jun 10 '22 at 13:35