
I did a regression on a training data set with 7000 observations and 50 explanatory variables, using OLS, ridge, and lasso. The lambda was chosen via cross-validation. After that I wanted to compare the prediction accuracy of the three models by predicting the values of a test data set.

I thought I would get better predictions with lasso and ridge, but that's not the case.

How could this be? What could be the possible reasons?

At the beginning I computed the VIF for the OLS model and found 3 variables with a VIF over 15. I thought that when I have multicollinearity, ridge and lasso will always perform better?
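
For reference, here is a minimal sketch of the kind of comparison described above. The original code is not shown, so this is written in Python/scikit-learn with simulated placeholder data; the variable names and the lambda grid (taken from the comments below) are illustrative assumptions, not the actual analysis.

```python
# Illustrative sketch only: placeholder data, not the actual analysis.
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data: 7000 training observations, 50 explanatory variables,
# plus 2000 held-out test observations.
X = rng.normal(size=(9000, 50))
beta = rng.normal(size=50)
y = X @ beta + rng.normal(scale=2.0, size=9000)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=7000, random_state=0
)

# Lambda (alpha) grid from 10^-3 to 10^5 with 100 values, as mentioned in the comments.
alphas = np.logspace(-3, 5, 100)

ols = LinearRegression().fit(X_train, y_train)
ridge = RidgeCV(alphas=alphas).fit(X_train, y_train)          # efficient leave-one-out CV by default
lasso = LassoCV(alphas=alphas, cv=10).fit(X_train, y_train)   # 10-fold CV

# Compare out-of-sample MSE and inspect the selected penalties.
for name, model in [("OLS", ols), ("Ridge", ridge), ("Lasso", lasso)]:
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.3f}")
print("Selected lambda (ridge):", ridge.alpha_)
print("Selected lambda (lasso):", lasso.alpha_)
```

One thing worth checking in the real analysis is whether the selected lambda sits at a boundary of the grid: if it is pinned at the smallest value, the penalized fits are essentially OLS anyway, and if it is pinned at the largest, the models are heavily shrunk and can easily lose to OLS.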

Dima Ku
  • What range of possible values was your lambda selected from? And what was the value that was selected? – Ruben van Bergen Jul 16 '18 at 11:23
  • It seems something is going wrong with your cross-validation technique. One thing you could do is to set the regularisation parameter $\lambda = 0$ for both, because that effectively changes them into an OLS problem. If the CV accuracy is still different, there is an implementation issue (a sketch of this check appears after the comments). – boomkin Jul 16 '18 at 13:01
  • The Lasso sometimes has problems with highly correlated regressors. Consider the Elastic Net. Did you assess your accuracy using MSE? – Stephan Kolassa Jul 16 '18 at 13:06
  • Lambda goes from 10^5 to 10^-3 with length 100. I use the MSE as the accuracy measure. I thought lasso solves the problem of correlated regressors... – Dima Ku Jul 16 '18 at 14:57
  • Do note that while Lasso and Ridge regression tend to have lower variance than OLS regression, the cost is that they give biased estimates. Remembering that $\text{MSE} = \text{Bias}^2 + \text{Variance}$, we can note that which method is better depends on the size of the bias and the variance. It's not a given that Lasso and Ridge have lower MSE than OLS. – Phil Jun 20 '19 at 11:58
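
Following boomkin's suggestion above, here is a minimal sketch of that sanity check (again Python/scikit-learn with simulated placeholder data, since the original code is not shown). With the penalty set essentially to zero, ridge and lasso reduce to ordinary least squares, so their predictions should match the OLS predictions; if the cross-validated versions still behave very differently, the problem is likely in the CV or modelling pipeline rather than in the methods themselves.

```python
# Sanity check sketch: with a (near-)zero penalty, ridge and lasso should
# reproduce the OLS fit. Placeholder simulated data, not the real data set.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(size=200)

ols = LinearRegression().fit(X, y)
# scikit-learn discourages an exactly-zero penalty for these estimators,
# so use a tiny value instead.
ridge0 = Ridge(alpha=1e-8).fit(X, y)
lasso0 = Lasso(alpha=1e-8, max_iter=100_000).fit(X, y)

# Both gaps should be essentially zero (up to numerical tolerance).
print("max |OLS - ridge| prediction gap:", np.max(np.abs(ols.predict(X) - ridge0.predict(X))))
print("max |OLS - lasso| prediction gap:", np.max(np.abs(ols.predict(X) - lasso0.predict(X))))
```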
