Since Lasso selects the optimal predictors to include in the model, does this suggest that we don't need to do any of the typical significance testing that comes with OLS regression and logistic regression? I am pretty used to the R output with stars by each regressor, but from talking to people, it seems like in practice, they just optimize lambda in Lasso and then just use those coefficients - and make the implicit assumption that all are significant.
1 Answer
Summarizing information provided in comments:
"Lasso selects the optimal predictors to include in the model..."
No. LASSO selects a set of predictors that happens to work on a particular data set. There is no assurance that they are "optimal" in any broad sense. This is particularly the case when predictors associated with outcome are correlated. See this page and the pages there noted as "Linked" and "Related" for details. Try repeating LASSO on multiple bootstrapped samples of a data set, and see how frequently the same predictors are retained in the models.
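The bootstrap check suggested above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it uses only the Python standard library, a hand-written coordinate-descent LASSO with no intercept on centered data, simulated data, and an arbitrary fixed penalty. The names `lasso_cd` and `selection_frequency`, the simulated design (two nearly collinear predictors, one independent real predictor, one pure-noise predictor) and the value `lam=0.3` are all assumptions made for the example. In practice you would use an established package and choose the penalty by cross-validation inside each bootstrap sample.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def center(X, y):
    """Center each column of X and y so that no intercept is needed."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    return Xc, [v - ybar for v in y]


def lasso_cd(X, y, lam, n_iter=50):
    """Cyclic coordinate descent for (1/(2n))*||y - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual
            # (the residual with predictor j's current contribution added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_b
    return beta


def selection_frequency(X, y, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each predictor is retained."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        Xb, yb = center([X[i] for i in idx], [y[i] for i in idx])
        beta = lasso_cd(Xb, yb, lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Simulated data (assumption): x0 and x1 are noisy copies of the same signal z,
# x2 is an independent real predictor, x3 is pure noise.
data_rng = random.Random(1)
X, y = [], []
for _ in range(80):
    z = data_rng.gauss(0, 1)
    row = [z + data_rng.gauss(0, 0.1), z + data_rng.gauss(0, 0.1),
           data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
    X.append(row)
    y.append(z + row[2] + data_rng.gauss(0, 1))

freq = selection_frequency(X, y, lam=0.3)
print("selection frequency per predictor:", freq)
```

If the two correlated predictors trade places across resamples, so that neither is retained every time even though their shared signal matters, that is the instability described above. A predictor retained in nearly every resample is a more stable choice, though still not a "significant" one in the p-value sense.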
"... we don't need to do any of the typical significance testing that comes with OLS regression and logistic regression"
First, if you are mainly interested in prediction, then there is limited need to do significance testing. Given the risks of omitted-variable bias, there is little to be gained by omitting any predictors that might reasonably be associated with outcome unless you are at risk of overfitting the model. Just because you can't "prove" at p < 0.05 that some predictor is associated with outcome, that doesn't mean that it can't help improve predictions.
Second, with proper care and understanding of what the p-values mean, inference is possible with LASSO. See this page for an introduction to the issues and further links.
This is a situation where Bayesian statistics is very useful.
– JTH Nov 06 '20 at 20:29