I am new to coding and statistics. I want to do classification on a data set, and I would like to know whether Lasso and logistic regression with an L1 penalty are different.
They are the same. – user2974951 Aug 10 '20 at 11:09
But note that logistic regression is for probability estimation not classification. – Frank Harrell Aug 10 '20 at 11:17
What @FrankHarrell means is that the output of the logistic regression is, say, $0.7$. What you do with that is up to you. I have a question on here with a nice answer about how to handle the probability output when there is a decision to make. Kolassa has a number of other posts on here about probabilistic outputs, too. https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email – Dave Aug 10 '20 at 11:34
1 Answer
Without further context, “LASSO” to me means a linear model with its coefficients estimated by least squares subject to an $\ell_1$ constraint on the coefficient vector. In that sense, LASSO and $\ell_1$-penalized logistic regression do not have the same meaning.
In the context of a classification problem, however, I would take LASSO to mean a logistic regression with that same $\ell_1$ penalty on the coefficients: the usual maximum likelihood estimation, but with that added constraint/penalty. Sure, it could refer to a linear probability model estimated by least squares with the $\ell_1$ LASSO penalty, or to a generalized linear model with a link function other than the logit (e.g., probit regression), but logistic regression seems to be the default generalized linear model for categorical outcomes in machine learning circles.
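To make the two meanings concrete, here is one common way to write the objectives. The notation is assumed for illustration and is not fixed by the question: $n$ observations, predictor vectors $x_i$, an unpenalized intercept $\beta_0$, and a penalty weight $\lambda \ge 0$. Scaling conventions such as the $1/(2n)$ factor differ between software packages.

$$
\text{LASSO (least squares):}\quad
\min_{\beta_0,\,\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i^\top\beta\right)^2 + \lambda\lVert\beta\rVert_1
$$

$$
\text{$\ell_1$-penalized logistic regression:}\quad
\min_{\beta_0,\,\beta}\ -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i\log p_i + (1-y_i)\log(1-p_i)\Big] + \lambda\lVert\beta\rVert_1,
\qquad p_i = \frac{1}{1+e^{-(\beta_0 + x_i^\top\beta)}}
$$

The penalty term $\lambda\lVert\beta\rVert_1$ is identical in both; only the loss changes, from squared error to the negative Bernoulli log-likelihood.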
Note, however, that logistic regressions do not explicitly make classifications. Logistic regressions, including those whose coefficients are estimated using LASSO penalties, return predicted event probabilities. Sure, it is possible to use a threshold to determine the predicted category, but this is an additional step on top of the logistic regression. A number of good resources exist on why the explicit probabilities are useful.
Damage Caused by Classification Accuracy and Other Discontinuous Improper Accuracy Scoring Rules
Why is accuracy not the best measure for assessing classification models?
Academic reference on the drawbacks of accuracy, F1 score, sensitivity and/or specificity
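As an illustration of the point that the model returns probabilities and that classification is a separate step, here is a minimal sketch in plain Python. It fits an $\ell_1$-penalized logistic regression by proximal gradient descent (soft-thresholding), which is one simple way to do it and not necessarily what any particular library does. The function names, the synthetic data, the penalty weight `lam`, the step size, and the 0.5 cutoff are all assumptions chosen for the example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero,
    # and returns exactly zero when |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def predict_proba(X, b0, beta):
    # Predicted event probabilities, one per row of X.
    return [sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) for row in X]


def fit_l1_logistic(X, y, lam=0.05, step=0.5, n_iter=500):
    # Minimise mean log-loss + lam * sum(|beta_j|); the intercept is not penalised.
    # lam, step and n_iter are illustrative values, not tuned recommendations.
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        probs = predict_proba(X, b0, beta)
        resid = [pi - yi for pi, yi in zip(probs, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Gradient step on the smooth log-loss, then soft-threshold for the L1 term.
        b0 -= step * grad0
        beta = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(beta, grad)]
    return b0, beta


# Synthetic data (an assumption for this example): only the first of
# three features actually influences the outcome.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if random.random() < sigmoid(2.0 * row[0]) else 0 for row in X]

b0, beta = fit_l1_logistic(X, y)

# Step 1: the model's actual output is a probability for each observation.
probs = predict_proba(X, b0, beta)

# Step 2 (separate from the model): a decision rule turns probabilities into labels.
threshold = 0.5  # arbitrary cutoff; a sensible value depends on the costs of each error
labels = [1 if p >= threshold else 0 for p in probs]

print("coefficients:", beta)
print("first five probabilities:", probs[:5])
print("first five labels:", labels[:5])
```

Changing `threshold` changes the labels but not the fitted coefficients or the probabilities, which is why the cutoff is a decision layered on top of the regression rather than part of it.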