
In scikit-learn, Lasso is a linear regression model, but it can also be used for feature selection.

However, is it reasonable to apply it directly to a classification task for feature selection, or should I instead use logistic regression with an L1 penalty (or another L1-penalized algorithm)?
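For concreteness, a minimal sketch of the logistic-regression route (synthetic data and an arbitrary regularization strength, both chosen only for illustration), using scikit-learn's SelectFromModel:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectFromModel
    from sklearn.linear_model import LogisticRegression

    # Synthetic binary classification problem: 20 features, 5 informative.
    X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                               random_state=0)

    # L1-penalized logistic regression drives some coefficients exactly to zero.
    # C (inverse regularization strength) is arbitrary here; tune it in practice.
    l1_logreg = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)

    selector = SelectFromModel(l1_logreg).fit(X, y)
    print("Selected feature indices:", np.where(selector.get_support())[0])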

1 Answer


That may or may not be a good idea. The lasso in logistic regression selects covariates by solving an optimization problem whose objective is based on the binomial deviance. If another algorithm doesn't use that criterion, a feature that the lasso discards may still be useful to that algorithm.
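To illustrate this point, a rough sketch (again synthetic data and arbitrary settings) comparing which features L1-penalized logistic regression zeroes out against a random forest's impurity-based importances; a feature the lasso drops can still score non-trivial importance under the other criterion:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                               n_redundant=5, random_state=0)

    # Features the L1 penalty drives exactly to zero under the deviance criterion.
    l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
    dropped = np.where(l1.coef_.ravel() == 0)[0]

    # A tree ensemble ranks features by a different criterion (impurity decrease),
    # so some of the dropped features may still carry importance here.
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    print("Dropped by lasso:", dropped)
    print("Their RF importances:", rf.feature_importances_[dropped].round(3))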

  • The lasso is the same as L1, and it doesn't use a binomial deviance; it uses the log-likelihood function. But feature selection in general is not a very good idea (as opposed to using an L2 or ridge norm), because if you bootstrap the entire process you'll usually see a tremendous amount of randomness in which features are 'chosen'. – Frank Harrell Dec 29 '18 at 13:05
  • Feature selection is a good idea IF identifying the relevant features is a goal of the analysis. It is not such a good idea if the aim is to improve generalisation performance, because it will probably make that worse rather than better. Miller's monograph on subset selection advises using regularisation rather than feature selection if generalisation performance is the primary goal. If identifying the relevant features is the goal, you do need to investigate the stability of the selection, which, as Frank Harrell suggests, can be a problem. – Dikran Marsupial Dec 30 '21 at 10:37
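The instability both comments raise can be checked directly. A rough sketch (synthetic data, arbitrary settings) that refits the L1 selector on bootstrap resamples and tallies how often each feature is selected:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                               random_state=0)
    rng = np.random.default_rng(0)

    n_boot = 100
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))        # bootstrap resample
        model = LogisticRegression(penalty="l1", solver="liblinear",
                                   C=0.1).fit(X[idx], y[idx])
        counts += (model.coef_.ravel() != 0)              # tally selected features

    # Stable selection would show frequencies near 0 or 1; values in between
    # reflect the randomness in which features get 'chosen'.
    print("Selection frequency per feature:", (counts / n_boot).round(2))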