
I am running a lasso regression to identify which variables, out of a list of 65-odd candidates, influence an individual's liquor consumption.

The independent variables are a combination of categorical and numeric variables, such as State, Education, Sex, Age, Income, and so on.

I used the glmnet package, and lambda was chosen by cross-validation:

  fit = glmnet(x, y, alpha = 1, lambda = 0.072, thresh = 1e-12)
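For readers who want to see how a cross-validated lambda is typically obtained, here is a minimal sketch using `cv.glmnet` from the glmnet package. The data are simulated and the object names (`x_sim`, `y_sim`, `cvfit`) are made up for illustration; this is not the data or the exact procedure used in the question.

```r
library(glmnet)

# Simulated stand-in data (hypothetical; not the data from the question).
set.seed(1)
n <- 200
p <- 10
x_sim <- matrix(rnorm(n * p), nrow = n, ncol = p)
y_sim <- 2 * x_sim[, 1] - 1.5 * x_sim[, 2] + rnorm(n)

# 10-fold cross-validation over glmnet's default lambda path (alpha = 1 is the lasso).
cvfit <- cv.glmnet(x_sim, y_sim, alpha = 1, nfolds = 10)

cvfit$lambda.min  # lambda with the lowest cross-validated error
cvfit$lambda.1se  # largest lambda within one standard error of that minimum

# Coefficients at the chosen lambda; zeros are printed as "." in the sparse output.
b <- coef(cvfit, s = "lambda.min")
b
```

Passing `s = "lambda.min"` or `s = "lambda.1se"` to `coef` avoids refitting with a hard-coded lambda value.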

The lasso returned 25 variables with non-zero coefficients; all the others are 0.

The beta values are as follows:

  fit$beta

  State     -0.350
  Education -0.254
  Age        0.175
  Sex        .
  ...

Education is a categorical variable with 5 levels: No school, High school, Graduate, Masters, Doctorate. Linear regression would give 4 beta estimates, one for each level other than the reference level, but the lasso gives only one beta for Education. I am not able to interpret this beta for a categorical (factor) variable.

  • How should I interpret these lasso coefficients and their signs?
  • For a numeric variable like Age, is the coefficient interpreted the same way as in linear regression?

I found a clue in "Categorical variables in LASSO regression", but I am not sure how to relate it to the betas I got here.

joy_1379
  • glmnet is not able to handle categorical variables directly; you need to convert them to dummy variables as described here – drmaettu Oct 22 '21 at 08:30
  • Is there any other package or function that creates dummies automatically, as lm or glm does? – joy_1379 Oct 22 '21 at 10:26
  • Hmm, I've always used glmnet, so I'm not aware of such a package. But converting to dummies is really easy; with x as a data frame, just write:

    fit = glmnet(model.matrix(~ . - 1, x), y)

    This creates a design matrix without the intercept column (hence the -1); glmnet takes care of the intercept itself.

    – drmaettu Oct 22 '21 at 12:28
  • Another option is to use makeX from the glmnet package: glmnet(makeX(x), y, ...) – schotti Jan 05 '23 at 16:23
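To make the dummy-variable suggestion in the comments concrete, here is a minimal base-R sketch of what `model.matrix(~ . - 1, ...)` does to a factor. The data frame `df` and its values are hypothetical; only the column names and the Education levels come from the question.

```r
# Hypothetical toy data: column names mirror the question, values are made up.
edu_levels <- c("No school", "High school", "Graduate", "Masters", "Doctorate")
df <- data.frame(
  Education = factor(rep(edu_levels, times = 4), levels = edu_levels),
  Age       = c(23, 35, 41, 52, 29, 61, 38, 45, 33, 27,
                56, 48, 31, 39, 44, 50, 26, 36, 58, 42),
  Sex       = factor(rep(c("F", "M"), times = 10))
)

# Expand factors into 0/1 dummy columns; "- 1" drops the intercept column.
X <- model.matrix(~ . - 1, data = df)

# Education now occupies one column per level, alongside Age and a Sex dummy.
colnames(X)
dim(X)
```

Passing a matrix like `X` to glmnet gives a separate coefficient for each dummy column, and each one is read as the shift for that level. A single beta for Education, as in the question, would be consistent with the factor having been converted to integer codes before fitting, though the question does not show how x was built.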
