
I am running a lasso model to predict a continuous variable, and I have continuous and categorical inputs.

  • For centering and scaling, is it correct that this should only be applied to the continuous variables?
  • For the dummy-variable step, can this be skipped for the categorical variables? It doesn't seem to make sense when it comes to scoring.
    • For example, if the training set has a categorical variable called Cats with levels A-Z, dummy coding creates one column for each of A-Z. But if my scoring set is missing one of these levels, the dummy step produces fewer columns and scoring fails. I'm looking for some guidance.
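To make the scoring problem concrete, here is a minimal sketch in plain Python. It assumes the usual remedy: learn the centering/scaling statistics and the list of category levels from the training set only, then reuse them unchanged when scoring. The helper names `fit_preprocessor` and `transform`, the `weight` column, and the toy data are all hypothetical and only for illustration.

```python
from statistics import mean, pstdev

def fit_preprocessor(rows, continuous, categorical):
    """Learn scaling statistics and category levels from the training rows only."""
    stats = {}
    for col in continuous:
        values = [row[col] for row in rows]
        sd = pstdev(values)
        # Guard against a constant column, which would give a zero divisor.
        stats[col] = (mean(values), sd if sd > 0 else 1.0)
    # The dummy columns are fixed here, from the training levels.
    levels = {col: sorted({row[col] for row in rows}) for col in categorical}
    return stats, levels

def transform(rows, stats, levels):
    """Apply the training-set statistics and levels to any data set."""
    out = []
    for row in rows:
        # Center and scale only the continuous columns.
        features = [(row[col] - mu) / sd for col, (mu, sd) in stats.items()]
        # One indicator per training level; an unseen level yields all zeros.
        for col, col_levels in levels.items():
            features.extend(1.0 if row[col] == lvl else 0.0 for lvl in col_levels)
        out.append(features)
    return out

# Toy data: the scoring set lacks levels B and C and contains an unseen level D.
train = [
    {"weight": 1.0, "Cats": "A"},
    {"weight": 2.0, "Cats": "B"},
    {"weight": 3.0, "Cats": "C"},
]
score = [{"weight": 2.0, "Cats": "A"}, {"weight": 4.0, "Cats": "D"}]

stats, levels = fit_preprocessor(train, ["weight"], ["Cats"])
X_train = transform(train, stats, levels)
X_score = transform(score, stats, levels)
```

Because the columns come from the training levels rather than from whatever happens to appear in the scoring set, `X_score` has the same width as `X_train` even though levels are missing. In scikit-learn, `OneHotEncoder(handle_unknown="ignore")` fitted on the training data behaves similarly.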
Sycorax
