I seem to be stuck in a circular problem with caret's train().

My data has some columns containing NA values (I'm still deciding whether to impute or omit them) and a few categorical (factor) columns.

If I use the formula method, the model trains, but I then run into problems when predicting because the factors have been converted to dummy variables.

train(sales~., 
      data=df,
      method="glmnet", 
      preProcess=c('center', 'scale', 'zv'), 
      trControl=trainControl(method="repeatedcv", number=5, repeats=2), 
      na.action = na.omit)

This answer suggests using the non-formula (x/y) method instead: https://stackoverflow.com/a/30169022/10291291

train(
  x = model.frame(formula( sales~.), df)[,-1],
  y = model.frame(formula( sales~.), df)[,1],
  method="glmnet", 
  preProcess=c('center', 'scale', 'zv'), 
  trControl=trainControl(method="repeatedcv", number=5, repeats=2), 
  na.action = na.omit)

However, when I try that I run into problems with the NAs, and this other answer suggests going back to the formula method: https://stackoverflow.com/a/48230658/10291291

For reference, I'll most likely be sticking with xgboost and glmnet.

So I'm a little lost, but I can't imagine this situation is unusual, so I'm hoping I've missed something obvious.
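A minimal sketch of one possible way out of the loop, offered as an assumption rather than a confirmed fix: encode the factors once with caret's dummyVars() outside train(), handle the NAs explicitly, and reuse the same encoder at prediction time. The toy data frame and the names `predictors`, `encoder`, `keep` and `new_x` are hypothetical stand-ins, not the real data.

```r
library(caret)  # the glmnet package must also be installed

# Hypothetical toy data standing in for df: one factor, one numeric column with NAs
set.seed(1)
df <- data.frame(
  sales  = rnorm(60, mean = 100, sd = 10),
  region = factor(sample(c("north", "south", "east"), 60, replace = TRUE)),
  price  = rnorm(60, mean = 5)
)
df$price[c(3, 17)] <- NA

# Encode the factors once, outside train(), so the same encoder can be reused later
predictors <- df[, setdiff(names(df), "sales")]
encoder <- dummyVars(~ ., data = predictors, fullRank = TRUE)
x <- predict(encoder, newdata = predictors)  # numeric matrix; rows with NA are kept

# Omit incomplete rows by hand instead of relying on na.action
keep <- complete.cases(x)

fit <- train(
  x = x[keep, , drop = FALSE],
  y = df$sales[keep],
  method = "glmnet",
  preProcess = c("zv", "center", "scale"),
  trControl = trainControl(method = "repeatedcv", number = 5, repeats = 2)
)

# New data goes through the same encoder before predict()
new_x <- predict(encoder, newdata = head(predictors[keep, ], 5))
preds <- predict(fit, newdata = new_x)
```

To impute rather than omit, the `keep` filter could be dropped and an imputation step such as "medianImpute" added to `preProcess`; I have not verified how that interacts with the resampling here, so treat it as something to test.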

Quixotic22