I feel like I'm stuck in a bit of a circular error here.
I have some columns with NAs (I'm still trialling whether to impute or omit them) and a few categorical/factor columns too.
If I use the formula method I can fit my model, but I then run into problems when trying to predict because the factors have been dummified:
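library(caret)  # provides train() and trainControl()

# Formula interface: factors are expanded to dummy columns internally
train(sales ~ .,
      data = df,
      method = "glmnet",
      preProcess = c("center", "scale", "zv"),
      trControl = trainControl(method = "repeatedcv", number = 5, repeats = 2),
      na.action = na.omit)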
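To illustrate what I mean by "dummified", here is a minimal base-R sketch. The `toy` data frame and its columns are made up for this example (they are not my real `df`), and I am assuming the formula interface behaves like `model.frame()` followed by `model.matrix()`, which is how I understand it to work:

```r
# Hypothetical toy data, only for illustration (not the real df)
toy <- data.frame(
  sales  = c(10, 12, 9, 15),
  region = factor(c("north", "south", "east", "south")),
  price  = c(1.5, NA, 2.0, 1.8)
)

# Drop rows containing NA, as na.action = na.omit would
mf <- model.frame(sales ~ ., data = toy, na.action = na.omit)

# Expand the factor into 0/1 indicator columns,
# one per non-reference level of 'region'
mm <- model.matrix(sales ~ ., data = mf)

colnames(mm)  # 'region' no longer appears as a single column
nrow(mm)      # the row with the NA price has been dropped
```

So the fitted model only knows about the expanded indicator columns, not the original factor, which seems to be where my trouble at prediction time comes from.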
This answer suggests using the non-formula method instead: https://stackoverflow.com/a/30169022/10291291
train(
  x = model.frame(formula(sales ~ .), df)[, -1],
  y = model.frame(formula(sales ~ .), df)[, 1],
  method = "glmnet",
  preProcess = c("center", "scale", "zv"),
  trControl = trainControl(method = "repeatedcv", number = 5, repeats = 2),
  na.action = na.omit)
However, when I try that I get issues with the NAs, and this post suggests going back to formulas: https://stackoverflow.com/a/48230658/10291291
For reference, I'll likely be sticking with xgboost and glmnet.
So I'm a little lost, but I can't imagine this situation is that unusual, so I'm hoping I've missed something obvious.