I'm using the caret package to train a few models via the rpart and ranger packages.
The problem is that when I use PRAUC as the target metric, the code returns the following warning:
# Warning message:
# In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
# There were missing values in resampled performance measures.
For reproducibility, I provide the code and data using rpart only. The two simplest versions of the data are on my GitHub account (URL in the code below). They have the same dependent variables but use two different predictors, and the problem persists with both versions.
The code that I run is:
library(caret)
# Download df_1.Rda, df_2.Rda from "https://github.com/hyk0127/ML_ask/tree/master/data"
load("df_1.Rda") # alternatively, load("df_2.Rda")
df_set <- obj
# Custom replacement for caret::prSummary that hard-codes 'lev',
# so that "Yes" is always treated as the positive (relevant) class
custom_prsummary <- function(data, lev = NULL, model) {
  lev <- c("Yes", "No")
  pr_auc <- MLmetrics::PRAUC(y_pred = data[, lev[1]],
                             y_true = ifelse(data$obs == lev[1], 1, 0))
  out <- c(PRAUC = pr_auc,
           Precision = caret::precision(data = data$pred, reference = data$obs, relevant = lev[1]),
           Recall = caret::recall(data = data$pred, reference = data$obs, relevant = lev[1]),
           F = caret::F_meas(data = data$pred, reference = data$obs, relevant = lev[1]))
  return(out)
}
tc <- trainControl(
  method = "cv",
  number = 10,
  summaryFunction = custom_prsummary,
  allowParallel = TRUE,
  classProbs = TRUE,
  savePredictions = "final"
)
out_rpart <- train(
  as.factor(depvar) ~ .,
  metric = "PRAUC",
  method = "rpart",
  trControl = tc,
  data = df_set$train
)
I've checked similar questions on this thread and this thread and a few others, but they do not seem to apply to my case. Neither out_rpart$resample nor the data itself contains any NA values.
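One thing I have not ruled out, and this is only a guess: the resample table stored in the fitted object may cover only the selected tuning value, while the warning may refer to any candidate cp value. If a candidate tree never predicts "Yes" in some fold, the precision denominator is zero. The toy sketch below uses made-up labels (not my data) to show how the summary metrics behave in that situation:

```r
library(caret)

# Toy fold (hypothetical data): the true labels contain both classes,
# but the model (e.g. a tree with no splits) predicts "No" for every row.
lev  <- c("Yes", "No")
obs  <- factor(c("Yes", "No", "No", "Yes", "No"), levels = lev)
pred <- factor(rep("No", length(obs)), levels = lev)

# Same three calls as in custom_prsummary, with "Yes" as the relevant class
prec <- caret::precision(data = pred, reference = obs, relevant = "Yes")
rec  <- caret::recall(data = pred, reference = obs, relevant = "Yes")
f    <- caret::F_meas(data = pred, reference = obs, relevant = "Yes")

# Inspect the values; a missing Precision or F here would be enough
# to produce a missing resampled performance measure.
print(c(Precision = prec, Recall = rec, F = f))
```

To check this against the real fit, trainControl() accepts returnResamp = "all", which keeps the per-fold results for every candidate cp rather than only the final one, so anyNA(out_rpart$resample) would then cover all tuning values.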
I'd really appreciate any insights/solution to the problem. Thank you!