I am trying to better understand how changing the threshold affects a cross-validated model. If you train a random forest model, the default threshold is 0.5, and I understand that a prediction with a score > 0.5 is considered a positive case, and vice versa. But with a 5-fold cross-validation model, does the model look at what happens in the first four folds and then apply the threshold to give you the results on the test fold, or does it apply the threshold only to the test fold? In other words, what does the threshold change: the results of the training folds, or just of the testing fold?
More technically speaking, looking at the example below, it appears that the results given for each fold are for the testing fold. So does that mean the threshold is evaluated on the testing fold, and the training folds don't care about the threshold at all?
library(caret)
library(dplyr)

# create a binary outcome ("aff"/"neg") from Sepal.Length
data(iris)
iris <- iris %>%
  mutate(Sepal.Length = factor(ifelse(Sepal.Length > 5.0, "aff", "neg")))

# 5-fold CV, keeping class probabilities and the held-out predictions
ctrl <- trainControl(method = "cv",
                     number = 5,
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE,
                     savePredictions = TRUE)

model <- train(Sepal.Length ~ ., data = iris,
               method = "rf",
               trControl = ctrl,
               preProc = c("center", "scale"),
               metric = "ROC",
               importance = TRUE,
               tuneGrid = data.frame(mtry = 2))

# examine the held-out predictions for every fold
print(model$pred)
#     pred obs   aff   neg rowIndex mtry Resample
#1 aff neg 0.616 0.384 7 2 Fold1
#2 neg neg 0.116 0.884 10 2 Fold1
#3 aff aff 0.602 0.398 15 2 Fold1
#4 aff aff 0.894 0.106 19 2 Fold1
#5 aff neg 0.706 0.294 25 2 Fold1
#6 aff neg 0.716 0.284 27 2 Fold1
#7 neg neg 0.020 0.980 43 2 Fold1
#8 neg neg 0.034 0.966 48 2 Fold1
#9 aff aff 1.000 0.000 51 2 Fold1
#10 aff aff 1.000 0.000 60 2 Fold1
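As a quick check of what the pred column in model$pred actually reflects, it can be compared against a manual 0.5 cutoff on the held-out aff probabilities. A minimal sketch, assuming the model object from the code above (pred_from_threshold is just a made-up helper name):

library(dplyr)

# does pred equal "aff" exactly when the held-out probability for "aff" is > 0.5?
model$pred %>%
  mutate(pred_from_threshold = ifelse(aff > 0.5, "aff", "neg")) %>%
  summarise(all_match = all(pred_from_threshold == as.character(pred)))
# TRUE here would mean the default cutoff is being applied to the held-out
# (test-fold) probabilities; the training folds never see a cutoff at all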
Follow-up comments from the asker:
... (aff or neg in this case). So if you are modifying the threshold to get a certain value of sensitivity for your overall model, which will predict on separate data, does that mean you would want each of your training folds to have that sensitivity, or would you want the trained model to have that sensitivity (i.e. the average over all testing folds)? – PleaseHelp Jun 01 '20 at 22:03
... sensitivity = 0.5, and I don't know what's more appropriate: train a model and then modify model$pred so it uses the threshold I predetermined, which would then give me sen = 0.5 for the final model. That means all folds are evaluated on the same threshold, but it also means each "set" (train folds 1-4, test fold 5, etc.) has a different individual sensitivity that averages to sen = 0.5. Or the alternative: train each set so each has sen = 0.5, but then the overall average of the test folds is unlikely to be sen = 0.5. I think the latter undoes CV since each fold is different. – PleaseHelp Jun 01 '20 at 22:16
... sen = 0.5. But now I'm not sure whether it is right to apply the threshold to the training folds or the testing folds, if that makes sense? – PleaseHelp Jun 01 '20 at 22:41
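To make the two options in these comments concrete, here is a rough sketch, assuming the model object from above (the cutoff of 0.7 and the name pred_custom are made up for illustration). It re-thresholds the held-out probabilities in model$pred and computes sensitivity per fold versus pooled over all folds:

library(dplyr)
library(caret)   # for sensitivity()

cutoff <- 0.7    # hypothetical custom threshold, chosen only for illustration

rethresholded <- model$pred %>%
  mutate(pred_custom = factor(ifelse(aff > cutoff, "aff", "neg"),
                              levels = levels(obs)))

# option A: sensitivity within each test fold, then averaged across folds
per_fold <- rethresholded %>%
  group_by(Resample) %>%
  summarise(sens = sensitivity(pred_custom, obs, positive = "aff"))
mean(per_fold$sens)

# option B: sensitivity pooled over all held-out predictions at once
sensitivity(rethresholded$pred_custom, rethresholded$obs, positive = "aff")

For reference, caret's own resample summaries (e.g. twoClassSummary) compute the metric within each fold and then average across folds, which corresponds to option A.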