We're using forest-based models in a personnel selection context. For a dataset with 57 features, 230 observations, and a binary outcome, we got the following ROC curves.
The plot shows the first 6 folds of a 14-fold cross-validation on the dataset. To me, the model appears to perform well on 5 of the 6 folds. We're primarily interested in filtering out the bottom 30% of candidates, and in 5 of the 6 folds the model was able to reduce the group by 30% without sending home any qualified individuals.
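To make the per-fold criterion above concrete, here is a minimal sketch of the check, assuming each held-out fold gives a list of model scores and true binary labels (1 = qualified). The function name `qualified_in_bottom` and the example data are hypothetical and only for illustration; they are not taken from our actual pipeline or from StableTrees.jl.

```python
import math

def qualified_in_bottom(scores, labels, fraction=0.30):
    """Count qualified candidates (label 1) among the lowest-scoring `fraction`."""
    n_reject = math.floor(fraction * len(scores))
    # Sort candidate indices by ascending score; the first n_reject are filtered out.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rejected = order[:n_reject]
    return sum(labels[i] for i in rejected)

# Hypothetical held-out fold of 16 candidates (230 observations / 14 folds is roughly 16).
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45,
          0.50, 0.55, 0.60, 0.70, 0.75, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

# A fold "passes" when no qualified candidate falls in the rejected group.
missed = qualified_in_bottom(scores, labels)
print(missed)
```

Note that with roughly 16 candidates per fold, the bottom 30% is only 4 or 5 people, so whether a single fold passes can hinge on one candidate.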
How likely is it that this result holds on real data? In other words, what can we expect if we fit the model on the full dataset and use it to predict on new data? Does anyone here know of research on this problem, or a strategy to simulate it?
Note that we did interpret the model, and we see little reason to believe that it will not generalize to new data. We used the interpretable SIRUS algorithm (https://github.com/rikhuijzer/StableTrees.jl).
