I have built several models (RF, XGB and GLM) to predict a binary outcome. All of them achieve an AUC of approximately 0.8 and Brier scores of 0.10 to 0.15. The test set is fairly small (n = 350), with 50 cases that have the outcome.
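For reference, the Brier score is the mean squared difference between the predicted probability and the 0/1 outcome. Below is a minimal base-R sketch. The helper name `brier_score` is hypothetical and does not come from a package. The second call uses made-up vectors that only share the class balance described above (50 events out of 350). It shows that giving every case the prevalence 1/7 as its prediction already yields a Brier score of 6/49, about 0.122, which is a useful baseline for reading scores in the 0.10 to 0.15 range.

```r
# Brier score: mean squared difference between predicted probability and 0/1 outcome
brier_score <- function(p, y) {
  stopifnot(length(p) == length(y), all(y %in% c(0, 1)))
  mean((p - y)^2)
}

# Toy vectors (made up, not the data from this question)
brier_score(c(0.9, 0.2, 0.6, 0.1), c(1, 0, 1, 0))   # 0.055

# Baseline with the same class balance as the test set (50 events out of 350):
# every case gets the prevalence 50/350 = 1/7 as its prediction
y_base <- c(rep(1, 50), rep(0, 300))
brier_score(rep(50 / 350, 350), y_base)            # 6/49, about 0.1224
```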
I am trying to create calibration plots in RStudio and am getting results that I don't understand. At first I tried the predtools, classifierplots and runway packages, as in the example below, and all of them gave results that looked like the plot below:
Model:
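```r
library(randomForest)
RF_model <- randomForest(outcome ~ ., data = TRAIN_data)
RF_prediction <- data.frame(outcome = TEST_data$outcome)  # keep the observed outcome
RF_prediction$pred <- predict(RF_model, TEST_data, type = "prob")[, "no"]  # P(class "no")
```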
And for the calibration plot (with the probably package):
```r
library(probably)
cal_plot_breaks(RF_prediction, truth = outcome, estimate = pred)
```
The sudden dive towards zero looked wrong to me.
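One plausible reason for a dive like this: a breaks-style calibration plot groups the predictions into equal-width bins and plots the observed event rate in each bin. With only 350 cases and 50 events, the bins at high predicted probabilities may hold just a handful of cases, so their observed rate can be 0 purely by chance. The minimal base-R sketch below makes the per-bin counts visible. The helper `binned_calibration` and the simulated data are hypothetical; they are not from the question and not a copy of probably's internals.

```r
# Equal-width binned calibration table (sketch, base R only).
# p: predicted probabilities; y: 0/1 outcomes, where 1 = the class that p refers to
binned_calibration <- function(p, y, n_bins = 10) {
  bins <- cut(p, breaks = seq(0, 1, length.out = n_bins + 1), include.lowest = TRUE)
  data.frame(
    bin       = levels(bins),
    n         = as.vector(table(bins)),
    mean_pred = as.vector(tapply(p, bins, mean)),
    obs_rate  = as.vector(tapply(y, bins, mean))  # NA for empty bins
  )
}

# Simulated stand-in data of roughly the test-set size (NOT real results)
set.seed(1)
p <- plogis(rnorm(350, mean = -2.2, sd = 1.2))
y <- rbinom(350, size = 1, prob = p)

tab <- binned_calibration(p, y)
print(tab)  # look at column n: the upper bins are nearly empty
plot(tab$mean_pred, tab$obs_rate, type = "b", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Mean predicted probability", ylab = "Observed event rate")
abline(0, 1, lty = 2)  # perfect calibration
```

It is also worth checking that `pred` and the event class used by the plotting function refer to the same class. If one is P("no") and the other treats "yes" as the event, the curve comes out mirrored.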
I searched for more information, and after reading the excellent article on reliability diagrams (https://towardsdatascience.com/introduction-to-reliability-diagrams-for-probability-calibration-ed785b3f5d44), I realized I was probably using the wrong data format, so I tried using relative frequencies instead. This produced a nice S-shaped curve that looked too perfect and uniform to be believable.
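Equal-width bins are not the only option. A common alternative is to bin by quantiles of the predicted probability. Each point is then the observed relative frequency of the event among roughly the same number of cases, about 50 here with 7 bins. Below is a minimal base-R sketch, again with a hypothetical helper (`quantile_calibration`) and simulated data.

```r
# Equal-count (quantile) bins: each point is the observed relative frequency of the
# event among roughly the same number of cases (sketch, base R only)
quantile_calibration <- function(p, y, n_bins = 7) {
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = n_bins + 1)))
  bins <- cut(p, breaks = breaks, include.lowest = TRUE)
  data.frame(
    n         = as.vector(table(bins)),
    mean_pred = as.vector(tapply(p, bins, mean)),
    obs_freq  = as.vector(tapply(y, bins, mean))
  )
}

set.seed(1)
p <- plogis(rnorm(350, mean = -2.2, sd = 1.2))  # simulated, not real predictions
y <- rbinom(350, size = 1, prob = p)            # simulated 0/1 outcomes

qtab <- quantile_calibration(p, y)  # 7 bins of about 50 cases each
print(qtab)
plot(qtab$mean_pred, qtab$obs_freq, type = "b", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Mean predicted probability", ylab = "Observed relative frequency")
abline(0, 1, lty = 2)  # perfect calibration
```

With equal-count bins, the points cluster where the predictions are, so the curve only extends as far as the model's highest predictions. The trade-off is less resolution in the sparse upper range.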
Finally, I found the question "Create calibration plot in R with vectors of predicted and observed values" on this site. Using the rms package with the following syntax, I ended up with the curves below:
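```r
library(randomForest)
library(rms)
RF_model <- randomForest(outcome ~ ., data = TRAIN_data)
TEST_data$pred <- predict(RF_model, TEST_data, type = "prob")[, "no"]  # P(class "no")
# val.prob() expects y coded 0/1, with 1 = the class whose probability is in pred
RF_val <- val.prob(TEST_data$pred, as.numeric(TEST_data$outcome == "no"))
```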
Here is the resulting curve for the RF model (the same model as above):
and here it is together with the curves for the other two models:
I need help understanding the following:
- Do the syntax and the plots seem correct?
- How do I interpret the way the curves “stop” at different predicted probabilities?
- How do I remove the annoying "overall" legend next to the curves? I need to show the names of the models instead. (I managed to get rid of all the statistics text with `logistic.cal = FALSE, statloc = FALSE` in the val.prob call and `flag = 0` in the plot.)
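On the last two points: if I read `?val.prob` correctly, its nonparametric curve is a lowess smoother of the 0/1 outcome on the predicted probability. A smoother like this only has data between the smallest and largest predicted probability a model produced. Each model's curve therefore ends at that model's own maximum prediction: a model that never predicts above 0.6 gives a curve that stops at 0.6. Drawing the curves yourself in base R also lets you put the model names in the legend. The sketch below is not rms's own plotting code. It uses simulated stand-ins: the vectors in `preds` are hypothetical and should be replaced with each model's predicted probabilities, and `y` with the 0/1 test outcomes.

```r
# Overlay lowess calibration curves for several models with a named legend (base R)
set.seed(2)
lp <- rnorm(350, mean = -2.2, sd = 1.2)          # simulated latent risk
y  <- rbinom(350, size = 1, prob = plogis(lp))   # simulated 0/1 outcomes

# Hypothetical stand-ins for each model's predicted probabilities
preds <- list(
  RF  = plogis(0.6 * lp - 0.8),
  XGB = plogis(lp + rnorm(350, sd = 0.5)),
  GLM = plogis(lp)
)
cols <- c("firebrick", "steelblue", "darkgreen")

plot(0:1, 0:1, type = "n", xlab = "Predicted probability", ylab = "Observed proportion")
abline(0, 1, lty = 2, col = "grey50")  # perfect calibration
for (i in seq_along(preds)) {
  # iter = 0 turns off lowess's robustness iterations, which can otherwise
  # down-weight the rare 1s as if they were outliers
  lines(lowess(preds[[i]], y, iter = 0), col = cols[i], lwd = 2)
}
legend("topleft", legend = names(preds), col = cols, lwd = 2, bty = "n")

# Each curve spans only the range of that model's predictions
pred_ranges <- sapply(preds, range)
print(pred_ranges)
```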
And on a more general note:
- I have seen the terms "reliability diagram" and "calibration plot" used interchangeably. Are they the same thing under different names, or is there some subtle difference that I'm missing?

