Creating and interpreting calibration plots for several models with a binary outcome

Question

I have built several models (RF, XGB and GLM) to predict a binary outcome; they all achieve an AUC of approximately 0.8 and Brier scores of 0.1-0.15. The test set is fairly small (n = 350), with 50 cases having the outcome.

I am trying to create calibration plots in RStudio and I am getting results that I don't understand. At first I tried predtools, classifierplots and runway as in the example below and got results that all looked like the plot below:

Model:

RF_model <- randomForest(outcome ~ ., data = TRAIN_data)
RF_prediction$pred <- predict(RF_model, TEST_data, type = "prob")[,"no"]

and for the calibration plot (with the "probably" package):

RF_prediction %>% cal_plot_breaks(outcome, pred)

The sudden dive towards zero looked wrong to me.

I searched for more information, and after reading the excellent "Introduction to reliability diagrams for probability calibration" (https://towardsdatascience.com/introduction-to-reliability-diagrams-for-probability-calibration-ed785b3f5d44) I realized I was probably using an incorrect data format, so I tried using relative frequencies instead. This produced a nice S-shaped curve that looked too perfect and uniform to be believed.
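For reference, the binned relative-frequency idea behind a reliability diagram can be sketched in base R. This is only a minimal illustration on simulated data, not the output of the models above: the names pred, obs, bins and cal are hypothetical, the sample size simply mirrors the test set, and pred is assumed to hold the predicted probability of the event class (coded 1).

```r
# Simulated stand-in for a test set; the data are made up for illustration.
set.seed(1)
n    <- 350
pred <- rbeta(n, 1, 5)        # hypothetical predicted probabilities of the event
obs  <- rbinom(n, 1, pred)    # 0/1 outcomes drawn so that pred is well calibrated

# Cut the predictions into 5 equal-width bins over [0, 1].
bins <- cut(pred, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)

# Per bin: mean predicted probability, observed event rate, and bin size.
cal <- data.frame(
  mean_pred = as.vector(tapply(pred, bins, mean)),
  obs_rate  = as.vector(tapply(obs, bins, mean)),
  n         = as.vector(table(bins))
)
cal <- cal[cal$n > 0, ]       # drop empty bins

# Reliability diagram: observed rate against mean prediction, one point per bin.
plot(cal$mean_pred, cal$obs_rate, type = "b", xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Mean predicted probability", ylab = "Observed event rate")
abline(0, 1, lty = 2)         # diagonal = perfect calibration
```

With only about 50 events, the upper bins contain very few cases, so their points are noisy; the n column shows how much data sits behind each point.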

Finally, I found the question "Create calibration plot in R with vectors of predicted and observed values" on this site and ended up with the curves below after using the rms package and the following syntax:

RF_model <- randomForest(outcome ~ ., data = TRAIN_data)
TEST_data$pred <- predict(RF_model, TEST_data, type = "prob")[,"no"]
plot <- val.prob(TEST_data$pred, TEST_data$outcome)

This gave the curve for the RF model shown above, together with the curves for the other two models.

I need help understanding the following:

Do the syntax and the plots seem correct?
How do I interpret the way the curves “stop” at different predicted probabilities?
How can I remove the annoying “overall” legend next to the curves? I need to write the names of the models instead. (I managed to get rid of all the statistics text with “logistic.cal = FALSE, statloc = FALSE” in the val.prob call and “flag = 0” on the plot.)

and on a more general note

I have seen the terms reliability diagram and calibration plot used interchangeably. Are they the same thing with different names, or is there some subtle difference that is lost on me?
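To make the second and third questions in the list above concrete, here is a rough base-R sketch on simulated data. It assumes the smooth calibration curve is a lowess fit of the 0/1 outcome on the predicted probability (which, to my understanding of the rms documentation, is what val.prob draws by default). The models, predictions, colours and the names preds, cols and ends are all made up for illustration.

```r
set.seed(2)
n <- 350
y <- rbinom(n, 1, 50 / 350)    # made-up 0/1 outcome, roughly 50 events

# Two hypothetical models whose predictions cover different ranges.
preds <- list(
  RF  = plogis(-2.5 + 2.0 * y + rnorm(n)),
  GLM = plogis(-2.0 + 1.0 * y + rnorm(n, sd = 0.5))
)

# Empty frame with the diagonal of perfect calibration.
plot(c(0, 1), c(0, 1), type = "l", lty = 2,
     xlab = "Predicted probability", ylab = "Observed proportion")

cols <- c(RF = "red", GLM = "blue")
ends <- sapply(names(preds), function(m) {
  fit <- lowess(preds[[m]], y, iter = 0)   # smooth of outcome on prediction
  lines(fit, col = cols[[m]])
  max(fit$x)                               # right-hand end of this curve
})

# Legend labelled with the model names.
legend("topleft", legend = names(preds), col = cols, lty = 1, bty = "n")
```

In this sketch each curve ends at that model's largest predicted probability, because the smooth is only defined where there are predictions. Drawing the curves manually with lines() and labelling them with legend() is one possible way to get model names on the plot.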

I could not reproduce your plots with the code you posted. Please provide a minimal working example which reproduces your last two figures, so it is easier to find a solution — picky_porpoise, Aug 13 '23 at 10:47
