
I have made several models (RF, XGB and GLM) to predict a binary outcome, and they all achieved an AUC of approximately 0.8 and Brier scores of 0.1-0.15. The test set is fairly small (n = 350), of which 50 cases have the outcome.
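For reference, AUC and the Brier score can be computed from a vector of predicted probabilities and the observed 0/1 outcomes along these lines (a sketch with made-up toy data; using pROC for the AUC is just one option):

library(pROC)

set.seed(1)
y <- rbinom(350, 1, 50 / 350)                    # toy 0/1 outcomes (~50 events out of 350)
p <- plogis(qlogis(0.15) + 2 * y + rnorm(350))   # toy predicted probabilities in (0, 1)

auc(y, p)          # AUC / c-statistic
mean((p - y)^2)    # Brier score: mean squared difference between prediction and outcome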

I am trying to create calibration plots in RStudio and I am getting results that I don't understand. At first I tried predtools, classifierplots and runway, with code like the example below, and all the results looked like the plot below:

Model:

library(randomForest)
library(probably)
library(dplyr)   # for the %>% pipe

RF_model <- randomForest(outcome ~ ., data = TRAIN_data)

# observed outcome plus the predicted probability of the "no" class
RF_prediction <- TEST_data["outcome"]
RF_prediction$pred <- predict(RF_model, TEST_data, type = "prob")[, "no"]

and for the calibration plot (with the "probably" package):

RF_prediction %>% cal_plot_breaks(outcome, pred)

[calibration plot from cal_plot_breaks]

The sudden dive towards zero looked wrong to me.

I tried searching for more information, and after reading the excellent article Introduction to Reliability Diagrams for Probability Calibration (https://towardsdatascience.com/introduction-to-reliability-diagrams-for-probability-calibration-ed785b3f5d44) I realized I was probably using an incorrect data format, so I tried using relative frequencies instead. This created a nice S-shaped curve that looked too perfect and uniform to be believable.

[calibration plot based on relative frequencies, showing a smooth S-shaped curve]
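To be concrete, the relative-frequency version was along these lines (a sketch from memory, so the exact binning may have differed; “no” is taken as the event to match the prediction column above):

library(dplyr)

# bin the predicted probabilities and compare the mean prediction in each bin
# with the observed relative frequency of the event ("no") in that bin
rel_freq <- RF_prediction %>%
  mutate(bin = cut(pred, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)) %>%
  group_by(bin) %>%
  summarise(mean_pred = mean(pred),
            obs_freq  = mean(outcome == "no"))

plot(rel_freq$mean_pred, rel_freq$obs_freq, type = "b",
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Mean predicted probability", ylab = "Observed relative frequency")
abline(0, 1, lty = 2)   # line of perfect calibration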

Finally I found the question “Create calibration plot in R with vectors of predicted and observed values” on this site and ended up with the curves below after using the rms package and the following syntax:

library(randomForest)
library(rms)

RF_model <- randomForest(outcome ~ ., data = TRAIN_data)

TEST_data$pred <- predict(RF_model, TEST_data, type = "prob")[, "no"]

# val.prob draws the calibration plot and returns calibration statistics
plot <- val.prob(TEST_data$pred, TEST_data$outcome)
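One detail for reproducibility: val.prob expects the outcome as a numeric 0/1 vector, so since outcome is a factor in my data it presumably has to be converted first, e.g. along these lines (the column name outcome01 is just illustrative, and “no” is taken as the event to match the probability column above):

# convert the factor outcome to 0/1, treating "no" as the event
TEST_data$outcome01 <- as.numeric(TEST_data$outcome == "no")
plot <- val.prob(TEST_data$pred, TEST_data$outcome01)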

with the curve for the RF model as above:

[val.prob calibration plot for the RF model]

together with the curves for the other two models:

[val.prob calibration plots for the XGB and GLM models]

I need help understanding the following:

  1. Do the syntax and the plots seem correct?
  2. How do I interpret the way the curves “stop” at different predicted probabilities?
  3. How do I remove the annoying “overall” legend next to the curves? I need to write the names of the models there instead! (I managed to get rid of all the statistics text with “logistic.cal = FALSE, statloc = FALSE” in the val.prob command and “flag = 0” on the plot; see the sketch after this list.)
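For question 3, this is roughly what I have per model; the legendloc = FALSE argument and the legend() call with the model name are my own attempt based on my reading of the rms help page, so they may well not be the intended way:

library(rms)

# suppress the refitted logistic curve, the statistics block and
# (if I read the help page correctly) the built-in legend
val.prob(TEST_data$pred, TEST_data$outcome01,
         logistic.cal = FALSE, statloc = FALSE, legendloc = FALSE)

# then label the curve with the model name instead
legend("bottomright", legend = "RF", lty = 1, bty = "n")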

and on a more general note

  1. I have seen the terms reliability diagram and calibration plot used interchangeably. Are they the same thing with different names, or is there some subtle difference that is lost on me?
  • I could not reproduce your plots with the code you posted. Please provide a minimal working example which reproduces your last two figures, so it is easier to find a solution – picky_porpoise Aug 13 '23 at 10:47

0 Answers