
Generally, is it useful to carry out statistical calibration tests on purely predictive models? For instance, if I build a predictive model and choose the final model based on cross-validation results (AIC, AUC, etc.), what additional information can the Hosmer-Lemeshow test give me? Assume that I don't care about interpreting the model parameters; my only goal is the best possible performance on the validation and test sets. My problem is that I have a small amount of training and testing data, but I want a model that generalizes well to the real world.

To narrow the question: can the results of Hosmer-Lemeshow tests be used to compare predictive models? Is that a reasonable thing to do?


1 Answer


The Hosmer-Lemeshow test specifically assesses the calibration of the predicted probabilities: whether a predicted probability of $p$ corresponds to an observed event rate of $p$. Calibration is a desirable property, but it is not the whole story: it is entirely possible for a model to have somewhat worse calibration yet considerably better discrimination (separation of the predictions between the two classes), leading to a better score on a classic metric like the Brier score or log loss. Thus, the Hosmer-Lemeshow test seems to be of little use for comparing model performance.
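
To make the calibration/discrimination distinction concrete, here is a minimal sketch (not from the original answer) on synthetic data: a constant base-rate predictor is well calibrated but has no discrimination, while an overconfident version of the true model is miscalibrated yet discriminates well and typically earns better Brier and log-loss scores. The two models, the number of Hosmer-Lemeshow groups ($g = 10$), and the simulation settings are all assumptions made just for this illustration.

```python
# Illustrative sketch: calibration (Hosmer-Lemeshow) vs. discrimination (AUC)
# and proper scores (Brier, log loss) on synthetic data.
import numpy as np
from scipy.stats import chi2
from sklearn.metrics import roc_auc_score, brier_score_loss, log_loss

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))   # true event probabilities
y = rng.binomial(1, p_true)               # observed 0/1 outcomes

# Model A: predicts the base rate for everyone -> calibrated, no discrimination.
pred_a = np.full(n, y.mean())
# Model B: overconfident version of the truth -> miscalibrated, discriminating.
pred_b = 1.0 / (1.0 + np.exp(-4.0 * x))

def hosmer_lemeshow(y, p, g=10):
    """Hosmer-Lemeshow chi-square statistic and p-value with g groups."""
    order = np.argsort(p)
    stat = 0.0
    for idx in np.array_split(order, g):
        n_g = len(idx)
        pi_g = p[idx].mean()               # mean predicted probability in group
        obs = y[idx].sum()                 # observed events in group
        exp = n_g * pi_g                   # expected events in group
        stat += (obs - exp) ** 2 / (n_g * pi_g * (1.0 - pi_g))
    return stat, chi2.sf(stat, df=g - 2)   # g - 2 df is the usual convention

for name, pred in [("base-rate model", pred_a), ("overconfident model", pred_b)]:
    hl_stat, hl_p = hosmer_lemeshow(y, pred)
    print(f"{name}: HL p={hl_p:.3f}, AUC={roc_auc_score(y, pred):.3f}, "
          f"Brier={brier_score_loss(y, pred):.3f}, "
          f"log loss={log_loss(y, pred):.3f}")
```

With a reasonably large sample, the Hosmer-Lemeshow test will usually flag only the overconfident model, even though that model scores far better on AUC, Brier score, and log loss, which is exactly why a calibration test alone cannot be used to rank models.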

Further, Harrell argues that the Hosmer-Lemeshow test is obsolete even in its role of assessing calibration.
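
If calibration itself is the concern, a common alternative to a single Hosmer-Lemeshow p-value is to look at a calibration curve directly. Harrell's recommendation is a smooth (e.g. loess-based) curve, as produced by `rms::val.prob` in R; the binned sketch below, using scikit-learn's `calibration_curve` and reusing the synthetic data and predictions from the previous snippet, is only a rough stand-in for that idea.

```python
# Binned calibration curves as a rough stand-in for a smooth calibration plot.
# Reuses y, pred_a, pred_b from the snippet above; n_bins=10 is an arbitrary
# choice for the illustration.
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

fig, ax = plt.subplots()
for name, pred in [("base-rate model", pred_a), ("overconfident model", pred_b)]:
    frac_pos, mean_pred = calibration_curve(y, pred, n_bins=10)
    ax.plot(mean_pred, frac_pos, marker="o", label=name)
ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="perfect calibration")
ax.set_xlabel("Mean predicted probability")
ax.set_ylabel("Observed event rate")
ax.legend()
plt.show()
```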
