I am considering showing how mis-calibrated a Cox proportional hazards model is by plotting deciles (10th percentiles) of predicted risk on the x axis against incidence per 100,000 on the y axis. For each bin on x I could plot both the predicted and the observed incidence per 100,000 to compare them. However, in the literature it seems more common to plot the percentage at risk (or the percentage that have experienced the event), so that you get a nice 45-degree line (the ideal model) to compare against.

Which would be the better option?

1 Answer

A survival model evaluates the probability of an event as a function of time. A natural (and, as you note, frequent) way to evaluate calibration is with a plot of observed versus predicted survival probability at a specified time. An overall "incidence" plot might be appropriate for a binomial regression, but evaluation without a specified reference time might be misleading with a survival model based on censored event times.
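
To make the point about a reference time concrete, here is a minimal Python sketch of a binned calibration check at a fixed time `t`. It is only an illustration under stated assumptions: `km_survival` and `calibration_table` are hypothetical helper names, the predicted survival probabilities at `t` are assumed to come from your already-fitted Cox model, and the "observed" value in each risk group is a Kaplan-Meier estimate so that censored subjects are handled rather than counted as event-free. It is a binned check, not the smooth, optimism-adjusted method.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t).

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    surv = 1.0
    at_risk = len(times)
    for u in sorted(set(times)):
        if u > t:
            break
        # Subjects with an event at time u, and everyone leaving the risk set at u
        d = sum(1 for x, e in zip(times, events) if x == u and e == 1)
        leaving = sum(1 for x in times if x == u)
        if d > 0:
            surv *= 1.0 - d / at_risk
        at_risk -= leaving
    return surv


def calibration_table(pred_surv, times, events, t, n_groups=10):
    """Return (mean predicted S(t), Kaplan-Meier observed S(t)) per risk group.

    pred_surv holds each subject's model-predicted survival probability at t
    (assumed to be supplied by the fitted model; not computed here).
    """
    order = sorted(range(len(pred_surv)), key=lambda i: pred_surv[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # skip empty groups when n < n_groups
        mean_pred = sum(pred_surv[i] for i in idx) / len(idx)
        observed = km_survival([times[i] for i in idx],
                               [events[i] for i in idx], t)
        rows.append((mean_pred, observed))
    return rows
```

Plotting the second element of each row against the first gives points that should lie near the 45-degree line if the model is well calibrated at time `t`. Replacing the Kaplan-Meier estimate with a raw event proportion would treat subjects censored before `t` as if they had survived to `t`, which is the sense in which an incidence plot without a reference time can mislead.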

You can extend your test of calibration and make it more generally useful by modeling on multiple bootstrap samples of the data. You also can get a smoother estimate of the calibration curve, without binning into deciles, by using established ways to interpolate the "observed" (even if censored) event probabilities as a function of predictor values. That's implemented in the calibrate() function in Frank Harrell's rms package in R, providing an optimism-adjusted calibration curve that takes potential overfitting into account. See this page for an outline of the approach.

EdM
  • Hi, thanks for your response. I have been trying out calibrate() from rms and I like it. If I used cumulative incidence up to timepoint x, plotting incidence on the y axis and survival probability on the x axis (estimated with the optimism adjustment), would that be preferable to plotting against observed survival? – brucezepplin Mar 10 '24 at 21:29
  • @brucezepplin the problem with using observed incidence is that it doesn't readily take censoring into account. The methods used to estimate "observed" survival in calibrate() do. – EdM Mar 19 '24 at 20:35