I want to check the proportional hazards assumption. I used both the test from cox.zph() in R and a Schoenfeld residual plot from ggcoxzph(). Is the plot fine, and how should I interpret it? The test p-value is not significant, which is good, but what about the Schoenfeld residual plot?
1 Answer
The proportional hazards (PH) assumption might be OK, but the plot suggests that you need to think carefully about your model. You will have to use your knowledge of the subject matter to make your decisions.
First, do NOT depend on ggcoxzph(). Its plot has extremely wide y-axis limits and improperly drawn confidence limits for the smoothed curve. This probably is related to a long-standing (and evidently still uncorrected) coding error described here. Furthermore, it seems to have cut off values of time beyond about 79. The plot produced by the survival package tools shows that many of your data points are beyond there.
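As a minimal sketch of the survival-package route, the code below fits a model, runs the test, and draws the plot with the package's own tools. The `lung` data set and the covariates `age`, `sex`, and `ph.ecog` are stand-ins chosen only for illustration; they are not the model from the question.

```r
library(survival)

# Illustrative model on the `lung` data shipped with survival.
# ASSUMPTION: this is a placeholder; substitute your own data and covariates.
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Per-covariate and global tests of proportional hazards
zph <- cox.zph(fit)
print(zph)

# plot() on a cox.zph object dispatches to plot.cox.zph()
plot(zph, var = 1)                  # smoothed curve and residuals, first covariate
abline(h = coef(fit)[1], lty = 2)   # fitted time-constant coefficient, for reference
```

If the smoothed curve and its confidence band stay close to the dashed reference line over the whole time axis, that is consistent with PH for that covariate.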
Second, the time transformation used by cox.zph() (note the non-linear spacing of tick marks along the Time axis) has pushed together those late-time events so that most of the plot emphasizes the time range from 71 to about 80. The default Kaplan-Meier time transformation helps minimize contributions from outliers in the usual clinical setting, where there are usually very few and widely spaced event times at late times. In your case, a high proportion of events seems to be at late times. A different time transformation (e.g., identity) might have given a "significant" departure from PH. There does seem to be a dip in the smoothed curve at the later times. Based on your understanding of the subject matter, you and your colleagues have to decide whether the default time transformation is appropriate for your data and whether that apparent dip at late times is big enough to matter. It also seems that there are many tied event times, so make sure that your choice of how to handle tied times was OK for such data.
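One way to see how much the conclusion depends on these choices is to rerun the test under several time transformations and to refit the model with a different tie-handling method. The sketch below does this on the `lung` data, which is a placeholder assumption rather than the data from the question.

```r
library(survival)

# ASSUMPTION: placeholder model on the built-in `lung` data.
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)  # Efron ties (default)

# Same model, three different time transformations for the PH test
zph_km   <- cox.zph(fit, transform = "km")        # the default
zph_id   <- cox.zph(fit, transform = "identity")  # untransformed time
zph_rank <- cox.zph(fit, transform = "rank")      # ranks of the event times

# Side-by-side p-values: rows are covariates plus GLOBAL, columns are transforms
pvals <- sapply(list(km = zph_km, identity = zph_id, rank = zph_rank),
                function(z) z$table[, "p"])
print(round(pvals, 3))

# How many event times are shared by 1, 2, 3, ... events (status == 2 is an event in `lung`)
print(table(table(lung$time[lung$status == 2])))

# Refit with Breslow ties and compare coefficients with the Efron fit
fit_breslow <- coxph(Surv(time, status) ~ age + sex + ph.ecog,
                     data = lung, ties = "breslow")
print(cbind(efron = coef(fit), breslow = coef(fit_breslow)))
```

Large differences in p-values across transformations, or in coefficients across tie methods, would be a sign that these choices deserve a closer look for your data.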
You also need to consider the magnitudes of the scaled Schoenfeld residuals. They are hard to interpret without further information about the model. Each one starts as the difference between the covariate values for an individual having an event and the risk-weighted average of the corresponding covariate values for those at risk at that time; the differences are then scaled by the coefficient covariance matrix and the number of events. See this page. Large-magnitude scaled residuals could result from something as simple as a large number of events. See this page. It's best to focus on the shape of the smoothed residual plot instead of on the individual scaled residuals.
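To look at the residual magnitudes directly and then focus on the shape of the smooth, something like the following might help. It again uses the `lung` data as an assumed placeholder, and relies on the `y` component of the `cox.zph` object and the `resid` argument of `plot.cox.zph()`.

```r
library(survival)

# ASSUMPTION: placeholder model on the built-in `lung` data.
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
zph <- cox.zph(fit)

# The plotted points are stored in zph$y, one column per term
print(dim(zph$y))
print(fit$nevent)           # number of events in the fit, for comparison
print(summary(zph$y[, 1]))  # spread of the plotted values for the first term

# Drop the individual points so the y-axis is set by the smoothed curve
# and its confidence band rather than by the largest residuals
plot(zph, var = 1, resid = FALSE)
```

With `resid = FALSE` the vertical scale is no longer dominated by a few large residuals, which makes a dip or trend in the smoothed curve easier to judge.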
- Thanks for the explanation. I guess the ties are because in coxph(Surv)) I added ties = "breslow". – user358238 Jun 28 '22 at 13:22
- @user358238 With a lot of ties you are generally better off using the Efron correction, which is the default in coxph() (although not in some other software). – EdM Jun 28 '22 at 13:37


- survival package. You should be OK, given the fairly high p-value for cox.zph(). – EdM Jun 27 '22 at 13:16
- ggplotcoxzph() makes the y-axis too wide relative to the actual points and the smoothed curve. Try repeating with the standard plot.cox.zph() function in the survival package, or modify the code as in my answer to the linked question. – EdM Jun 27 '22 at 17:01
- plot.cox.zph() is the internal name. If you call plot() on an object returned by the cox.zph() function, the software knows to use the function with that internal name instead of all the other possible plot() functions. When the survival package is loaded you can find the manual page by typing ?plot.cox.zph at the command prompt, but you just call plot() yourself when you want to generate the scaled Schoenfeld residual plot. See Section 3.1 of the survival vignette. – EdM Jun 27 '22 at 21:21