I am wondering: if the log-log curves and the Cox zph test disagree, which one should I trust? I am working with the cgd data set. From the log-log plot I conclude that the effect of sex is time-dependent, but the zph test does not indicate time dependence. One reason I can think of is that the two sexes have very different group sizes: about 35 males versus about 168 females. That is a large difference. What should I do in this case?
For me, the most useful evaluations of the proportional hazards (PH) assumption are the smoothed plots of scaled Schoenfeld residuals over (transformed) time, which can be obtained from the object returned by the cox.zph() function.
Stratified log-log plots (in your case, stratified by sex) typically don't account for the associations of other covariates with outcome. An apparent proportionality (or an apparent lack of it) might not hold once other covariates are taken into account. (I personally can have a lot of trouble interpreting such plots, anyway.)
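To make the contrast concrete, here is a minimal sketch of both kinds of log-log plot for the cgd data: the usual unadjusted one from Kaplan-Meier curves by sex, and a covariate-adjusted one built from a Cox model stratified on sex. The reference covariate pattern used for the adjusted curves (the covariate values of the first row of `cgd`) is an arbitrary, hypothetical choice for illustration; other patterns shift the curves but, under PH for the other covariates, not their vertical separation.

```r
library(survival)

# Unadjusted log-log plot: Kaplan-Meier curves stratified by sex,
# drawn as log(-log S(t)) against log(t) (fun = "cloglog" uses a log x-axis)
km <- survfit(Surv(tstart, tstop, status) ~ sex, data = cgd)
plot(km, fun = "cloglog", col = 1:2, xlab = "Days (log scale)", ylab = "log(-log S(t))")
legend("topleft", legend = names(km$strata), col = 1:2, lty = 1)

# Covariate-adjusted version: stratify a Cox model on sex, adjust for the
# other covariates, and plot the stratum-specific curves for one
# (hypothetical) reference covariate pattern taken from the first row
cox_strat <- coxph(Surv(tstart, tstop, status) ~ strata(sex) + treat + inherit + steroids,
                   data = cgd)
ref <- data.frame(treat = cgd$treat[1], inherit = cgd$inherit[1],
                  steroids = cgd$steroids[1])
adj <- survfit(cox_strat, newdata = ref)
plot(adj, fun = "cloglog", col = 1:2, xlab = "Days (log scale)", ylab = "log(-log S(t))")
legend("topleft", legend = names(adj$strata), col = 1:2, lty = 1)
```

Roughly parallel curves in the adjusted plot are what PH for sex would predict; the unadjusted plot can look different because it ignores treatment, inheritance pattern and steroid use.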
The text output from the cox.zph() function, on the other hand, is effectively a test of trend over time, which can miss some forms of time-varying failure of PH. That might be what is going on in your situation.
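One way to see that the test is a trend test against a single chosen function of time is to rerun it with different `transform` values, as in the sketch below. It refits the model from this answer; which transforms to compare is an illustrative choice, not a recommendation. A wavy, non-monotone pattern in the residuals can yield a small chi-square under every such transform, because none of them is a good fit to a pattern that goes up and down.

```r
library(survival)

cox1 <- coxph(Surv(tstart, tstop, status) ~ sex + treat + inherit + steroids + cluster(id),
              data = cgd)

# cox.zph() tests for a linear trend of the scaled Schoenfeld residuals
# against a transform of time (default "km"); collect the p-value for sex
# under several transforms to see how much it depends on that choice
transforms <- c("km", "rank", "identity", "log")
p_sex <- sapply(transforms, function(tr) cox.zph(cox1, transform = tr)$table["sex", "p"])
print(round(p_sex, 3))
```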
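library(survival)
# Cox model on the counting-process cgd data, robust SEs clustered by patient id
cox1 <- coxph(Surv(tstart, tstop, status) ~ sex + treat + inherit + steroids + cluster(id), data=cgd)
# Per-term and global tests of proportional hazards (printed output follows)
(zph1 <- cox.zph(cox1))
# chisq df p
# sex 0.122 1 0.73
# treat 0.518 1 0.47
# inherit 0.525 1 0.47
# steroids 0.137 1 0.71
# GLOBAL 1.121 4 0.89
# Smoothed scaled Schoenfeld residuals, Beta(t), for the first term (sex)
plot(zph1[1])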
That plot shows a good deal of waviness in the estimated coefficient for sex over time, even though the chi-square test shows no overall trend.
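To judge how far the smoothed curve strays from what the Cox model assumes, a sketch like the following adds two reference lines to the same plot: the fitted time-constant coefficient for sex (dashed) and zero (dotted). The line styles and colors are arbitrary choices.

```r
library(survival)

cox1 <- coxph(Surv(tstart, tstop, status) ~ sex + treat + inherit + steroids + cluster(id),
              data = cgd)
zph1 <- cox.zph(cox1)

# Smoothed Beta(t) for sex; under PH it should hover around the single
# fitted coefficient, drawn here as a dashed horizontal line
plot(zph1[1])
abline(h = coef(cox1)[1], lty = 2, col = "red")
abline(h = 0, lty = 3)
```

If the confidence band around the smooth keeps containing the dashed line, the waviness may be no more than noise.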
You would then need to decide if that possible lack of PH is substantial enough to matter for your application. In this particular data set, if sex wasn't of direct interest, you could deal with the PH problem by stratifying on sex in the model.
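A minimal sketch of the stratified model, keeping the other terms from the model above. Each sex gets its own baseline hazard, so PH is no longer assumed for sex, but the price is that no hazard ratio for sex is estimated.

```r
library(survival)

# strata(sex): separate baseline hazards for males and females;
# the remaining covariates still share coefficients across strata
cox2 <- coxph(Surv(tstart, tstop, status) ~ strata(sex) + treat + inherit + steroids + cluster(id),
              data = cgd)
zph2 <- cox.zph(cox2)
print(zph2)  # sex is no longer a model term, so it has no row here
```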

GLOBAL row computed from cox.zph. – Frank Harrell Oct 02 '23 at 12:37