I have run fixed in time Cox models (crude and covariate adjusted) for the effect of a binary exposure. Sample size is approx. 200,000 and there are approx. 10,000 events and 4000 "yes" values for the exposure.
I am concerned and confused by the very large values of the scaled Schoenfeld residuals, please see plot below from the crude model. The very high (50+ in magnitude) values appear to be from both exposed (5) and non-exposed (50) subjects.
I fitted the model using the coxph() function from the survival package in R. I used the cox.zph() function with identity transform from the same package to produce the plot shown.
UPDATES
library(survival)
library(tidyverse)
fit model
f1 <- coxph(formula = Surv(tstart, time, event) ~ exposed_bin,
data = test_df,
robust = TRUE,
cluster = id1,
x = TRUE)
check ph
ph_check <- cox.zph(f1)
show some values of the Schoenfeld resids
test_df %>% filter(event == 1) %>% arrange(time) %>%
select(id1, id2, exposed_bin, time, event) %>%
bind_cols(scho_resid = ph_check$y) %>%
slice(176:200) %>%
print(n = 25)
plot of ph_check with KM transform and no residuals displayed...




arrange(time)) and the order of Schoenfeld residuals provided bycox.zph()that you bound to the tibble. The high-magnitude residuals should all be for the very-low-prevalenceyesexposures. You show exactly 1yesexposure and 1 high-magnitude residual for each of times 67, 68, and 69, but they don't line up as they should. – EdM Jan 05 '24 at 16:47