2

I have run fixed in time Cox models (crude and covariate adjusted) for the effect of a binary exposure. Sample size is approx. 200,000 and there are approx. 10,000 events and 4000 "yes" values for the exposure.

I am concerned and confused by the very large values of the scaled Schoenfeld residuals, please see plot below from the crude model. The very high (50+ in magnitude) values appear to be from both exposed (5) and non-exposed (50) subjects.

enter image description here

I fitted the model using the coxph() function from the survival package in R. I used the cox.zph() function with identity transform from the same package to produce the plot shown.

UPDATES

library(survival)
library(tidyverse)

fit model

f1 <- coxph(formula = Surv(tstart, time, event) ~ exposed_bin, data = test_df, robust = TRUE, cluster = id1, x = TRUE)

check ph

ph_check <- cox.zph(f1)

show some values of the Schoenfeld resids

test_df %>% filter(event == 1) %>% arrange(time) %>% select(id1, id2, exposed_bin, time, event) %>% bind_cols(scho_resid = ph_check$y) %>% slice(176:200) %>% print(n = 25)

enter image description here

enter image description here

plot of ph_check with KM transform and no residuals displayed...

enter image description here

Jeremy Miles
  • 17,812
user167591
  • 667
  • 5
  • 14
  • 1
    Did you test for violation of the proportional hazard assumption using Schoenfeld residuals first? – wjktrs Jan 05 '24 at 05:07
  • @wjktrs, yes, I did so. I have updated my question to show this. The test suggests no significant violation of proportional hazards – user167591 Jan 05 '24 at 10:49
  • 1
    I suspect that the large number of tied event times is leading to differences in the row order between your tibble (set via arrange(time)) and the order of Schoenfeld residuals provided by cox.zph() that you bound to the tibble. The high-magnitude residuals should all be for the very-low-prevalence yes exposures. You show exactly 1 yes exposure and 1 high-magnitude residual for each of times 67, 68, and 69, but they don't line up as they should. – EdM Jan 05 '24 at 16:47

1 Answers1

2

I am concerned and confused by the very large values of the scaled Schoenfeld residuals...(emphasis added)

The scaling of Schoenfeld residuals is proportional to the number of events in the dataset. See this page, for example, or this snippet of code from the residuals.coxph() function for the case of scaled Schoenfeld residuals:

if (otype == "scaledsch") {
            ndead <- sum(deaths)
            coef <- ifelse(is.na(object$coefficients), 0, object$coefficients)
            if (nvar == 1) 
                rr <- rr * vv * ndead + coef
            else {
                cname <- colnames(rr)
                rr <- drop(rr %*% vv * ndead + rep(coef, each = nrow(rr)))
                colnames(rr) <- cname
            }
        }

Here, the initial value of rr is the set of unscaled Schoenfeld residuals, and vv is the (non-robust) variance-covariance matrix of the coefficient estimates.

With a large number of deaths, as you have, the absolute values of scaled Schoenfeld residuals can be quite large. What's important is the overall shape of the smoothed plot over time.

EdM
  • 92,183
  • 10
  • 92
  • 267
  • Thank you for your help @EdM! I have added a few updates. I am still a bit unclear why there seems to be a massive jump in magnitude of the residuals with no clear pattern (please see far right column in my updates to the question). I am not quite sure if I did this correct, but I tried to align the values of the Schoenfeld residuals from the cox.zph object to my data, after filtering for just the events (=1) and then arranging by event times. I'm not sure if the cox.zph is still valid with a cluster robust option which I have used? – user167591 Jan 05 '24 at 10:55
  • 1
    @user167591 cox.zph() extracts the non-robust covariance matrix from the model even if you specify a robust model. So it's valid with robust models. As my new comment on the question indicates, I think there's a discrepancy between how you are ordering cases within a single time value and how cox.zph() is ordering the same cases. Within each time value, I suspect that you will find the number of high-magnitude residuals to be the same as the number of yes exposure values for the time value. – EdM Jan 05 '24 at 16:53