Extended Kaplan-Meier for time-dependent covariates

Question

I have read the paper by Snapinn et al., "Illustrating the Impact of a Time-Varying Covariate With an Extended Kaplan-Meier Estimator" (https://doi.org/10.1198/000313005X70371). They describe an extended Kaplan-Meier survival analysis for working with time-dependent covariates. Please reach out if you need the full-text article.

I have been trying to replicate their approach and would like to have my results validated. Since they haven't made their data available, I made up some data and ran the following in R:

library(survival)
library(survminer)
library(tidyverse)
set.seed(99)
#arbitrary data
df1 <- data.frame(ID = rep(seq(1, 400, by = 1), 2),
                  score = factor(sample(1:4, 200, replace = TRUE)),
                  timetoFU = sample(1:200, 200, replace = TRUE),
                  status = sample(c(0, 1), 200, replace = TRUE, prob = c(0.9, 0.2))
                  )
df1 <- df1 %>% group_by(ID) %>% arrange(ID, timetoFU) %>% mutate(obs_n = row_number(), time_max = last(timetoFU)+sample(1:50, 1, replace = TRUE)) %>% ungroup()
df1$timetoFU[which(df1$obs_n == 1)] <- 0
#events more likely with higher scores
temp <- df1 %>% group_by(ID) %>% summarise(risk=sum(as.numeric(score))/10, status, time_max, obs_n) %>% ungroup() %>% mutate(status = ifelse(risk >0.4, sample(c(0,1), 100, replace = TRUE), status)) %>% mutate(status = ifelse(risk > 0.6, 1, status)) %>% filter(obs_n == 1)
#build time dependent variable data frame
td_df <- tmerge(temp, temp, id = ID, outcome = event(time_max, status))
td_df <- tmerge(td_df, df1, id = ID, td_score = tdc(timetoFU, score))
#survival analysis
s1 <- survfit(Surv(tstart, tstop, outcome) ~  td_score, data = td_df, id = ID)
ggsurvplot(s1, fun = "event", risk.table = TRUE, conf.int = TRUE, break.x.by = 10)
cox_fit_A <- coxph(formula = Surv(tstart, tstop, outcome) ~ td_score, data = td_df, id = ID)
summary(cox_fit_A)

Is this correct?

It convinces me, but I have learnt to be cautious when something seems too simple to be true. Two points come to mind: 1. In the mentioned paper the authors write that the described methodology is implemented in S-Plus. I was wondering whether that is still true for recent releases of the survival package in R, since the publication dates to 2012. 2. Is this the recommended way of visualizing survival estimates stratified by time-varying covariates? — Rootless17b, Jul 05 '20 at 15:32
You can find something similar here https://stats.stackexchange.com/questions/46754/visualize-survival-analysis-with-time-dependent-covariates and here https://stats.stackexchange.com/questions/136843/extrapolating-the-effect-of-covariable-changes-in-cox-proportional-hazards-model — , Jul 05 '20 at 15:43
Yes, I have read both of those threads, but I think they both only address the immortal time bias issue. In a dataset with a true time-varying variable, there shouldn't be any immortal time bias if every subject included has at least 2 measurements for the variable. — Rootless17b, Jul 08 '20 at 13:31
Could you please explain which variable is the time-varying covariate in your model? The predictor in your model is "td_score", so I guess the time-varying covariate is td_score. However, in the arbitrary dataset you created, td_score doesn't change over time for the same ID. If td_score is 1, then it is 1 in both time intervals. So I am confused about this part. — Becky, May 29 '21 at 03:52

score 3 · Answer 1 · answered May 30 '21 at 18:50

I won't try to evaluate whether the data generation and processing with tmerge() shown in the question is correct, but the fundamental use of survfit() with a Surv(startTime, stopTime, event) outcome variable is certainly a valid way to display Kaplan-Meier curves with time-varying covariates.

The idea in the Snapinn et al. paper is that a case should change to the corresponding stratum when its value of a time-dependent covariate changes. The counting-process Surv(startTime, stopTime, event) formalism allows for that, with properly formatted survival and covariate data and functions that allow for analysis with time-dependent covariates. That's perhaps most frequently seen in Cox models with time-dependent covariates as in the last lines of code in this question, ensuring that the regression is based on the covariate values that are current for all at-risk cases at each event time. The survfit() function, however, also handles such data.* Plots of the survfit() output thus provide what you want.

*Unlike the survdiff() function, which is restricted to simple right-censored survival data.
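
As a minimal sketch of the stratum-switching idea, the toy data below puts each subject's follow-up into counting-process rows, so a subject who moves from "low" to "high" counts toward each stratum's risk set only while in that stratum. The six subjects are invented, and the names toy, grp, and manual_km are illustrative; none of them come from the question or the paper. The id argument used in the question is left out to keep the example small. The hand computation is my own reading of the extended estimator (product of 1 - deaths/at-risk over event times within each stratum), included only as a cross-check against survfit().

```r
library(survival)

# Toy counting-process data (invented for illustration only).
# Subjects 1 and 5 switch from "low" to "high" during follow-up,
# so each of them contributes one row to each stratum.
toy <- data.frame(
  id     = c(1, 1, 2, 3, 4, 5, 5, 6),
  tstart = c(0, 5, 0, 0, 0, 0, 4, 0),
  tstop  = c(5, 12, 8, 15, 6, 4, 20, 10),
  event  = c(0, 1, 1, 0, 1, 0, 0, 1),
  grp    = c("low", "high", "low", "low", "high", "low", "high", "low")
)

# One curve per current covariate value, using (start, stop] intervals
fit <- survfit(Surv(tstart, tstop, event) ~ grp, data = toy)
km  <- summary(fit)  # by default, rows at event times only

# Hand computation within one stratum: at each event time t, a row is
# at risk if tstart < t <= tstop; survival is the running product of
# (1 - deaths / at_risk).
manual_km <- function(d) {
  times <- sort(unique(d$tstop[d$event == 1]))
  surv <- cumprod(sapply(times, function(t) {
    at_risk <- sum(d$tstart < t & d$tstop >= t)
    deaths  <- sum(d$tstop == t & d$event == 1)
    1 - deaths / at_risk
  }))
  data.frame(time = times, surv = surv)
}

manual <- lapply(split(toy, toy$grp), manual_km)

print(manual)
print(data.frame(strata = km$strata, time = km$time, surv = km$surv))
```

If the two printouts agree, that supports reading the plotted survfit() curves as extended Kaplan-Meier estimates, in which a subject leaves one curve's risk set and enters another's when the covariate changes.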
