
I have read the paper by Snapinn et al., "Illustrating the Impact of a Time-Varying Covariate With an Extended Kaplan-Meier Estimator" (https://doi.org/10.1198/000313005X70371). They describe an extended Kaplan-Meier estimator for survival analysis with time-dependent covariates. Please reach out if you need the full-text article.

[Figure from the Snapinn et al. paper]

I have been trying to replicate their results and would like to validate my approach. Since they haven't made their data available, I made up some data and ran the following in R:

library(survival)
library(survminer)
library(tidyverse)

set.seed(99)

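# arbitrary data: 800 rows, each ID (1 to 400) appears twice;
# the 200-element score, timetoFU and status vectors are recycled to fill the rows
df1 <- data.frame(ID = rep(seq(1, 400, by = 1), 2),
                  score = factor(sample(1:4, 200, replace = TRUE)),
                  timetoFU = sample(1:200, 200, replace = TRUE),
                  status = sample(c(0, 1), 200, replace = TRUE, prob = c(0.9, 0.2)))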

# order each subject's observations by time and add an end of follow-up (time_max)
df1 <- df1 %>%
  group_by(ID) %>%
  arrange(ID, timetoFU) %>%
  mutate(obs_n = row_number(),
         time_max = last(timetoFU) + sample(1:50, 1, replace = TRUE)) %>%
  ungroup()
# the first observation of each subject is the baseline at time 0
df1$timetoFU[which(df1$obs_n == 1)] <- 0

# events more likely with higher scores
temp <- df1 %>%
  group_by(ID) %>%
  summarise(risk = sum(as.numeric(score)) / 10, status, time_max, obs_n) %>%
  ungroup() %>%
  mutate(status = ifelse(risk > 0.4, sample(c(0, 1), 100, replace = TRUE), status)) %>%
  mutate(status = ifelse(risk > 0.6, 1, status)) %>%
  filter(obs_n == 1)  # keep one row per subject

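# build time-dependent variable data frame:
# first define each subject's follow-up interval and event status,
# then split the intervals at each score measurement time
td_df <- tmerge(temp, temp, id = ID, outcome = event(time_max, status))
td_df <- tmerge(td_df, df1, id = ID, td_score = tdc(timetoFU, score))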

# survival analysis

# extended Kaplan-Meier curves by current score; fun = "event" plots 1 - S(t)
s1 <- survfit(Surv(tstart, tstop, outcome) ~ td_score, data = td_df, id = ID)
ggsurvplot(s1, fun = "event", risk.table = TRUE, conf.int = TRUE, break.x.by = 10)

# Cox model with the time-dependent score
cox_fit_A <- coxph(formula = Surv(tstart, tstop, outcome) ~ td_score, data = td_df, id = ID)
summary(cox_fit_A)

[Figure: my replication]

Is this correct?

  • It looks right. Which part does not convince you? –  Jul 04 '20 at 10:41
  • It convinces me, but I have learned to be cautious when something seems too simple to be true. Two points come to mind: 1. In the mentioned paper the authors write that the described methodology is implemented in S-Plus. I was wondering whether that is still true for recent releases of the survival package in R, since the publication dates to 2005. 2. Is this the recommended way of visualizing survival estimates stratified by time-varying covariates? – Rootless17b Jul 05 '20 at 15:32
  • You can find something similar here https://stats.stackexchange.com/questions/46754/visualize-survival-analysis-with-time-dependent-covariates and here https://stats.stackexchange.com/questions/136843/extrapolating-the-effect-of-covariable-changes-in-cox-proportional-hazards-model –  Jul 05 '20 at 15:43
  • Yes, I have read both of those threads, but I think they only address the immortal time bias issue. In a dataset with a truly time-varying variable, there shouldn't be any immortal time bias if every included subject has at least two measurements of the variable. – Rootless17b Jul 08 '20 at 13:31
  • Could you please explain what the time-varying covariate in your model is? The predictor in your model is "td_score", so I guess the time-varying covariate is td_score. However, in the arbitrary dataset you created, td_score doesn't change over time for the same ID: if td_score is 1, it is 1 in both time intervals. So I am confused about this part. – Becky May 29 '21 at 03:52
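Becky's observation can be checked directly on the output of tmerge(). A genuinely time-varying covariate should take more than one value within at least some IDs. The sketch below uses a hypothetical three-subject toy data set (baseline, visits, futime, time and score are made-up names, not the question's data). It builds counting-process rows with tmerge() and counts the distinct td_score values per subject, assuming only that the survival package is installed.

```r
library(survival)

# Hypothetical baseline data: one row per subject with end of follow-up and event status
baseline <- data.frame(id = 1:3, futime = c(10, 8, 12), status = c(1, 0, 1))

# Hypothetical visit data: score measured at time 0 and once more during follow-up
visits <- data.frame(id    = c(1, 1, 2, 2, 3, 3),
                     time  = c(0, 5, 0, 4, 0, 6),
                     score = c(1, 2, 1, 1, 2, 3))

# First call: one (tstart, tstop] interval per subject, event flag at futime
td <- tmerge(baseline, baseline, id = id, outcome = event(futime, status))
# Second call: split intervals at each visit and carry the measured score forward
td <- tmerge(td, visits, id = id, td_score = tdc(time, score))

print(td[, c("id", "tstart", "tstop", "outcome", "td_score")])

# Distinct score values per subject; a count above 1 means the covariate varies
n_values <- tapply(td$td_score, td$id, function(x) length(unique(x)))
print(n_values)
```

Here subjects 1 and 3 change score during follow-up and subject 2 does not. Applying the same tapply() check to td_df from the question shows whether td_score ever changes within an ID.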

1 Answer


I won't try to evaluate whether the data generation and processing with tmerge() shown in the question are correct, but the fundamental use of survfit() with a Surv(startTime, stopTime, event) outcome variable is certainly a valid way to display Kaplan-Meier curves with time-varying covariates.

The idea in the Snapinn et al. paper is that a case should move to the corresponding stratum whenever its value of a time-dependent covariate changes. The counting-process Surv(startTime, stopTime, event) formalism allows for that, given properly formatted survival and covariate data and functions that support analysis with time-dependent covariates. That is perhaps most frequently seen in Cox models with time-dependent covariates, as in the last lines of code in this question, where it ensures that the regression is based on the covariate values that are current for all at-risk cases at each event time. The survfit() function, however, also handles such data.* Plots of the survfit() output thus provide what you want.


*Unlike the survdiff() function, which is restricted to simple right-censored survival data.
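The following minimal sketch uses hypothetical counting-process data to show how survfit() moves a case between strata. The values are made up, and the sketch assumes only the survival package. Subject 1 switches from score 1 to score 2 at time 5, and subject 3 switches at time 6. Each (tstart, tstop] row is counted in the stratum of its current score, so both subjects enter the score-2 risk set late. The id argument is left out here because the Kaplan-Meier point estimates are determined by the intervals alone.

```r
library(survival)

# Hypothetical counting-process data in (tstart, tstop] form:
# subject 1 has score 1 on (0, 5] and score 2 on (5, 10], with an event at 10;
# subject 3 has score 1 on (0, 6] and score 2 on (6, 12], with an event at 12
td <- data.frame(id       = c(1, 1, 2, 3, 3),
                 tstart   = c(0, 5, 0, 0, 6),
                 tstop    = c(5, 10, 8, 6, 12),
                 outcome  = c(0, 1, 1, 0, 1),
                 td_score = c(1, 2, 1, 1, 2))

# Extended Kaplan-Meier: one curve per current score value,
# with delayed entry into a stratum when the score changes
fit <- survfit(Surv(tstart, tstop, outcome) ~ td_score, data = td)
s <- summary(fit)

# Event times, risk-set sizes and survival estimates per stratum
print(data.frame(stratum = s$strata, time = s$time,
                 n.risk = s$n.risk, n.event = s$n.event, surv = s$surv))
```

At time 10 the score-2 stratum has two cases at risk (subjects 1 and 3, both of which entered after time 0), so its estimate drops to 0.5. Subject 1 no longer counts toward the score-1 stratum after time 5.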

EdM