I would like to know whether there is an easy way to estimate the (cumulative) baseline hazard from a Cox model with time-varying coefficients over different time intervals. After creating the time-split data with survSplit(), predict.coxph() with type='expected' returns an expected value per row, which I suspect counts the contribution of the same subject multiple times. Is there an easy way to obtain these estimates? Am I thinking about this correctly? Here is an example:

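library(survival)
library(riskRegression)  # provides the Melanoma data
data(Melanoma)

# Split each subject's follow-up at day 1095; tgroup indexes the interval
d <- survSplit(formula = Surv(time,status==1)~age+sex+epicel,
               data = Melanoma,
               cut = 1095,
               episode = 'tgroup',
               id = 'id')

# Separate age coefficient per interval; sex and epicel effects are constant over time
fit <- coxph(Surv(tstart,time,event)~age:strata(tgroup)+sex+epicel,data=d,x=TRUE)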

Now, for example, take a patient who was censored at time=1700. If we want to estimate the baseline hazard at times 800 and 1500, the patient is in the risk set at both times, but with different linear predictors (because the cut point was set at time=1095). It looks like predict.coxph() doesn't take this into account. Am I thinking about this correctly? Can predict.coxph() be adjusted for this? Is there another function that does this automatically, or do I need to write one myself? I want to use these values to obtain absolute risk estimates for each patient. Thanks in advance.
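
To make the concern concrete, a Breslow-type estimator for this model could be written as below. This is only a sketch, assuming a single baseline hazard and an age coefficient that is piecewise constant around the cut point; the symbols are notation introduced here for illustration, and ties are handled in the simplest way rather than with the Efron method coxph() uses by default.

```latex
\hat{H}_0(t) = \sum_{i:\, t_i \le t}
  \frac{d_i}{\sum_{j \in R(t_i)}
    \exp\{\hat\beta_{\mathrm{age}}(t_i)\,\mathrm{age}_j
        + \hat\beta_{\mathrm{sex}}\,\mathrm{sex}_j
        + \hat\beta_{\mathrm{epicel}}\,\mathrm{epicel}_j\}},
\qquad
\hat\beta_{\mathrm{age}}(t) =
\begin{cases}
  \hat\beta_{\mathrm{age},1}, & t \le 1095 \\
  \hat\beta_{\mathrm{age},2}, & t > 1095
\end{cases}
```

Here t_i are the distinct event times, d_i is the number of events at t_i, and R(t_i) is the risk set at t_i. Under this form, the patient censored at 1700 enters the denominator at an event time of 800 with the first age coefficient and at an event time of 1500 with the second, contributing once per event time rather than once per row.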

1 Answer

I managed to figure out a way to handle this myself. survfit() works with time-split data for time-varying coefficients, provided the newdata argument is also supplied in time-split format. Here we describe one patient with all covariates at their baseline (reference) values, so that the resulting curve gives the cumulative baseline hazard.

library(tidyverse)

nd <- data.frame(tstart=c(0,1095),
                 time=c(1095,max(Melanoma$time)),
                 event=0,
                 tgroup=1:2,
                 id=1,
                 age=0,
                 sex='Female',
                 epicel='not present')

sfit <- survfit(fit,newdata=nd,id=id)

ggplot(data=tibble('time'=sfit$time,'Cumulative baseline hazard'=sfit$cumhaz)) +
  geom_line(aes(x=time,y=`Cumulative baseline hazard`),linewidth=.75) +
  theme_light()
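
To read off the cumulative baseline hazard at the specific times from the question (800 and 1500), one option is to evaluate the fitted step function with summary(). The block below is a minimal sketch that repeats the setup so it runs on its own. It assumes the Melanoma data from riskRegression as in the question; the object names s and H0_at are made up for illustration.

```r
library(survival)
library(riskRegression)  # assumed source of the Melanoma data, as in the question
data(Melanoma)

# Refit the model from the question on the time-split data
d <- survSplit(Surv(time, status == 1) ~ age + sex + epicel, data = Melanoma,
               cut = 1095, episode = "tgroup", id = "id")
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel,
             data = d, x = TRUE)

# Reference patient in time-split format (all covariates at baseline)
nd <- data.frame(tstart = c(0, 1095), time = c(1095, max(Melanoma$time)),
                 event = 0, tgroup = 1:2, id = 1,
                 age = 0, sex = "Female", epicel = "not present")
sfit <- survfit(fit, newdata = nd, id = id)

# Evaluate the curve at the requested times. With the default settings,
# survfit() on a coxph fit reports surv = exp(-cumhaz), so -log(surv)
# recovers the cumulative hazard.
s <- summary(sfit, times = c(800, 1500))
H0_at <- setNames(-log(s$surv), s$time)
H0_at
```

Because the reference patient has a linear predictor of zero in both intervals, these two numbers are the cumulative baseline hazard itself, on either side of the cut point.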

These values, together with the linear predictors, can then be used to compute the absolute risk estimates. This vignette is a comprehensive and helpful description of time-varying covariates and coefficients. The response here is also really helpful.
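
As a rough illustration of that last step, the same survfit() call can be pointed at an actual covariate profile instead of the reference one. The patient's cumulative hazard then accumulates the baseline increments multiplied by exp(linear predictor) for the first interval up to day 1095, and by exp(linear predictor) for the second interval afterwards. The sketch below repeats the setup so it runs on its own. The patient values (age 60, male, epicel present) and the names nd_pat, sfit_pat and risk are hypothetical. It also treats deaths from other causes as censoring, so it is an assumption-laden 1 - S(t) and not a competing-risks absolute risk.

```r
library(survival)
library(riskRegression)  # assumed source of the Melanoma data, as in the question
data(Melanoma)

# Refit the model from the question on the time-split data
d <- survSplit(Surv(time, status == 1) ~ age + sex + epicel, data = Melanoma,
               cut = 1095, episode = "tgroup", id = "id")
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel,
             data = d, x = TRUE)

# Hypothetical patient, written in the same time-split layout as the data:
# one row per interval, covariates repeated, tgroup marking the interval
nd_pat <- data.frame(tstart = c(0, 1095), time = c(1095, max(Melanoma$time)),
                     event = 0, tgroup = 1:2, id = 1,
                     age = 60, sex = "Male", epicel = "present")
sfit_pat <- survfit(fit, newdata = nd_pat, id = id)

# Predicted probability of the event by days 800 and 1500,
# ignoring competing causes of death (assumption, see above)
s_pat <- summary(sfit_pat, times = c(800, 1500))
risk <- setNames(1 - s_pat$surv, s_pat$time)
risk
```

To get an estimate for every patient, the same call could be looped over covariate profiles, or newdata could hold several patients with distinct id values.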