Step function in Cox Model

Question

I have a stratified Cox model with a covariate which violates proportional hazard assumptions:

data(veteran)
toy <- coxph(Surv(time, status) ~ trt + prior + karno, data=veteran)

So I perform a step function:

toy1 <- survSplit(Surv(time, status) ~ ., data=veteran,cut=c(90, 180),episode="t_group", id='id')
toy <- coxph(Surv(time, status) ~ trt + prior + karno:strata(t_group), data=toy1)

Now let's say I break apart the dataset at the same cut points (time=90 and 180).

g1 <- veteran %>%
filter(time <= 90)
g2 <- veteran %>%
filter(time > 90)  %>%
filter(time <= 180)
g3 <- veteran %>%
filter(time > 180)

Then I run the model again separately for each subset.

toy2 <- coxph(Surv(time, status) ~ trt + prior + karno, data=g1)
toy3 <- coxph(Surv(time, status) ~ trt + prior + karno, data=g2)
toy4 <- coxph(Surv(time, status) ~ trt + prior + karno, data=g3)

Why are the coefficients I get different?

E.g. Here is the output for the subset:

Call:
coxph(formula = Surv(time, status) ~ trt + prior + karno, data = g3)
n= 27, number of events= 25
       coef exp(coef)  se(coef)      z Pr(&gt;|z|)

trt   -0.586874  0.556063  0.514080 -1.142    0.254
prior -0.027458  0.972916  0.047311 -0.580    0.562
karno  0.003277  1.003282  0.017386  0.188    0.851
  exp(coef) exp(-coef) lower .95 upper .95

trt      0.5561     1.7984    0.2030     1.523
prior    0.9729     1.0278    0.8868     1.067
karno    1.0033     0.9967    0.9697     1.038
Concordance= 0.554  (se = 0.074 )
Likelihood ratio test= 2.07  on 3 df,   p=0.6
Wald test            = 2.04  on 3 df,   p=0.6
Score (logrank) test = 2.09  on 3 df,   p=0.6z

Vs. from the step function:

Call:
coxph(formula = Surv(time, status) ~ trt + prior + karno:strata(t_group), 
    data = toy1)
n= 225, number of events= 128
                                coef exp(coef)  se(coef)      z Pr(&gt;|z|)

trt                            -0.011025  0.989035  0.189062 -0.058    0.953
prior                          -0.006107  0.993912  0.020355 -0.300    0.764
karno:strata(t_group)t_group=1 -0.048755  0.952414  0.006222 -7.836 4.64e-15
karno:strata(t_group)t_group=2  0.008050  1.008083  0.012823  0.628    0.530
karno:strata(t_group)t_group=3 -0.008349  0.991686  0.014620 -0.571    0.568
trt

prior

karno:strata(t_group)t_group=1 ***
karno:strata(t_group)t_group=2

karno:strata(t_group)t_group=3

Signif. codes:  0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
                           exp(coef) exp(-coef) lower .95 upper .95

trt                               0.9890      1.011    0.6828    1.4327
prior                             0.9939      1.006    0.9550    1.0344
karno:strata(t_group)t_group=1    0.9524      1.050    0.9409    0.9641
karno:strata(t_group)t_group=2    1.0081      0.992    0.9831    1.0337
karno:strata(t_group)t_group=3    0.9917      1.008    0.9637    1.0205
Concordance= 0.725  (se = 0.024 )
Likelihood ratio test= 63.04  on 5 df,   p=3e-12
Wald test            = 63.7  on 5 df,   p=2e-12
Score (logrank) test = 71.33  on 5 df,   p=5e-14

In the model on hand-coded dataset I get a coefficient of 1.0033 for Karno; and in the step function model I get 0.9917 for the same variable. Why is this?

score 1 · Answer 1 · answered Feb 16 '22 at 19:18

The data processing via survSplit() is not the same as subsetting the data by the original time values for events or censoring. Examine the first 4 individuals in the veteran data set and how they are handled by the survSplit() function that provides your toy1 data set.

head(veteran,4)
  trt celltype time status karno diagtime age prior
1   1 squamous   72      1    60        7  69     0
2   1 squamous  411      1    70        5  64    10
3   1 squamous  228      1    60        3  38     0
4   1 squamous  126      1    60        9  63    10
head(toy1,9)
   trt celltype karno diagtime age prior id tstart time status t_group
1    1 squamous    60        7  69     0  1      0   72      1       1
2    1 squamous    70        5  64    10  2      0   90      0       1
3    1 squamous    70        5  64    10  2     90  180      0       2
4    1 squamous    70        5  64    10  2    180  411      1       3
5    1 squamous    60        3  38     0  3      0   90      0       1
6    1 squamous    60        3  38     0  3     90  180      0       2
7    1 squamous    60        3  38     0  3    180  228      1       3
8    1 squamous    60        9  63    10  4      0   90      0       1
9    1 squamous    60        9  63    10  4     90  126      1       2

Individuals with time > 180 are included in all 3 time strata in the toy1 data set used for your "step function" model. With your hand-coded subsets, only those having event or censoring times within each time epoch are included in the corresponding "subset" model. Thus the "step function" and "subset" models are working with different risk sets--the set of individuals at risk at each event time.

The solution to a Cox model involves evaluating, at each event time, the difference between the covariate values for the individual having the event and a risk-weighted mean of each covariate for all the individuals at risk at that time. See this answer for a brief description, or the more complete presentation in standard texts like Therneau and Grambsch.

In the "step function" model all 225 individuals contribute to risk sets until their event or censoring times. In the "subset" model you show, only 27 of them can do so, with no information provided by those with event or censoring times prior to time = 180. So you should expect to get different estimates for coefficients.

Thank you so much! I really appreciate this tutorial, @EdM! – neophyte1 Feb 16 '22 at 21:15 — neophyte1, Feb 16 '22 at 21:15

Step function in Cox Model

1 Answers1