1

I have a stratified Cox model with a covariate which violates proportional hazard assumptions:

data(veteran)
toy <- coxph(Surv(time, status) ~ trt + prior + karno, data=veteran)

So I perform a step function:

toy1 <- survSplit(Surv(time, status) ~ ., data=veteran,cut=c(90, 180),episode="t_group", id='id')
toy <- coxph(Surv(time, status) ~ trt + prior + karno:strata(t_group), data=toy1)

Now let's say I break apart the dataset at the same cut points (time=90 and 180).

g1 <- veteran %>%
filter(time <= 90)
g2 <- veteran %>%
filter(time > 90)  %>%
filter(time <= 180)
g3 <- veteran %>%
filter(time > 180)

Then I run the model again separately for each subset.

toy2 <- coxph(Surv(time, status) ~ trt + prior + karno, data=g1)
toy3 <- coxph(Surv(time, status) ~ trt + prior + karno, data=g2)
toy4 <- coxph(Surv(time, status) ~ trt + prior + karno, data=g3)

Why are the coefficients I get different?

E.g. Here is the output for the subset:

Call:
coxph(formula = Surv(time, status) ~ trt + prior + karno, data = g3)

n= 27, number of events= 25

       coef exp(coef)  se(coef)      z Pr(&gt;|z|)

trt -0.586874 0.556063 0.514080 -1.142 0.254 prior -0.027458 0.972916 0.047311 -0.580 0.562 karno 0.003277 1.003282 0.017386 0.188 0.851

  exp(coef) exp(-coef) lower .95 upper .95

trt 0.5561 1.7984 0.2030 1.523 prior 0.9729 1.0278 0.8868 1.067 karno 1.0033 0.9967 0.9697 1.038

Concordance= 0.554 (se = 0.074 ) Likelihood ratio test= 2.07 on 3 df, p=0.6 Wald test = 2.04 on 3 df, p=0.6 Score (logrank) test = 2.09 on 3 df, p=0.6z

Vs. from the step function:

Call:
coxph(formula = Surv(time, status) ~ trt + prior + karno:strata(t_group), 
    data = toy1)

n= 225, number of events= 128

                                coef exp(coef)  se(coef)      z Pr(&gt;|z|)

trt -0.011025 0.989035 0.189062 -0.058 0.953 prior -0.006107 0.993912 0.020355 -0.300 0.764 karno:strata(t_group)t_group=1 -0.048755 0.952414 0.006222 -7.836 4.64e-15 karno:strata(t_group)t_group=2 0.008050 1.008083 0.012823 0.628 0.530 karno:strata(t_group)t_group=3 -0.008349 0.991686 0.014620 -0.571 0.568

trt
prior
karno:strata(t_group)t_group=1 *** karno:strata(t_group)t_group=2
karno:strata(t_group)t_group=3


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                           exp(coef) exp(-coef) lower .95 upper .95

trt 0.9890 1.011 0.6828 1.4327 prior 0.9939 1.006 0.9550 1.0344 karno:strata(t_group)t_group=1 0.9524 1.050 0.9409 0.9641 karno:strata(t_group)t_group=2 1.0081 0.992 0.9831 1.0337 karno:strata(t_group)t_group=3 0.9917 1.008 0.9637 1.0205

Concordance= 0.725 (se = 0.024 ) Likelihood ratio test= 63.04 on 5 df, p=3e-12 Wald test = 63.7 on 5 df, p=2e-12 Score (logrank) test = 71.33 on 5 df, p=5e-14

In the model on hand-coded dataset I get a coefficient of 1.0033 for Karno; and in the step function model I get 0.9917 for the same variable. Why is this?

1 Answers1

1

The data processing via survSplit() is not the same as subsetting the data by the original time values for events or censoring. Examine the first 4 individuals in the veteran data set and how they are handled by the survSplit() function that provides your toy1 data set.

head(veteran,4)
  trt celltype time status karno diagtime age prior
1   1 squamous   72      1    60        7  69     0
2   1 squamous  411      1    70        5  64    10
3   1 squamous  228      1    60        3  38     0
4   1 squamous  126      1    60        9  63    10

head(toy1,9) trt celltype karno diagtime age prior id tstart time status t_group 1 1 squamous 60 7 69 0 1 0 72 1 1 2 1 squamous 70 5 64 10 2 0 90 0 1 3 1 squamous 70 5 64 10 2 90 180 0 2 4 1 squamous 70 5 64 10 2 180 411 1 3 5 1 squamous 60 3 38 0 3 0 90 0 1 6 1 squamous 60 3 38 0 3 90 180 0 2 7 1 squamous 60 3 38 0 3 180 228 1 3 8 1 squamous 60 9 63 10 4 0 90 0 1 9 1 squamous 60 9 63 10 4 90 126 1 2

Individuals with time > 180 are included in all 3 time strata in the toy1 data set used for your "step function" model. With your hand-coded subsets, only those having event or censoring times within each time epoch are included in the corresponding "subset" model. Thus the "step function" and "subset" models are working with different risk sets--the set of individuals at risk at each event time.

The solution to a Cox model involves evaluating, at each event time, the difference between the covariate values for the individual having the event and a risk-weighted mean of each covariate for all the individuals at risk at that time. See this answer for a brief description, or the more complete presentation in standard texts like Therneau and Grambsch.

In the "step function" model all 225 individuals contribute to risk sets until their event or censoring times. In the "subset" model you show, only 27 of them can do so, with no information provided by those with event or censoring times prior to time = 180. So you should expect to get different estimates for coefficients.

EdM
  • 92,183
  • 10
  • 92
  • 267