Stratified-Extended Cox regression modeling to deal with survival data with time-varying covariates

Question

I'm working on Cox regression in my PhD research, and I would like some references on applying the stratified-extended Cox regression model to real-life data.

I'm interested in combining the two approaches, stratification and the extended Cox PH model, in a single model rather than applying them separately.

The R time-dependence vignette shows how to handle time-varying covariates, and I don't know of a reason why you can't also include a strata() term in such a model. Is there some more specific problem that you have in mind? If so, please edit the question to elaborate. — EdM, Apr 14 '23 at 09:10
The problem is that my data are supposed to contain a time-independent variable (e.g., age) that doesn't meet the proportional hazards assumption, so we stratify on age by splitting it into several strata. At the same time, the data contain at least one time-dependent variable that doesn't meet the proportional hazards assumption, so we extend the Cox model by adding more time-varying covariates... — Youcef Bouzir, Apr 17 '23 at 04:53

EdM · Accepted Answer · 2023-05-06T22:03:06.090

You have two predictors that seem to fail the proportional hazards (PH) assumption, one of which is time-varying. I'll describe other ways that might handle the PH problems better, then end with some suggestions for the approach you describe if that's still necessary.

Much can be learned from the time-dependence vignette of the R survival package, and from Frank Harrell's Regression Modeling Strategies. It's also worth getting access to the classic text by Therneau and Grambsch, which goes into detail about many applications of Cox models. In particular, these references discuss different ways to handle violations of PH.

Model continuous predictors flexibly

It sounds like you tried a single linear term for age in your model, did a PH test, and found a PH violation. If age has a more complicated association than that with outcome, however, then the improper modeling of age can show up as an apparent violation of PH. Specifying the functional form of the association properly might fix the PH problem on its own. A regression spline for age is a good way to let the data tell you the appropriate functional form. I find the rcs() function in the R rms package to have more useful defaults than the ns() function in the splines package.
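As a minimal sketch of that idea, the following compares a linear term for age with a spline and then re-runs the PH test. It uses the `lung` data set that ships with the survival package purely as a stand-in for your data, and `ns()` with 3 degrees of freedom as an arbitrary illustrative choice; `rcs()` from rms could be substituted in the formula.

```r
library(survival)
library(splines)

# Linear age versus a natural cubic spline for age (df = 3 is an arbitrary choice).
fit_linear <- coxph(Surv(time, status) ~ age + sex, data = lung)
fit_spline <- coxph(Surv(time, status) ~ ns(age, df = 3) + sex, data = lung)

# Likelihood-ratio comparison of the two functional forms.
print(anova(fit_linear, fit_spline))

# Scaled Schoenfeld residual tests of PH, one row per model term plus a global test.
zph_linear <- cox.zph(fit_linear)
zph_spline <- cox.zph(fit_spline)
print(zph_linear)
print(zph_spline)
```

If the apparent PH violation for age shrinks once the spline is in the model, the original problem was probably one of functional form rather than of non-proportionality.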

The same approach could fix the problems with your time-varying covariate, if it's a continuous variable. There is no fundamental difference between time-fixed and time-varying covariates in terms of how the regression coefficients are estimated in a Cox model, except that with a time-varying covariate the algorithm picks out, at each event time, the covariate values that happen to be in place for each at-risk individual at that specific event time. Fitting the proper functional form for the association with outcome thus might also fix the PH issue for your time-varying covariate.

Does lack of PH matter?

With a large data set it's quite possible to have a "statistically significant" violation of PH that doesn't matter in practice. That's a judgment call, to be made based on your understanding of the subject matter. Even if PH is violated, you end up with a type of event-averaged coefficient estimate that might be adequate for some purposes.

Handling time-varying covariates

As noted above, at each event time the algorithm looks only at the covariate values in place at that specific time for all individuals still at risk. There is no consideration of past covariate history, just the present values. That might not adequately describe the association of a covariate with outcome. For example, current blood glucose levels might not be associated so strongly with cardiovascular events as are hemoglobin A1C levels, which represent time-averaged blood glucose. Think very carefully about the biology underlying the time-varying covariate, to see if using only instantaneous values makes sense. In some circumstances you might want to model both the trajectory of the time-varying covariate and the time to event; see the survival task view for suggestions about joint modeling.

Your approach

There is no problem (in principle) building a Cox model with time-varying covariates and adjusting those covariates via a function of time to handle a violation of PH. Adjusting time-varying covariates as a function of time requires specifying separate time-adjusted covariate values for each individual at risk at each event time in the data set. The footnote to this answer mentions 2 ways to start to do that, although you might need to do some data manipulation yourself to ensure the correct format. You have to make sure that, at each event time, the algorithm can find the correct time-adjusted value of the time-varying covariate for each individual still at risk. The potential problem in practice is that you can end up with extremely large data sets, as a single individual will have one row for each event time in the entire data set during which the individual is at risk.
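One way to build such a data set, sketched here under assumptions, is to split every individual's follow-up at all event times with `survSplit()` and then compute the time-adjusted covariate row by row. The example uses the `heart` data from the survival package (already in counting-process form, with `transplant` as a time-varying covariate) only as a stand-in, and the adjustment `log(t)` is an arbitrary illustrative choice of time function, not a recommendation.

```r
library(survival)

# Cut follow-up at every distinct event (death) time in the data.
event_times <- sort(unique(heart$stop[heart$event == 1]))
heart2 <- survSplit(Surv(start, stop, event) ~ ., data = heart,
                    cut = event_times, episode = "interval",
                    start = "start", end = "stop", event = "event")

# After splitting, each row contains at most one event time, at its right
# endpoint, so the time-adjusted value can be evaluated at `stop`.
# transplant is coded 0/1 (possibly stored as a factor), so convert defensively.
tx <- as.numeric(as.character(heart2$transplant))
heart2$tx_logt <- tx * log(heart2$stop)

# Time-varying covariate plus its product with the chosen function of time.
fit_tadj <- coxph(Surv(start, stop, event) ~ transplant + tx_logt + surgery + age,
                  data = heart2)
print(fit_tadj)

# The price of this approach: many more rows than in the original data.
c(original = nrow(heart), split = nrow(heart2))
```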

Stratification by another variable (ageGroup here) adds no additional problem; you simply set ageGroup as a multi-level categorical predictor and specify a term strata(ageGroup) in the coxph() function of the survival package (or strat(ageGroup) if you use the cph() function of the rms package). At each event time, the comparisons among covariate values are restricted to individuals within the same stratum as the individual having the event. Sometimes having a large number of strata can lead to practical problems arising from small numbers of individuals within a stratum. Thus, if a spline doesn't fix the PH problem for age, I'd recommend modeling a time-varying coefficient for age instead as described in the time-dependence vignette rather than breaking age up into multiple strata.
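A time-varying coefficient for age can be sketched with the `tt()` mechanism of `coxph()`, following the pattern in the time-dependence vignette. The `lung` data set and the choice of `log(t)` as the time function are illustrative assumptions only.

```r
library(survival)

# Model the age coefficient as beta(t) = b1 + b2 * log(t).
# The tt argument supplies the function applied to age at each event time t.
fit_tt <- coxph(Surv(time, status) ~ age + tt(age) + sex,
                data = lung,
                tt = function(x, t, ...) x * log(t))
print(fit_tt)
```

The `age` coefficient estimates b1 and the `tt(age)` coefficient estimates b2; a b2 near zero suggests that a constant coefficient for age is adequate under this particular time function.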

In response to comments

The hazard in a Cox model for an individual $i$ with time-varying covariate values $X_i(t)$ can be written:

$$h_i(t)= h_0(t) \exp(X_i(t) \beta) ,$$

where $h_0(t)$ is the baseline hazard and $\beta$ is the vector of regression coefficients (coefficients assumed for now to be constant in time). That is the form handled directly by the coxph() function in the R survival package via the counting-process format, with outcomes coded as Surv(startTime, stopTime, status). That form allows for time-fixed covariates too; you just code the same value of a time-fixed covariate into each data row for an individual.
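To make the counting-process format concrete, here is a sketch that closely follows the PBC example in the time-dependence vignette, using `tmerge()` with the `pbc` and `pbcseq` data sets from the survival package. The object names `baseline`, `pbc2`, and `fit_tv` are chosen here for illustration.

```r
library(survival)

# Baseline (time-fixed) variables for the 312 randomized PBC patients.
baseline <- subset(pbc, id <= 312, select = c(id:sex, stage))

# First call: set each subject's follow-up window and the event indicator.
pbc2 <- tmerge(baseline, baseline, id = id, death = event(time, status))

# Second call: add repeated bilirubin measurements as a time-dependent covariate.
pbc2 <- tmerge(pbc2, pbcseq, id = id, bili = tdc(day, bili))

# One row per (tstart, tstop] interval; age and sex repeat within a subject.
print(head(pbc2[, c("id", "tstart", "tstop", "death", "age", "sex", "bili")]))

# status == 2 codes death in these data.
fit_tv <- coxph(Surv(tstart, tstop, death == 2) ~ log(bili) + age + sex,
                data = pbc2)
print(fit_tv)
```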

For stratification of such a model, you have two choices. You could assume that only the baseline hazards are different among strata, but the $\beta$ coefficients are shared among strata. Then the above equation, for individual $i$ in stratum $s$, becomes:

$$h_i(t | s)= h_{0,s}(t) \exp(X_i(t) \beta) ,$$

where $h_{0,s}(t)$ is the baseline hazard for stratum $s$. In the coxph() function you specify such a fit for strata defined by ageGroup by adding a term +strata(ageGroup) to the predictors. That's how stratification is usually handled. Again, there is no problem with incorporating time-varying covariates via the counting-process data format, or specifying a time-constant covariate by simply repeating the same value for each data row corresponding to an individual.
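A minimal sketch of such a fit, again with the `heart` data from the survival package standing in for your data: `transplant` is the time-varying covariate, and `ageGroup` is a hypothetical three-level grouping created here only to have something to stratify on (the cut points are arbitrary).

```r
library(survival)

# In the heart data, age is stored as (age - 48) years; restore it and group it.
heart_s <- heart
heart_s$ageGroup <- cut(heart_s$age + 48,
                        breaks = c(0, 40, 50, Inf),
                        labels = c("under40", "40to50", "over50"))

# Separate baseline hazard per age group, shared coefficients across groups.
fit_strat <- coxph(Surv(start, stop, event) ~ transplant + surgery + strata(ageGroup),
                   data = heart_s)
print(fit_strat)

# Events per stratum, to check that no stratum is too sparse.
print(table(heart_s$ageGroup, heart_s$event))
```

Note that no coefficient is reported for `ageGroup` itself: its effect is absorbed into the stratum-specific baseline hazards.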

It's possible also to allow one or more $\beta$ coefficients to differ among strata. For that, you add an interaction term between the predictor of interest and the strata. For example, if you want the coefficient for cov1 to vary among age-group strata, include a predictor term +cov1*strata(ageGroup). The statements above for incorporating both time-fixed and time-varying covariates into the model still hold.
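In the notation of the equations above, this second choice could be written as follows, where $Z_i(t)$ and $\gamma_s$ are symbols introduced here for illustration: $Z_i(t)$ holds the covariates whose coefficients are allowed to vary by stratum, and $X_i(t)$ holds those with shared coefficients.

$$h_i(t | s)= h_{0,s}(t) \exp(X_i(t) \beta + Z_i(t) \gamma_s) ,$$

where $\gamma_s$ is the vector of coefficients specific to stratum $s$. Setting $\gamma_s = \gamma$ for all $s$ recovers the shared-coefficient stratified model.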

Thanks for the good explanation, but I need to know how to formulate the two approaches in a single model, how to estimate the parameters, and how to evaluate the model's performance. — Youcef Bouzir, May 03 '23 at 04:50
@YoucefBouzir in R the standard coxph() function and the cph() function of the rms package linked in the question can combine stratification with time-varying covariates. The rms package provides a validate() function that can evaluate the quality of models produced by that package. — EdM, May 03 '23 at 14:01
can we write the formula of the model that can stratify and extend the cox regression model as following: \begin{equation} \textbf{$h_s(t,X) = h_0_s(t)\exp\left(\sum_{a=1}^{p_1} \beta_a_i X_a_i + \sum_{b=1}^{p_2} \gamma_b_i X_b_i (t_j)\right)$} \end{equation} — Youcef Bouzir, May 05 '23 at 00:27
@YoucefBouzir the equation in your comment doesn't show up correctly with my browser. You might edit the question itself to get clarification. — EdM, May 05 '23 at 12:39
$h_s(t \mid X(t)) = h_{0s}(t) \exp\{(\beta_1 X_{1s} + \dots + \beta_p X_{ps}) + (\gamma_1 X_{1s}(t) + \dots + \gamma_q X_{qs}(t))\}$ This model accommodates both time-fixed (time-independent) covariates, in the first part of the exponential, and time-varying (time-dependent) covariates, in the second part, within each stratum $s$. That means the model needs data with at least one time-independent variable (like age, which allows for stratification) and at least one time-dependent variable (which allows for extension of the Cox model) that violate the PH assumption. Is that correct? If so, how can we proceed with developing the model and estimating the parameters? — Youcef Bouzir, May 06 '23 at 20:57
