4

One of the cohort studies I have carried out recently involved examining the association between post-diagnostic beta-blocker use and breast cancer outcomes in a cohort of breast cancer patients. Briefly, I followed patients up from their breast cancer diagnosis until death or end of 2017, and treated beta-blocker use as a time-varying covariate. That is, people were considered nonusers of beta blockers until their first post-diagnostic beta blocker dispensing, and once they were a user they were a user until the end of follow-up. For people who used beta blockers, they have two rows of data in my STATA dataset, with the rows split based on when they first used the beta blocker. An example of this data is shown below:

In this dataset, 'time5' represents each patient's entire follow-up time, and 'timetofbetablocker' is the time between breast cancer diagnosis and the patient's first beta blocker dispensing. As you can see, I've split the data for people who used BBs (patient 476 is the only one to have used a BB in this sample data), so in row 9 patient 476 is counted as using a BB between 8.85 years and 10.82 years. I've then created a variable (the second to last column in this sample data) to run a Cox model on, representing the time-varying nature of BB use (code if for STATA, sorry R users):

gen timetofbetablockerana=1 if _t0>=timetofbetablocker & timetofbetablocker!=.

replace timetofbetablockerana=0 if timetofbetablockerana==.

I can then run a simple (unadjusted) Cox model for BBs using stcox timetofbetablockerana

A question I have been grappling with for a while now is how I adequately test the PH assumption with this time-varying covariate. I don't think a simple estat phtest (testing the proportional-hazards assumption on the basis of Schoenfeld residuals) in STATA suffices for this, because the variable is time-varying. In slide 32 of this presentation (https://ms.uky.edu/~mai/sta635/Cox%20model.pdf), they suggest interacting the time-varying covariate with time itself and then testing the assumption on this variable. So, I was thinking of interacting 'timetofbetablockerana' with 'timetofbetablocker' to allow the value of the analysis variable to vary based on when people started using BBs, and then including this interaction term in the model. e.g, gen interaction= timetofbetablocker* timetofbetablockerana, replace interaction=0 if interaction==., and then stcox timetofbetablockerana interaction and estat phtest, detail to test the assumption on the interaction term.

Is anyone able to let me know if I'm on the right track here? If this isn't correct, does anyone have any other ideas/suggestions on how I could go about this? Further, even with this time-varying covariate in the model, if I were to test the PH assumption on time-independent/fixed covariates (which are additionally included in the fully adjusted model), would I simply go about it the usual way, even in the presence of time-varying covariates in the same model?

Oliver
  • 41
  • @EdM, I've seen you answer these types of questions before, so maybe you're able to help me? Thank you so much. – Oliver Nov 18 '23 at 18:35
  • The call-out with @ on this site only works for individuals who are already in the comment thread. I saw this question anyway, however. – EdM Nov 18 '23 at 19:25
  • I have edited the answer to include most of what was covered in comments, to make it easier for others to find. In this version of the answer, I have included responses to your most recent comments as well. This site frowns upon extended discussion in comments. – EdM Nov 21 '23 at 17:29
  • @edm thanks so much for your help! Basically everything I was unsure about has now been answered. I just have one final question. I know you said that categorical predictors (not the primary predictor) aren't of upmost importance, but also mentioned R having a feature that tests the PH assumption for these predictors. I have attached STATA output of one of my predictors here: https://imgur.com/KiKt2Mm. In this case, every other level is being compared to level 1, and level 5 violates the PH assumption. I'm guessing this isn't what you're referring to when you say that R can test the PH – Oliver Nov 22 '23 at 10:51
  • @edm assumption (as different levels are being compared to a base level in STATA rather than a global test), but do you have any idea how we can interpret this output? For example, is one 'level' violating the PH assumption enough to justify stratifying on this variable? Or would we need >1 level with a p-value <0.05 to justify taking this route? There are other categorical covariates in my model with a similar distribution of p-values, so this is also an issue for other variables as well. Given what you've said, I'm satisfied that my main predictor does satisfy the PH assumption though. – Oliver Nov 22 '23 at 10:57
  • Whether and how to stratify is a judgment call for which there is no definitive right/wrong answer. It's hard to evaluate p values without knowing how many cases are in each category and seeing the curves of residuals over time. The cox.zph() function in R has a terms argument that allows for a joint test over all levels of a multi-level categorical predictor; the smoothed plot of residuals then evaluates the entire linear predictor associated with all levels together. You might ask a STATA help group if that's available in that software. – EdM Nov 22 '23 at 14:48
  • @EdM Thanks again, you've been wonderful. I don't have any additional questions for the time being, so I just wanted to say that I really appreciate all your help in this thread! Very helpful and informative indeed. – Oliver Nov 22 '23 at 22:56

1 Answers1

5

I don't know the origin of the idea that Schoenfeld residuals can't be used for testing proportional hazards (PH) with time-varying covariate values. As Therneau and Grambsch say in Section 4.6:

Since the [Schoenfeld] residuals are defined at each unique death, their definition and computation is unchanged by a counting process formulation of the data set.

The "counting process formulation of the data set" is precisely what's used for time-varying covariates. The Cox model simply uses whatever covariate values are currently in place for all individuals at risk at each event time. Whether a value for some individual might be different from some prior value doesn't matter (provided that there's no more than 1 event possible per individual).

That fact, however, also points out the difficulty in formulating a useful time-varying covariate. Since only the current value of a covariate at an event time is used, the covariate value used for the model must incorporate whatever aspect of covariate history is associated with outcome at any particular time.

I don't know whether the STATA PH test based on Schoenfeld residuals might have some issues with time-varying covariates, but there's not a theoretical problem.

In response to/ incorporating comments

The time dependence vignette for the base R survival package deals (explicitly or implicitly) with most of the issues raised in comments. I think that all contained in this answer and my comments comes pretty directly from there or from the classic Therneau and Grambsch text linked from the first paragraph.

There is no problem with evaluating scaled Schoenfeld residuals for binary or categorical predictors, or for a mix of time-constant and time-varying covariates.

A single yes/no time-varying exposure variable as you use for the beta-blockers assumes that the association of outcome with drug exposure is independent of the duration of exposure: that the very first day on the drug provides the same effect as being on it for 10 years. A rolling-average estimate of drug exposure over a biologically reasonable time span might be better. In practice, a 0/1 no-drug/drug covariate might work well enough.

Constructing a useful time-varying covariate and testing it for PH are two different things. Whether you use a 0/1 drug exposure indicator or some type of running average, the PH test will determine whether the association of that marker with outcome is independent of time since diagnosis. PH might be violated, for example, if the drug made a lot of difference soon after diagnosis but had less association with outcome later on.

I don't know what's available in STATA for testing the PH assumption via scaled Schoenfeld residuals. The vignette linked above describes what's available in the basic R survival package. Numeric tests for PH depend on the choice of time scale, something that isn't always obvious at first.

There are several options for time-scale transformation available in R; transformation based on Kaplan-Meier estimates is a common choice. What's shown in your graph suggests that STATA is using a logarithmic transformation of time instead. In that time scale, at least, it looks like PH is a reasonable assumption, given the p-value that you report and the corresponding flat estimate of what I take to be a smoothed representation of the scaled residuals (see the "bandwidth" notation near the bottom of the graph).

The R survival package does have an overall display and test of the PH assumption for multi-level categorical predictors. I don't know whether that's available in STATA. If a categorical predictor sufficiently violates PH and isn't of interest except as something for which you wish to control, stratification on that predictor can be a useful choice. That said, the PH assumption is less critical for such control predictors; see the discussion on this page.

For a predictor of primary interest that fails PH, the vignette notes that partitioning of survival time into a few blocks can sometimes allow for PH within each time block, with a separate HR for each time block. Like any categorization of a continuous variable, however, that is somewhat arbitrary at best.

EdM
  • 92,183
  • 10
  • 92
  • 267
  • thanks so much for your response, much appreciated. When you say "the covariate value used for the model must incorporate whatever aspect of covariate history is associated with outcome at any particular time", have I adequately dealt with this by splitting the covariate when a patient uses the medication for the first time? By doing so, I would think I would be "incorporating the covariate history", because people are only counted as users between the time they were dispensed the medication and the end of follow up. However, I would like some clarification. – Oliver Nov 20 '23 at 00:04
  • Further, it seems like you're suggesting that there's nothing theoretically wrong with running a PH test on Schoenfeld residuals on the 'timetofbetablockerana' variable (the variable that I use in my Cox model). The PH test is providing seemingly 'normal' p-values for this variable, but I was a bit worried that the test wouldn't hold up theoretically given that this variable can ONLY take on values of 0 and 1. So what you're saying is that I can run the PH test on this binary variable without needing to worry about interacting it with time? – Oliver Nov 20 '23 at 00:08
  • Sorry for the third comment. Further, there is no problem running the PH test on Schoenfeld residuals on time-independent variables, even in the presence of time-varying covariates in the same model, right? Thanks again for all your help, much appreciated. – Oliver Nov 20 '23 at 00:10
  • @Oliver there's no problem with Schoenfeld residuals for binary predictors or combinations of time-constant and time-varying predictors. A potential problem with your drug-exposure covariate is this: the single yes/no exposure variable assumes that the association of outcome with drug exposure is independent of the duration of exposure: that the very first day on the drug provides the same effect as being on it for 10 years. Whether that's reasonable depends on your understanding of the subject matter. See this vignette. – EdM Nov 20 '23 at 06:18
  • Thank you, I understand. I'm pretty sure the assumption about the effect of the exposure being independent of time doesn't hold up for my work, as there is probably some sort of dose-response effect whereby the effect of the drug accumulates over time. Is there any sort of workaround to this problem when testing for the PH assumption, or am I simply constrained by the fact that a dose-response relationship likely exists? To what extent is the test compromised because of this feature of the drug? Would you recommend doing away with the test entirely, or is there still some value in it? – Oliver Nov 20 '23 at 11:05
  • @Oliver constructing a useful time-varying covariate and testing it for PH are two different things. For the first, you might construct something like a rolling-average estimate of drug exposure over a time-course that makes sense. The 0/1 no-drug/drug covariate might nevertheless work well enough in practice. Whatever measure you construct, the PH test determines whether its association with outcome is independent of time since diagnosis. For example, PH might be violated if the drug made a lot of difference soon after diagnosis but had less association with outcome later on. – EdM Nov 20 '23 at 13:44
  • I understand. In the case of my covariate, I'm pretty sure that the effect gets stronger (more protective) as time accumulates. When examining log-log plots of timetofbetablockerana=1 vs timetofbetablockerana=0, there does seem to be a violation of the curves being parallel over the entire course of follow up. In this case, would it make sense to split follow-up time into different periods, and then to assess the assumption through Schoenfeld residuals and log log plots over different periods of follow-up(i.e. when the effects of the drug are different)? Would that hold up theoretically? – Oliver Nov 20 '23 at 15:39
  • @Oliver that's one of the ways recommended in the time-dependence vignette, to which I linked previously, to deal with that type of PH violation. It's somewhat arbitrary, as with any dichotomization, but it can sometimes be good enough at least to describe the results. I'd recommend looking at a plot of smoothed, scaled Schoenfeld residuals over time, to get a better sense of the magnitude and time course of the PH violation; those are also illustrated in the vignette. I don't find the log-log plots to be so helpful. – EdM Nov 20 '23 at 18:09
  • thanks again, you've been wonderful. I'm no expert on interpreting plots of Schoenfeld residuals, so could you please help me with this? Here is a plot of the residuals for 'timetofbetablockerana' (my medication variable) over time (although I'm not sure the residuals are 'smoothed', I don't know how to do this), and it appears as though the slope is roughly equal to 0: https://imgur.com/a/P6js8aI. The p-value testing for a deviation of the slope=0 is 0.37, suggesting that the assumption holds. Is this the correct interpretation? Is there anything else to worry about for this covariate? – Oliver Nov 20 '23 at 23:15
  • sorry for another question, but you're an absolute encyclopedia of knowledge. When running the test for slope=0 for individual covariates in the model, some 'levels' of categorical variables appear to violate the PH assumption (p<0.05) when comparing with a base 'level' for the categorical variable. How do you suggest handling these violations? Should I be stratifying the Cox model by these variables? I'm assuming there is no good way to graphically visualise violations of the PH assumption for categorical variables with >2 levels. – Oliver Nov 20 '23 at 23:20