0

I am trying to look at how two different factors A (levels A1, A2, A3, A4) and B (levels B1 and B2), as well as the interaction between the two, influence the time to an event X. As a result I am trying to use a cox proportional hazards model as my data contains censored data (individuals for which event X did not occur in the time of the study period: 1 = Event occured, 0 = Event did not occur). My model is thus as follows:

model.all<-coxph(Surv(X, Censor.Y.or.N) ~ A*B, data = data)
summary(model.all)
Call:
coxph(formula = Surv(X, Censor.Y.or.N) ~ A * 
    B, data = data)

n= 199, number of events= 119 (2 observations deleted due to missingness)

                       coef exp(coef) se(coef)      z Pr(&gt;|z|)   

AA2 -0.5319 0.5875 0.3929 -1.354 0.17582
AA3 -0.5779 0.5611 0.4104 -1.408 0.15909
AA4 -0.8626 0.4220 0.3701 -2.331 0.01977 * BB2 -1.4654 0.2310 0.4706 -3.114 0.00185 ** AA2:BB2 1.5935 4.9208 0.6509 2.448 0.01436 * AA3:BB2 1.7029 5.4896 0.5898 2.887 0.00389 ** AA4:BB2 1.7132 5.5468 0.5715 2.998 0.00272 **


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                    exp(coef) exp(-coef) lower .95 upper .95

AA2 0.5875 1.7022 0.27197 1.2690 AA3 0.5611 1.7823 0.25100 1.2542 AA4 0.4220 2.3694 0.20432 0.8718 BB2 0.2310 4.3292 0.09184 0.5810 AA2:BB2 4.9208 0.2032 1.37410 17.6218 AA3:BB2 5.4896 0.1822 1.72773 17.4425 AA4:BB2 5.5468 0.1803 1.80962 17.0021

Concordance= 0.609 (se = 0.027 ) Likelihood ratio test= 14.27 on 7 df, p=0.05 Wald test = 12.85 on 7 df, p=0.08 Score (logrank) test = 13.75 on 7 df, p=0.06

So far my interpretations of the main effects are as follows:

A (adjusted for B):

  1. At a given instance in time, event X is 0.59 times as likely (41% less likely) in A2 individuals compared to A1 individuals. Time to event X is not significantly longer for A2 compared to A1.
  2. At a given instance in time, event X is 0.56 times as likely (44% less likely) in A3 individuals compared to A1 individuals. Time to event X is not significantly longer for A3 compared to A1.
  3. At a given instance in time, event X is 0.42 times as likely (58% less likely) in A4 individuals compared to A1 individuals. Time to event X is significantly longer for A4 compared to A1.

B (adjusted for A):

  1. At a given instance in time, event X is 0.23 times as likely (77% less likely) in B2 individuals compared to B1 individuals. Time to event X is significantly longer for B2 compared to B1.

A likelihood ratio test revealed the interaction between A and B to be significant. The problem I am now having is that I am not sure how to interpret the interaction terms in the summary output. Is the following interpretation for, A2:B2 for example, correct?

  1. At a given instance in time, event X is 4.92 times as likely (392% more likely) in individuals that are both A2 and B2 compared to individuals that are A1 and B1. Time to event X is significantly shorter for A2:B2 individuals compared to A1:B1 individuals.

Furthermore, if this is the case, is there any way to gain hazard ratios (and their significance levels) for within and between factor comparisons for example: comparing B1 and B2 within each level of factor A; or comparing A1-A2, A1-A3, A2-A3 within each level of factor B.

Any help anyone can provide with my query would be greatly appreciated.

1 Answers1

0

An initial problem is that your overall model has questionable "statistical significance." Even the generally most reliable overall test, the likelihood-ratio test, only has p = 0.05. We'll put that aside for now, however, to address some other parts of your interpretation that hold for Cox models with interaction terms.

First, when you have an interaction in any regression model, the "main effect" coefficients are only for a highly restricted situation. What you have for the "main effects" of the various levels of the categorical predictors are their associations with outcome (relative to their own reference level) when the interacting predictor is also at its reference level.

Second, the interaction coefficients in any regression model are for additional associations with outcome beyond what you would predict based on the "main effect" coefficients. That seems to be the way that you are interpreting them. I'm a bit worried about the very high values of those coefficients and the extremely wide confidence intervals, however.

Third, it's risky to use terminology like "0.59 times as likely" when dealing with hazard ratios. The hazard at a specific time is the instantaneous risk of an event given that you have already survived that long. That's not the same as the overall "likeliness" of an event; in a standard survival model all individuals are equally likely (and assumed certain) to have an event eventually. The word "hazard" has a well-defined meaning in survival analysis. Stick to it lest you make statements that might not be true.

Fourth, it's not clear where you are getting information on time-to-event from this summary. Under the proportional-hazards assumption, someone with a higher hazard than another would be expected to have a shorter predicted time-to-event, although a test on mean or median survival times might not find that difference to be "statistically significant."

Fifth, this type of summary of any regression model isn't usually adequate to evaluate what's going on with a model containing interaction terms. The coefficient estimate(s) and the apparent "significance" of the "main effect" term for a predictor depend on how that predictor and its interacting predictors are coded.

You need an overall estimate of the significance of each categorical predictor that includes all its levels and its interactions. That can be provided by the Anova() function in the R car package or the anova() function in the rms package when applied to objects generated by rms (as with its cph() function for Cox models).*

The rms package allows for predictions like those you ask for at the end of your question if you use its cph() function for a Cox model. The emmeans package can provide predictions based on a wide variety of models generated by R packages.


*Be very careful in your choice of anova() function, as the default in the R stats package can produce misleading results with unbalanced data. Make sure that you know which anova() or Anova() function you are invoking on your model result.

EdM
  • 92,183
  • 10
  • 92
  • 267
  • In regards to your third and fourth points could we then say, for example in relation to the main effects of variable A, "at a given instance in time the instantaneous risk of event X is 0.59 times as high (41% lower) in A2 individuals compared to A1 individuals given that they have survived that long and that they are also B1 individuals". As a result they would be expected to have a longer time to X. – Insect_biologist May 17 '23 at 21:39
  • Similarly, if we were to look at the A2:B2 individuals we can say that A2:B2 individuals, compared to A1:B1 individuals, have a 0.59+0.23+4.92 = 5.72 times higher instantaneous risk of event X at a given time given that they have already survived this long. As a result they would be expected to have a shorter time to X. – Insect_biologist May 17 '23 at 21:48
  • In regards to point five could explain a little bit more about what you mean by this. How would how the main effects and interactions are coded affect the coefficient estimates? Also, when using the cph() function from the package rms to run the proportional hazards model does this treat the data differently compared to the coxph() function in the survival package? – Insect_biologist May 17 '23 at 21:50
  • @Insect_biologist for the last point, see this page for example. A predictor's "main" coefficient changes if you re-center a predictor interacting with it. Your A2:B2 analysis is incorrect. In the hazard-ratio scale that you use (the exp(coef) scale), effects multiply. You could add the coef values and then exponentiate. Your first comment might be technically correct, but I find that adding statements like "41% lower" often leads to confusion. The hazard ratio itself (0.59 in that case) has less risk of ambiguous interpretation. – EdM May 17 '23 at 22:04