Interaction effects in Cox model (continuous, dichotomous, and categorical variables)

Question

I get a couple of puzzling results in my (repeated event) Cox model when I introduce interaction effects. Here I pose several questions about interaction effects (in a survival analysis context) in order to, hopefully, get answers to these questions once and for all. I've checked similar posts related to this matter (1, 2, 3, 4, 5, 6, 7, 8); some of them are unanswered, others are answered ambiguously, and some are helpful. In general, I believe there is a need (and interest in) for some clarification about interaction effects – a quite complicated area for all quantitative methods–focused student/professionals.

Ultimately, my questions relate to the logic behind interactions and their subsequent interpretation in the analysis. Below I present 5 different scenarios/models derived from my data analysis – but I extend them a bit to also include other examples that might be of help for me and (hopefully) for other people on this website.

For every scenario, I provide my own interpretations (in order to capture the essence and logic, they're not comprehensive interpretations) – so those of you who are able to answer, please reject or support them. If possible, provide a correct answer and elaborate why something was incorrect.

Scenario 1

Suppose that I have a model with 2 covariates where one of the covariates is my main explanatory variable (note that it makes sense to have this variable without an interaction term as well). Guided by my theoretical considerations, I (also) introduce an interaction term between them.

My main explanatory variable (X) is on the scale 0 to 10 (think of number of appearances) and the other covariate (D) is also a continuous variable (ranging from 0 to 10). The model with interaction term:

model.1 <- coxph(Surv(start, stop, event) ~ X + D + X:D + cluster(ID) + strata(enum), data = mydata)

                    exp(coef) exp(-coef) lower .95 upper .95
X                    1.069     0.9356    0.9798     1.166
D                    1.046***  0.9561    1.0213     1.071
X*D                  1.000     0.9999    0.9876     1.013

Suppose now that in model with only X+D (with no interaction term), my main variable X was significant. It is not significant in the interaction model (see above result).

My interpretation 1) I simply state that there were no interaction effects between X and D. However, while the D variable is significant (with increasing hazard rate) the X is not. Thus, my main explanatory variable is not sufficient to explain this. Alternatively, 2) I state that there were no interaction effects, and the coef. of X in the interaction model does not make any sense or is hard to interpret. I don't even show this results, but put it on a note.

Question: how should I interpret interaction effects between two continuous variables in this model?

Scenario 2

In this scenario the X variable is still a continuous variable 0-10, but the D-variable is now dichotomous.

                    exp(coef) exp(-coef) lower .95 upper .95
X                   1.0677.    0.9366    0.9933     1.148
D                   1.3628***  0.7338    1.1351     1.636
X*D                 0.9994     1.0006    0.9150     1.092

My interpretation: "X:D" is decreasing, i.e. when D=0 and X increasing, the hazard for experiencing the event is decreasing(weak), but the effect is not significant. When "D" is = 1, the hazard is increasing.

Scenario 3

"X" is still continuous, but "D" is now categorical (0 = no appearances, 1 = one appearance, 2 = two appearances, 3 = three appearances).

                     exp(coef) exp(-coef) lower .95 upper .95
X                     1.0491***  0.9532    1.0226     1.076
factor(D)1            1.2237     0.8172    0.8350     1.793
factor(D)2            1.7871.    0.5596    0.9910     3.223
factor(D)3            1.0578     0.9453    0.4625     2.420
X*factor(D)1          0.9849     1.0153    0.9336     1.039
X*factor(D)2          0.9859     1.0143    0.9021     1.077
X*factor(D)3          1.0390     0.9625    0.9230     1.170

Question: How should I interpret the interaction term here?

Scenario 4

Now the "X" becomes dichotomous (1/0) and the "D" remains categorical as in Scenario 3.

                   exp(coef) exp(-coef) lower .95 upper .95
X                    1.386**   0.7214    1.1315     1.698
factor(D)1           1.195     0.8370    0.8435     1.692
factor(D)2           1.659.    0.6029    0.9635     2.855
factor(D)3           1.061     0.9425    0.4820     2.336
X*factor(D)1         0.900     1.1111    0.5848     1.385
X*factor(D)2         0.986     1.0142    0.4979     1.952
X*factor(D)3         1.352     0.7394    0.5097     3.589

My interpretation: The interaction term is not significant, as in all Scenarios. But the interpretation would be that when X is = 1, the D = 1 and D = 2 are decreasing (compared to D=0) but when X=1 and D=3, the hazard is increasing.

Scenario 5

Suppose now that the "X" and the "D" variables are exactly the same as in the previous scenario. However, this time, variable "X" violates the PH assumption, so I introduce an interaction term between X and stop/start time (years). I know that some would argue that one needs to split the data before doing this, while others would not necessarily recommend it. This is something of a side debate: interesting, but not really relevant to this example, and it has been discussed elsewhere here. Nevertheless, here is the model:

            exp(coef) exp(-coef) lower .95 upper .95
X             1.5848*    0.6310    1.0795    2.3268
factor(D)1    1.1301     0.8849    0.9192    1.3893
factor(D)2    1.6507**   0.6058    1.1655    2.3378
factor(D)3    1.2698     0.7875    0.7991    2.0179
X*stop        0.9488*    1.0540    0.9026    0.9973

My interpretation: The interaction with time does correct for the violation of the assumption: X is decreasing with years. However, X alone is increasing. What is going on here? It doesn't make any sense to me. Unless, the X = 0 (alone), and X = 1 with * stop in the model. If so, the interpretation is then that X = 1 * stop is decreasing over time, while when X = 0, the hazard rate increases with 1.58.

EDIT (additional information):

The variables "X" and "D" are actually discrete (1, 2, 3, 4,..10) but they are treated as continuous.

I use a conditional model (or "PWP" model), and the time scale is "time since entry".

Both X and D are time-dependent (or time-varying) variables.

Are both X and D (in 1) continuous or they are in fact discrete and you treat them as continuous? I.e. do they take values outside 1,2,3 etc?
What kind of model are you actually fitting? It seems to me that it is a conditional (or "PWP") model where the first event, second event, etc, all have different intensity (hazard) functions. What is the time scale? (e.g. gap time or time since entry)

Are X and D time dependent? — Theodor, Oct 29 '15 at 16:09
Thank you for asking @Theodor. They are discrete; I treat them as continuous. You are right, I use a conditional model. The time scale is in years (time since entry, until the event or censoring). Repeated events. Both X and D are time-dependent variables. — FKG, Oct 29 '15 at 19:21
sorry for the late reply. I will try to give a longer answer later today or tomorrow, although I am not very familiar with the PWP model. — Theodor, Nov 01 '15 at 13:29
About the time scale: If a subject has events at t1, t2, t3 in calendar time, do you take the pairs (tstart, tstop, enum) as (0, t1, 1), (t1, t2, 2), (t2, t3, 3) etc? About X & D: can you provide more detail on how they are determined? (especially if they have some connection with enum). Thanks — Theodor, Nov 01 '15 at 14:21
Thanks @Theodor. I am not sure if I understand the first question, but my interpretation is yes, each event is assigned to a separate stratum. However, the way I coded the "enum", it says that all of the observations in (for example) strata 1 have the same underlying hazard of events, and all those in strata 2 have the same underlying hazard - different than strata 1.
The X variable is the number of appearances for each program, the total number of programs being 12. So the enum is coded in a way that says that each program has a different underlying hazard. — FKG, Nov 01 '15 at 19:30
The D variable is just a score that every subject receives; it is not sensitive to programs, though (it accounts for external factors). So in other words, the main variable here is X. I assume that the impact of X may vary depending on the program.
But if this adds more complexity, please assume something different, a more simple PWP-model. — FKG, Nov 01 '15 at 19:43

score 1 · Accepted Answer · answered Nov 02 '15 at 10:57

I'll try to answer this, but keep in mind that I do not have real experience with the PWP model, and if anybody has better input, that would be welcome.

General observations

1. I have a problem with treating discrete covariates as continuous, in general. In my opinion, there is no sensible interpretation when this is done.
2. You should not use unstandardized continuous covariates. For the combination X=10 and D=10, for example, the product term equals 100, so the estimated interaction coefficient will most likely be very small (the covariates enter the Cox model inside the exponential, where such values become very large).
3. When using strata for every event, you should have a lot of data for all combinations of groups in all strata, in order to have the power to reject hypotheses concerning covariate effects.
4. Of more interest here would be the regression coefficients rather than the hazard ratios. Assume the final model estimates $\beta_X$, $\beta_D$ and $\beta_{XD}$ as regression coefficients. In this case the overall effect of $X$ given $D=d$ can be calculated as $\beta_X + \beta_{XD} d$ (of course this should be stated more formally if you have several groups, with dummy variables, etc.).
5. I would not use "increasing" hazard rate when you refer to an "increased" hazard rate due to a covariate. The former appeals more to the shape of the hazard function, which is not of concern in this case.
6. The interpretation of covariate effects in survival analysis with proportional hazards is not that different from regular regression analysis, except that the covariate effects $\beta$ have the interpretation of log-hazard ratios. In the PWP model, this is also conditional on being at risk for the $k$-th event.
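
The conditional effect $\beta_X + \beta_{XD} d$ mentioned above can be sketched in R. This is only an illustration: `conditional_effect` is a hypothetical helper written for this example, and the model is fitted to the `lung` data shipped with the survival package as a stand-in, not to the data from the question. The standard error uses the usual variance of a linear combination of two coefficients.

```r
library(survival)

# Log-hazard ratio for a one-unit increase in `x` when the moderator equals `d`,
# with a Wald 95% confidence interval, taken from a fitted coxph model.
# `x` and `xd` are the coefficient names of the main effect and the interaction.
conditional_effect <- function(fit, x, xd, d) {
  b <- coef(fit)
  V <- vcov(fit)
  est <- b[[x]] + d * b[[xd]]
  se  <- sqrt(V[x, x] + d^2 * V[xd, xd] + 2 * d * V[x, xd])
  c(d = d, log_hr = est, hr = exp(est),
    lower = exp(est - 1.96 * se), upper = exp(est + 1.96 * se))
}

# Stand-in example (NOT the question's data): effect of age at ph.ecog = 0, 1, 2
fit <- coxph(Surv(time, status) ~ age * ph.ecog, data = lung)
effects <- t(sapply(0:2, function(d) {
  conditional_effect(fit, "age", "age:ph.ecog", d)
}))
print(round(effects, 4))
```

At `d = 0` the result is just the main-effect coefficient, which is why the main effect on its own only describes subjects with the moderator equal to zero.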

Other general things

Another thing would be that in the model selection you should also factor in what would make sense and what would be a useful model for your research question. I think it's generally bad practice to fit a lot of models without knowing beforehand what question that model answers.

A puzzling quote is

In general, I believe there is a need (and interest in) for some clarification about interaction effects – a quite complicated area for all quantitative methods–focused student/professionals.

This is why statistics textbooks exist. Any decent book on regression models should explain interaction effects. For example, I used the Fox book (but I assume there are plenty out there).

As a final recommendation, it would be instructive to write down the hazards expressions and their estimates for all the groups and the combination of groups, with pen and paper. This I think would clear up many of the confusions that you encounter in interpreting these effects.
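
As a sketch of what this exercise might look like for the model with two continuous covariates, assume the stratified (PWP) form in which the $k$-th event has its own baseline hazard. The symbols $h_{0k}$, $x$ and $d$ are notation introduced here for illustration, not taken from the question:

$$h_k(t \mid X = x, D = d) = h_{0k}(t)\,\exp\left(\beta_X x + \beta_D d + \beta_{XD}\, x d\right)$$

Comparing two subjects in the same stratum who differ by one unit in $X$ and share the same value $d$ of $D$, the baseline hazard cancels:

$$\frac{h_k(t \mid X = x + 1, D = d)}{h_k(t \mid X = x, D = d)} = \exp\left(\beta_X + \beta_{XD}\, d\right)$$

So the log-hazard ratio for $X$ is $\beta_X + \beta_{XD} d$, which reduces to $\beta_X$ alone only when $d = 0$.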

Keeping all these in mind, I'll give some comments on the scenarios that you mentioned.

Scenario 1

Suppose now that in model with only X+D (with no interaction term), my main variable X was significant. It is not significant in the interaction model (see above result).

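My intuition tells me that this should not happen too often, i.e. removing something not significant should not alter the other estimates a lot. This might happen, though, because you lose power when adding the interaction effect (you lose "degrees of freedom"). Note also that once the interaction is included, the coefficient of $X$ no longer measures the same thing: it describes the effect of $X$ at $D=0$ only, so its estimate and standard error can differ from those in the model without the interaction.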

My interpretation 1) I simply state that there were no interaction effects between X and D. However, while the D variable is significant (with increasing hazard rate) the X is not. Thus, my main explanatory variable is not sufficient to explain this. Alternatively, 2) I state that there were no interaction effects, and the coef. of X in the interaction model does not make any sense or is hard to interpret. I don't even show this results, but put it on a note.

Keep in mind observations 1, 2 and 5. There might be an interaction effect, but you just don't have enough power to detect it. The coefficient of the main effect of $X$ does make (some) sense: it is the log-hazard ratio for a one-unit increase in $X$ for a subject with $D=0$.

Scenario 2

Again keep in mind observations 1 and 2.

My interpretation: "X:D" is decreasing, i.e. when D=0 and X increasing, the hazard for experiencing the event is decreasing(weak), but the effect is not significant. When "D" is = 1, the hazard is increasing.

The total effect of $X$ is $\log(1.0677) + d \times \log(0.9994) = 0.0655 - d \times 0.0006$. So a larger value of $X$ leads to an increased hazard (ratio), regardless of $D$. The total effect of $X$ is slightly smaller when $D=1$.

Scenario 3

Here there are 3 interaction terms. It is instructive to compute again the total effect of $X$, conditional on the values of $D$. It looks like for $D\in\left\{1,2\right\}$ the effect of $X$ is attenuated as compared to when $D=0$, and for $D=3$ the effect is amplified, as compared to when $D=0$. The interactions are not significant, which means that you do not have enough power to reject the hypothesis of no interaction in this data set.
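
The total effect of $X$ at each level of $D$ can be computed directly from the hazard ratios printed in the Scenario 3 output. This is a minimal sketch with point estimates only; the variable names are made up for illustration, and the confidence intervals would require the covariance matrix of the fitted model.

```r
# Hazard ratios copied from the Scenario 3 output
hr_x   <- 1.0491                                          # X (effect at D = 0)
hr_int <- c(D0 = 1, D1 = 0.9849, D2 = 0.9859, D3 = 1.0390)  # X:factor(D)k

# Hazard ratio for a one-unit increase in X within each level of D:
# exp(beta_X + beta_XDk) = exp(beta_X) * exp(beta_XDk)
hr_x_by_d <- hr_x * hr_int
print(round(hr_x_by_d, 4))
```

The products come out at roughly 1.049, 1.033, 1.034 and 1.090 for $D = 0, 1, 2, 3$, matching the attenuation for $D \in \{1, 2\}$ and the amplification for $D = 3$.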

Scenario 4

My interpretation: The interaction term is not significant, as in all Scenarios. But the interpretation would be that when X is = 1, the D = 1 and D = 2 are decreasing (compared to D=0) but when X=1 and D=3, the hazard is increasing.

If I read this in a paper I would be hopelessly confused. What is decreasing? The $D$? (I have a feeling I know what you refer to, but you should try to express things less informally.)
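
One way to state the Scenario 4 results less ambiguously might be to tabulate the hazard ratio of every combination of $X$ and $D$ relative to the reference group ($X=0$, $D=0$). The sketch below uses only the point estimates printed in the question; the variable names are made up for illustration and no uncertainty is taken into account.

```r
# Hazard ratios copied from the Scenario 4 output (reference: X = 0, D = 0)
hr_x   <- 1.386
hr_d   <- c(D0 = 1, D1 = 1.195, D2 = 1.659, D3 = 1.061)  # factor(D)k
hr_int <- c(D0 = 1, D1 = 0.900, D2 = 0.986, D3 = 1.352)  # X:factor(D)k

# Hazard ratio of each (X, D) cell relative to the reference cell
grid <- rbind(X0 = hr_d, X1 = hr_x * hr_d * hr_int)

# Hazard ratio of X = 1 versus X = 0 within each level of D
x_effect <- grid["X1", ] / grid["X0", ]

print(round(grid, 3))
print(round(x_effect, 3))
```

Read this way, the interaction terms say that the hazard ratio of $X=1$ versus $X=0$ is somewhat smaller than 1.386 when $D=1$ or $D=2$ and larger when $D=3$, with none of these differences being statistically significant.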

Scenario 5

My interpretation: The interaction with time does correct for the violation of the assumption: X is decreasing with years. However, X alone is increasing. What is going on here? It doesn't make any sense to me. Unless, the X = 0 (alone), and X = 1 with * stop in the model. If so, the interpretation is then that X = 1 * stop is decreasing over time, while when X = 0, the hazard rate increases with 1.58.

Again, I don't understand your interpretation. What does "$X$ is decreasing with years" mean? Is that the value of $X$? Is it the effect on the hazard ratio? At first glance, it seems to me that the $X=1$ group has a higher hazard rate than the $X=0$ group, at time $0$. As time goes by, this difference becomes smaller.
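
The shrinking difference can be made concrete with the two hazard ratios from the Scenario 5 output. The sketch assumes, as the model does, that the log-hazard ratio of $X$ changes linearly with time in years; `hr_at` and `t_equal` are names invented for this illustration.

```r
# Hazard ratios copied from the Scenario 5 output
hr_x    <- 1.5848   # X: hazard ratio of X = 1 versus X = 0 at time 0
hr_time <- 0.9488   # X:stop, multiplicative change in that ratio per year

# Hazard ratio of X = 1 versus X = 0 after t years
hr_at <- function(t) hr_x * hr_time^t

# Time at which both groups would have the same hazard, i.e. hr_at(t) = 1
t_equal <- -log(hr_x) / log(hr_time)

print(round(hr_at(c(0, 2, 5, 10)), 3))
print(round(t_equal, 1))
```

Under these point estimates the ratio would reach 1 a little under nine years after entry and fall below 1 afterwards. That is only meaningful if the follow-up actually extends that far and the linear time trend is a reasonable description.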

Thank you Theodor for this answer, and thanks for taking your time. Now I need to process all this input. And re-read some text books on survival. — FKG, Nov 04 '15 at 03:42
