I have a question about interaction terms in a Cox proportional hazards (PH) model.

I'd like to analyze the impact of variable A on cardiovascular (CV) events. Levels of variable A differ by sex, but sex-specific cut-offs are not available (if they were, I would have categorized variable A according to those cut-offs). I therefore used the interaction term 'variable A * sex'. 'Sex' is also a well-known predictor of CV events.

'Variable A * sex' and 'sex' were both significant in univariate analysis, so I entered both of them in the multivariate model at the same time. In that model, the HR for 'sex' was very high and its 95% CI was very wide.

I assume this is because I put 'sex' in the multivariate model twice: once in its original form and once as part of the interaction term.

In this case, would it be appropriate to exclude 'sex' from the multivariate model, even though it is a well-known risk factor for the outcome?

Whether or not I exclude 'sex', the results are the same. Sex is not significant (by p-value) in the multivariate model.

Thanks.

doyle

1 Answer

... the results showed HR and 95% CI for 'sex' was very high and wide. I assume this is because I put 'sex' twice in the [multivariable] model, one is in its original form and one as an interaction term.

You don't specify which software you used, but R would know not to double-count a predictor in the way that you fear. What's more likely is that your variable A is more closely related to outcome than is sex per se. With variable A highly associated with sex, once you've taken variable A into account in your multivariable model it's hard for the model to pin down how much is left for sex to explain.
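
To make the "no double-counting" point concrete, here is a sketch of the model that an `A * sex` formula specifies in R, where it expands to `A + sex + A:sex`. The 0/1 coding of sex and the coefficient names are my assumptions for illustration, not something stated in the question:

$$h(t \mid A, \text{sex}) = h_0(t)\,\exp\left(\beta_A A + \beta_S\,\text{sex} + \beta_{AS}\,A \cdot \text{sex}\right)$$

Sex enters through two distinct columns of the design matrix, sex and A·sex, each with its own coefficient, so nothing is counted twice. Under this coding the hazard ratio for sex depends on the value of A:

$$\mathrm{HR}_{\text{sex}}(A) = \exp\left(\beta_S + \beta_{AS} A\right)$$

so $\exp(\beta_S)$ by itself is only the hazard ratio for sex at $A = 0$. If $A = 0$ lies far from the observed values of variable A, that single number can look very large and have a wide confidence interval even when the model as a whole is well behaved.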

It's not good practice to remove "insignificant" predictors from models, for several reasons. First, your audience will presumably want to see that you accounted for sex if it is "known" to be associated with outcome. Second, removing any outcome-associated predictor in a Cox model (even if it's not "significant" itself) can bias the coefficients for the included predictors. Third, removing it will make predictions on future data less reliable.

Be glad that you didn't "categorize variable A"; that's not usually a good idea. If you had categorized it, there's a good chance that you might have missed how much of the association of sex with outcome might be due to the association of variable A with both sex and outcome. Your model might further benefit from flexible modeling of variable A, for example with restricted cubic splines, instead of what seems to be a simple linear association with log-hazard in your model.
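
As a rough illustration of what "flexible modeling with restricted cubic splines" means, here is a minimal sketch of the usual truncated-power basis, written in plain Python. The function name `rcs_basis` and the knot locations are hypothetical choices for this example. In practice you would let a spline function in your statistics package build the basis and pick the knots rather than coding it by hand.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots. Each s_j is cubic
    between the knots and constrained to be linear beyond the outer knots.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u > 0, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    span = t[-1] - t[-2]           # distance between the last two knots
    scale = (t[-1] - t[0]) ** 2    # keeps spline terms on a scale similar to x

    basis = [float(x)]             # the linear term is always included
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / span
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / span
        )
        basis.append(term / scale)
    return basis


# Hypothetical knots for variable A; 4 knots give 3 basis columns.
example_knots = [1.0, 2.0, 3.0, 4.0]
example_row = rcs_basis(2.5, example_knots)
```

Each column of this basis gets its own coefficient in the Cox model in place of the single linear term for variable A, so the log-hazard can bend between the knots while staying linear in the tails.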

EdM
  • Thank you very much. Subgroup analysis stratified by sex and restricted cubic spline curves are actually my next steps. I use both R and SPSS; for the analysis in my question I used SPSS because it's more convenient for me. With SPSS, do I need to remove 'sex' because it is already in the model as part of an interaction term? I'll try the same analysis in R, but I wanted to ask. Thanks again! – doyle Jul 31 '22 at 01:23
  • @doyle Subgroup analysis is less powerful than a combined model that can draw on all the data simultaneously. If the combined model meets the proportional hazards assumption adequately, splitting into subgroups really won't help. Cubic splines are a good idea. I don't use SPSS, so I can't tell you how it deals with multiply specified predictors. If the models are identical with and without the separate 'sex' term, then it knew not to double-count the predictor. – EdM Jul 31 '22 at 01:50
  • Thanks very much! – doyle Jul 31 '22 at 12:07