
I would like to know whether, in a multivariable Cox regression analysis, there is a way to return only models that include a variable of interest (and, if no such model is statistically significant, to simply report that).

I am studying a cell phenotype. Phenotype "high" is associated with graft loss (versus phenotype "low") by the log-rank test and in a univariable Cox analysis. I also have 14 demographic variables that have previously been reported to be associated with graft loss. Now I would like to explore multivariable models, but I am interested only in models that include my phenotype variable. If no such model is statistically significant, that's fine, but I want to know. With 15 variables in total, including phenotype, I would like to be able to test all models with 2 to 15 variables that include phenotype.
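For scale, this plan means fitting one Cox model for every non-empty subset of the 14 other variables, with phenotype always added. The short Python sketch below (standard library only; the names var1 to var14 are placeholders, not the real covariates) enumerates those models and checks the count.

```python
from itertools import combinations
from math import comb

# Placeholder names: "phenotype" plus 14 other candidate predictors.
others = [f"var{i}" for i in range(1, 15)]

# Every model contains "phenotype" plus at least one other variable,
# i.e. 2 to 15 variables in total.
models = [("phenotype",) + extra
          for k in range(1, len(others) + 1)
          for extra in combinations(others, k)]

print(len(models))                             # 16383
print(sum(comb(14, k) for k in range(1, 15)))  # 16383
print(2 ** 14 - 1)                             # 16383
```

That is 2^14 - 1 = 16,383 candidate models, each with its own p-values, so searching among them for a "significant" one is a large multiple-comparisons problem.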

Thank you for your help!

Elodie
  • How many events (presumably, graft losses) are in your data set? Do you have complete data on all 15 potential predictor variables? – EdM Aug 20 '22 at 01:57

1 Answer


This strikes me as p-hacking, but I'm a stats novice, so I could be wrong.

If you decide to go ahead with this strategy, I would use the My.stepwise.coxph function from the R package My.stepwise, which lets you force a variable into every model through the in.variable argument.

Here's the link to the documentation: https://www.rdocumentation.org/packages/My.stepwise/versions/0.1.0/topics/My.stepwise.coxph

  • Your sense that this is close to p-hacking is correct. Automated model selection such as stepwise selection is not a good idea in general. It's a particular problem with Cox models, which have omitted-variable bias if any outcome-associated predictor is left out of the model. – EdM Aug 20 '22 at 02:03
  • Hi. OK, thanks for your answers and warnings! – Elodie Aug 22 '22 at 13:03
  • @Elodie another approach to your problem, if you don't have many cases, is to include your phenotype as a predictor and apply ridge-regression penalization to the clinical covariates. That way you don't have to do variable selection (with its p-hacking), and you still account for all your outcome-associated variables. This paper is an example. Or see Frank Harrell's ideas on data reduction in Section 4.7 of his course notes for ways to reduce multiple clinical variables to a smaller number of predictors. – EdM Aug 26 '22 at 18:29
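EdM's ridge suggestion can be made concrete with a toy sketch. In R, a ridge() term inside a survival::coxph() formula is one way to penalize only some covariates; to keep this example self-contained, the Python below (standard library only) instead maximizes the Cox partial log-likelihood minus 0.5 × λ × (sum of squared clinical coefficients) by plain gradient ascent, leaving the phenotype coefficient unpenalized. Everything here is an assumption for illustration: the data are simulated, fit_cox_ridge is a hypothetical helper, λ = 50 is arbitrary (in practice it would be tuned, for example by cross-validation), and tied event times are assumed not to occur.

```python
import math
import random


def fit_cox_ridge(times, events, X, penalties, lr=0.3, iters=1000):
    """Maximise the Cox partial log-likelihood minus
    0.5 * sum_j penalties[j] * beta[j]**2 by plain gradient ascent.

    A penalty of 0 leaves that coefficient unpenalised.
    Toy sketch: assumes no tied event times.
    """
    n, p = len(times), len(X[0])
    beta = [0.0] * p
    # Visiting subjects from longest to shortest time makes each
    # risk set {k : t_k >= t_i} a running prefix of this order.
    order = sorted(range(n), key=lambda i: -times[i])
    for _ in range(iters):
        # Gradient of the ridge penalty term.
        grad = [-penalties[j] * beta[j] for j in range(p)]
        s0, s1 = 0.0, [0.0] * p
        for i in order:
            w = math.exp(sum(b * x for b, x in zip(beta, X[i])))
            s0 += w
            for j in range(p):
                s1[j] += w * X[i][j]
            if events[i]:
                # Each event contributes x_i minus the risk-set weighted mean.
                for j in range(p):
                    grad[j] += X[i][j] - s1[j] / s0
        # Divide by n so the step size does not depend on sample size.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Simulated toy data: column 0 is a binary "phenotype high" flag,
# columns 1-2 stand in for standardised clinical covariates.
random.seed(1)
true_beta = [0.8, 0.5, -0.4]
times, events, X = [], [], []
for _ in range(150):
    x = [float(random.random() < 0.5), random.gauss(0, 1), random.gauss(0, 1)]
    rate = math.exp(sum(b * v for b, v in zip(true_beta, x)))
    t_event = random.expovariate(rate)
    t_censor = random.expovariate(0.3)
    times.append(min(t_event, t_censor))
    events.append(t_event <= t_censor)
    X.append(x)

unpenalised = fit_cox_ridge(times, events, X, penalties=[0.0, 0.0, 0.0])
# Ridge penalty on the clinical covariates only; phenotype stays unpenalised.
ridge = fit_cox_ridge(times, events, X, penalties=[0.0, 50.0, 50.0])
print("unpenalised:", [round(b, 3) for b in unpenalised])
print("ridge on clinical only:", [round(b, 3) for b in ridge])
```

The ridge fit pulls the clinical coefficients toward zero while the phenotype coefficient is estimated without a penalty, so every covariate stays in the model and no variable selection takes place.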