I ran an exhaustive model selection for a Cox proportional hazards model in R using the "glmulti" package, and used the best model to fit a multivariable Cox regression. In the multivariable model, some variables had p values > 0.05. Doesn't this indicate a problem with the model selection? By the way, I tried stepwise selection, which gave the same model, and exhaustive model selection using another package also gave the same best model!

There are lots of problems with "exhaustive model selection" and with stepwise selection, and they have been covered here many times. I am not very familiar with glmulti (it appears to do some kind of model averaging, which might be a little better than stepwise). See this thread in particular. For details, proofs, and examples, see also Frank Harrell's book Regression Modeling Strategies. Briefly, after data-driven selection, p values will be too low, standard errors too small, and parameter estimates biased away from 0.
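
To see why data-driven selection makes p values too small and estimates too large, here is a minimal simulation sketch in Python (standard library only). It swaps the Cox model for ordinary least-squares regression on a single predictor and replaces glmulti's search with a simple hypothetical stand-in: from 20 candidate predictors that are all pure noise, keep the one with the smallest p value. The function names, sample sizes, and the normal approximation to the t distribution are assumptions made for illustration.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (b, two-sided p value).

    The p value uses a normal approximation to the t distribution,
    which is adequate for the sample size used here (an assumption).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = abs(b / se)
    p = math.erfc(z / math.sqrt(2))  # two-sided normal p value
    return b, p


def simulate(n_reps=400, n_obs=100, n_candidates=20, seed=1):
    """Every candidate predictor is pure noise (true coefficient 0).

    Compares a pre-specified predictor (candidate 0) with the predictor
    a data-driven search would pick (the one with the smallest p value).
    """
    rng = random.Random(seed)
    prespec_hits = 0
    selected_hits = 0
    selected_abs_b = []
    for _ in range(n_reps):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        fits = []
        for _ in range(n_candidates):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
            fits.append(simple_ols(x, y))
        # Pre-specified predictor: tested once, no selection involved.
        prespec_hits += int(fits[0][1] < 0.05)
        # "Best" predictor: chosen because its p value was smallest.
        b_sel, p_sel = min(fits, key=lambda f: f[1])
        selected_hits += int(p_sel < 0.05)
        selected_abs_b.append(abs(b_sel))
    return (prespec_hits / n_reps,
            selected_hits / n_reps,
            sum(selected_abs_b) / n_reps)


if __name__ == "__main__":
    pre, sel, mean_abs_b = simulate()
    print(f"pre-specified predictor: P(p < 0.05) = {pre:.3f}")
    print(f"selected predictor:      P(p < 0.05) = {sel:.3f}")
    print(f"mean |b| of selected predictor (truth is 0): {mean_abs_b:.3f}")
```

The pre-specified predictor should reach p < 0.05 about 5% of the time, as a valid test should, while the selected predictor should do so far more often (roughly 1 - 0.95^20, about 0.64, if the 20 tests were independent), and its |b| should sit well above the true value of 0.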

But having variables in the final model that are not significant isn't a problem. It's a feature, not a bug. There are many reasons to include a variable that is not significant. For example, it might be a good covariate that changes other parameter estimates; that one is discoverable through statistics alone. But it might also be important to find that an effect is small, in particular if theory says the effect will be large. Or the variable might be part of one of your hypotheses. Or it might be involved in interactions.
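
The first reason, a covariate that changes other estimates, can be checked from the data by fitting the model with and without it and comparing the coefficient of interest (a change-in-estimate check). Below is a minimal sketch in Python, again using linear regression rather than a Cox model, on simulated, hypothetical data in which `z` is correlated with `x` and also affects `y`. The coefficients and sample size are arbitrary choices for illustration.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x alone (the model without z)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def ols2(x, z, y):
    """Least-squares fit of y = a + b1*x + b2*z; returns (b1, b2).

    Solves the centred 2x2 normal equations directly.
    """
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    xc = [v - mx for v in x]
    zc = [v - mz for v in z]
    yc = [v - my for v in y]
    sxx = sum(a * a for a in xc)
    szz = sum(a * a for a in zc)
    sxz = sum(a * b for a, b in zip(xc, zc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    szy = sum(a * b for a, b in zip(zc, yc))
    det = sxx * szz - sxz ** 2
    b1 = (szz * sxy - sxz * szy) / det
    b2 = (sxx * szy - sxz * sxy) / det
    return b1, b2


def demo(n=5000, seed=2):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # z has variance 1 and correlation 0.8 with x, and it also affects y.
    z = [0.8 * xi + 0.6 * rng.gauss(0, 1) for xi in x]
    # True model: coefficient of x is 1.0, coefficient of z is 0.5.
    y = [1.0 * xi + 0.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    crude = slope(x, y)               # z left out
    adjusted, z_coef = ols2(x, z, y)  # z included
    return crude, adjusted, z_coef


if __name__ == "__main__":
    crude, adjusted, z_coef = demo()
    print(f"slope of x without z: {crude:.3f}")
    print(f"slope of x with z:    {adjusted:.3f}")
```

With this setup the slope of `x` comes out near 1.4 when `z` is left out and near the true 1.0 when `z` is included. The check looks only at how the estimate for `x` moves, so the decision to keep `z` does not hinge on whether `z` itself is significant.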

Yet another problem with automated selection is that it ignores all substantive knowledge.

Finally, if you are working for someone else, relying on automated selection tells that person to pay you less, because you are willing to just push buttons.

Peter Flom