I am using a Cox proportional hazards model to run a survival analysis in R on a number of non-nested, distinct covariates such as Age, Blood Type, Cancer, etc.:
A, B, C, D, E
When I fit the full model to test the omnibus null hypothesis:
surv ~ A + B + C + D
None of the covariate effects is statistically significant, because the number of subjects that have measurements for every covariate is relatively small. However, when I fit single covariates or other combinations of covariates in separate Cox models:
surv ~ A
surv ~ A + C
surv ~ B + D
I see significant effects, because the sample is larger (i.e. fewer observations are discarded by the model for missing values).
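To make the shrinking sample concrete, here is a minimal sketch on simulated data. The data frame `d`, its columns `time` and `status`, and the 30% missingness rate are all made up for illustration and are not my real data. It counts how many subjects `coxph` actually uses for each formula, given that rows with a missing value in any model variable are dropped by default:

```r
library(survival)

set.seed(1)
n <- 200

# Hypothetical data: survival time, event indicator, four covariates
d <- data.frame(
  time   = rexp(n),
  status = rbinom(n, 1, 0.7),
  A = rnorm(n), B = rnorm(n), C = rnorm(n), D = rnorm(n)
)

# Blank out roughly 30% of each covariate independently (assumed rate)
for (v in c("A", "B", "C", "D")) {
  d[[v]][runif(n) < 0.3] <- NA
}

formulas <- list(
  Surv(time, status) ~ A,
  Surv(time, status) ~ A + C,
  Surv(time, status) ~ B + D,
  Surv(time, status) ~ A + B + C + D
)

# With the default na.action (na.omit), incomplete rows are dropped,
# so fit$n is the number of subjects each model was actually fit on
n_used <- sapply(formulas, function(f) coxph(f, data = d)$n)
names(n_used) <- sapply(formulas, function(f) deparse(f[[3]]))
print(n_used)
```

Each formula ends up with a different number of subjects, and the four-covariate model has the fewest, which is the situation I am in with my real data.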
What I'm having difficulty understanding is how to do the following:
- Comparing the different Cox models for the best fit, i.e. is surv ~ A + B + D a better model than surv ~ A + C? Should I be comparing the likelihood ratio, Wald or logrank scores?
- Is it possible to run every possible combination of covariates to determine the best model? I have about 15 covariates.
- More broadly, is this tactic the best approach to optimizing for both significant covariates and overall model "cost"? I will be attaching a cost to each distinct Cox model, i.e. using covariates A + B + C in the model costs \$100, while using covariates A + B costs \$75 and using only covariate A costs \$10. I'd like to look at the cost for each combination of covariates vs. the accuracy of each Cox model.
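For scale, 15 covariates give 2^15 - 1 = 32,767 non-empty subsets, so brute force at least looks feasible. Below is a rough sketch of what I have in mind, on simulated data with only four covariates. The data, the additive cost rule, the cost of `D`, and the choice of AIC and concordance as stand-ins for "accuracy" are all assumptions I made for illustration; whether those are the right measures is part of what I am asking. The simulated data has no missing values, so every model is fit on the same subjects.

```r
library(survival)

set.seed(2)
n <- 300
covs <- c("A", "B", "C", "D")

# Hypothetical complete data; only A and C truly affect the hazard
d <- data.frame(A = rnorm(n), B = rnorm(n), C = rnorm(n), D = rnorm(n))
d$time   <- rexp(n, rate = exp(0.5 * d$A + 0.3 * d$C))
d$status <- rbinom(n, 1, 0.8)

# Assumed additive costs: A, B, C reproduce the 10 / 75 / 100 example,
# the cost of D is made up
cost <- c(A = 10, B = 65, C = 25, D = 40)

# All non-empty subsets of the covariates: 2^4 - 1 = 15 here
subsets <- unlist(
  lapply(seq_along(covs), function(k) combn(covs, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one Cox model per subset and record its cost and fit measures
results <- do.call(rbind, lapply(subsets, function(s) {
  rhs <- paste(s, collapse = " + ")
  fit <- coxph(as.formula(paste("Surv(time, status) ~", rhs)), data = d)
  data.frame(
    model = rhs,
    cost  = sum(cost[s]),
    aic   = AIC(fit),                                  # lower is better
    concordance = unname(summary(fit)$concordance[1])  # higher is better
  )
}))

# Cheapest models first, ties broken by AIC
print(results[order(results$cost, results$aic), ], row.names = FALSE)
```

The resulting table has one row per covariate combination, which is the cost vs. accuracy view I would like to produce for the real data.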
Thanks very much for your help!
Likelihood ratio test = 10.47 on 2 df, p=0.005316. Wald test = 10.02 on 2 df, p=0.006686. Score (logrank) test = 10.24 on 2 df, p = 0.005966 vs. (for a 13 covariate example): Likelihood ratio test = 54.06 on 13 df, p=5.897e-07. Wald test = 41.12 on 13 df, p=9.115e-05. Score (logrank) test = 47.1 on 13 df, p = 9.304e-06. Is it valid to say one model is more accurate than the other and if so, per your prior comment, I'm assuming you would recommend doing so based on the likelihood ratio? – BeginnersMindTruly Aug 28 '14 at 23:03