
The final step of a Cox model I've been working on is performing log-likelihood ratio tests to check the significance of each predictor in the model. I'd like to do this using anova(full_model, model_minus_one_variable) for each variable.
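
For concreteness, a minimal sketch of what I mean, assuming hypothetical data df with predictors x1 and x2:

    library(survival)

    # Fit the full model and a nested model with one variable dropped
    full    <- coxph(Surv(time, status) ~ x1 + x2, data = df)
    reduced <- coxph(Surv(time, status) ~ x1, data = df)

    # Likelihood ratio test for the dropped variable (x2)
    anova(full, reduced)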

However, I am unable to do this because I fitted the model with robust = TRUE in coxph (i.e., robust standard errors). My decision to do so was driven almost entirely by this post (as linked in the Python lifelines package), since it suggested I'd be better protected from PH violations.

My questions:

  1. Would it be appropriate to rerun my final model with robust = FALSE just for the purpose of finding out which variables are significant, or is this bad statistics?

  2. If the answer to the above question is no, a possible solution could be to calculate the significance of variables in the model manually, by log-likelihood ratio tests (instead of using anova), as here (see the sketch after these questions). Is this an appropriate alternative?

  3. Was my decision at the beginning of the modeling process to use robust SEs justified? My results - namely coefficients and p-values - for continuous variables with pspline() (all of my continuous variables) are very different with and without robust SEs. This appears not to matter too much, because 1) the HRs are still the same and 2) the log-likelihoods are also comparable. Since I intend to investigate significant variables based on likelihood, I think it won't matter. However, it's probably obvious from this post that I'm confused and doubt my own judgement.
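
The manual calculation I have in mind for question 2 is sketched below, again assuming nested coxph fits full and reduced as above. Note that with pspline() terms the likelihood is penalised and the df are effective (fractional) df, so this should be treated as approximate:

    # LR statistic: twice the difference in log partial likelihoods
    lrt <- 2 * as.numeric(logLik(full) - logLik(reduced))

    # Difference in (effective) degrees of freedom
    df_diff <- attr(logLik(full), "df") - attr(logLik(reduced), "df")

    # p-value from the chi-squared reference distribution
    pchisq(lrt, df = df_diff, lower.tail = FALSE)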

Additional info that may be helpful: number of observations = 100k, with 1655 events. The main objective is to investigate the effect of a continuous treatment on mortality, with 48 other covariates. pspline() is used on all continuous variables, due to evidence of nonlinearity and to account for non-proportional hazards. Strata terms were added on 8 categorical variables due to non-PH. Total df = 138.
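
A rough sketch of the kind of call described above (variable names are placeholders, not my actual data):

    library(survival)

    fit <- coxph(Surv(time, death) ~ pspline(treatment, df = 2) +
                   pspline(age, df = 2) +   # ... other continuous covariates
                   strata(region) +         # ... 8 strata terms in total
                   sex,                     # ... other categorical covariates
                 data = df, robust = TRUE)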

  • What do you mean by "check the significance"? Statistical significance means almost nothing, and your phrasing also implies that you might do something invalid such as removing "insignificant" predictors. Also, if the model doesn't fit, robust standard errors may give you the right SE estimates but for the wrong parameters. – Frank Harrell Sep 14 '22 at 14:12
  • Thanks for the help Professor Harrell. By "significance" I mean, statistically speaking, whether the inclusion of a variable in my Cox model makes enough of a difference to the outcome that it is relevant. I am aware, and in agreement, that removing predictors by p-value selection is erroneous, so no worries there. My reason for wanting to know is purely for the purpose of reporting results: as you know, R reports the p-values for pairwise differences within a factor, but not for the factor as a whole, which is what I want. The model seems to fit well, and the results are reasonable and logical. – JED HK Sep 14 '22 at 15:07
  • 1
    Then maybe no reason to use robust SEs. But don't present single d.f. tests when categorical predictors have > 2 levels; instead show "chunk" (composite) tests. This is automatic with the R rms package anova function that works with cph objects (cph is a front-end to survival::coxph). – Frank Harrell Sep 14 '22 at 16:00
  • Ideally I would use anova from rms for this purpose, but because I've used pspline() I can't. I initially tried using rcs() but had fewer PH issues with pspline(), so went with that one in the end. The incompatibilities between different methods sometimes make things tricky. Going to redo the analysis now without robust SEs and see how things change. – JED HK Sep 15 '22 at 07:00
  • 1
    For assessing PH for splines don't look at assessments of individual coefficients but rather for the overall function. cox.zph has an option that will show "terms". Overall assessment for psplines and rcs should be about the same. – Frank Harrell Sep 15 '22 at 11:13
  • Thanks Professor, I've already done that (using the default terms = TRUE). I still saw quite a few differences in PH between rcs() and pspline() (more PH violations with the former). pspline() seems to be working well, so I'm going to continue with that. Also perhaps of interest: I redid the analysis with robust = FALSE, and the main difference I see so far is that pairwise differences in several categorical covariates are no longer significant. Not a big deal (this is not my primary outcome of interest), but I am also not sure how to interpret this finding. – JED HK Sep 15 '22 at 11:49
  • Don't compare "significances" unless the p-values differ by more than, say, 0.2. But the PH assessment should not vary as much as you are finding if the degrees of freedom of the rcs spline match those of the pspline spline. – Frank Harrell Sep 16 '22 at 12:20
  • Great, using non-robust SEs makes my life easier anyway. The final piece of the puzzle is knowing how to do this ANOVA - do you see anything wrong with doing it manually, as I suggest in the post? That might be my only option here. As for the pspline() and rcs() discrepancy, I guessed it was something to do with knot placement (which I know arguably shouldn't have a large influence, but I don't know how else to explain it). I used df = 3 on rcs() and the default (4) on pspline() originally, but have since cut down to df = 2 on most variables to save on df.... – JED HK Sep 16 '22 at 13:28
  • I should maybe emphasise that the results weren't wildly different, but I definitely had less severe and also fewer total PH violations with penalised splines vs rcs(). – JED HK Sep 16 '22 at 13:39
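
For reference, a hedged sketch of the chunk tests and term-level PH assessment suggested in the comments, using rcs() because rms's anova does not handle pspline() terms (variable names are placeholders):

    library(rms)
    library(survival)

    fit <- cph(Surv(time, death) ~ rcs(treatment, 3) + strat(region) + sex,
               data = df, x = TRUE, y = TRUE)

    # Composite ("chunk") Wald test for each predictor as a whole,
    # rather than one p-value per spline coefficient or factor level
    anova(fit)

    # PH assessment per term (the whole spline) rather than per coefficient
    cox.zph(fit, terms = TRUE)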

0 Answers