
I'd like to use anova() to investigate which variables are important in my Cox model.

I tried anova() from {stats} and Anova() from car, but it seems neither can handle the strata() terms in my model.

I can't use anova() from rms because I have pspline() terms in my model.

An ideal solution might be anova.coxph from survival, if I can change from sequential testing to non-sequential (e.g., like a two-model ANOVA or something like drop1()). However, I don't know if this is possible.
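
To illustrate, something like this two-model comparison is what I have in mind (a minimal sketch on the packaged survival lung data; the model and variables are just placeholders for mine):

library(survival)

## hypothetical nested fits on the packaged lung data
fit_full    <- coxph(Surv(time, status) ~ pspline(age, df = 4) + sex + strata(inst),
                     data = lung)
fit_reduced <- coxph(Surv(time, status) ~ sex + strata(inst), data = lung)

## anova.coxph accepts two nested fits for a likelihood-ratio comparison,
## though I don't know how well it behaves with pspline() terms
anova(fit_full, fit_reduced)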

I thought about calculating the relevant statistics manually (something like this answer suggests). That is fine in theory, but running through all 50 of my variables would take a while and doesn't seem very elegant.

Are there any better suggestions for how this can be done?

JED HK
  • Do you have a separate model for each stratum, or a single set of coefficient estimates that applies to all strata? If the latter, can you get a result by applying vcov() to the output of your coxph object? That should be a square matrix of dimension equal to the number of coefficients that you estimated. For example, try dim(vcov(yourCOXPHmodel)). – EdM Sep 15 '22 at 20:36
  • I have only one model with the same coefficients for all, so the latter. vcov() works and indeed returns what you suggested (including the individual spline terms of those variables with pspline() too). – JED HK Sep 16 '22 at 08:21

1 Answer


If you have a coefficient covariance matrix, you can perform Wald tests on the set of coefficients associated with each original predictor. Tests on multiple coefficients at once are sometimes called "chunk tests." The test is whether any of a set of coefficients differs from zero, taking the covariances among their estimates into account.
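
Concretely, if $\hat\beta_S$ is the vector of estimates for the coefficient subset $S$ and $V_{SS}$ is the corresponding block of the covariance matrix, the chunk Wald statistic is $W = \hat\beta_S^\top V_{SS}^{-1} \hat\beta_S$, compared under the null hypothesis $\beta_S = 0$ against a chi-square distribution with degrees of freedom equal to the size of $S$ (or the effective degrees of freedom, in the penalized case discussed below).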

That's how the anova() function in Frank Harrell's rms package works. I believe that's also how the Anova() function in the car package works if you specify type = "II" and test.statistic = "Wald" in the function call.
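
For instance (a sketch only; you report that car chokes on strata() terms, and I haven't checked how it handles pspline() fits):

library(survival)
library(car)

## hypothetical model without strata, since those reportedly cause the glitch
fit <- coxph(Surv(time, status) ~ pspline(age, df = 4) + ph.ecog + sex, data = lung)

## type-II Wald "chunk" tests, one row per original predictor
Anova(fit, type = "II", test.statistic = "Wald")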

If there's a glitch arising from the strata in the model or some problem with naming of variables, you could do this yourself by identifying the indices of each set of associated coefficients (e.g.: all coefficients for your spline; a main coefficient and all associated nonlinear and interaction terms) and doing the calculations with a simple program. The fundamental part of anova() in rms is in the first few lines of the code (type rms:::anova.rms at the R command prompt):

ava <- function(idx) {
    ## Wald "chunk" statistic: b' V^{-1} b over the selected coefficients
    chisq <- coef[idx] %*% solvet(cov[idx, idx], coef[idx], tol = tol)
    c(chisq, length(idx))  # the statistic and its degrees of freedom
}

The idx values are the indices of the coefficients in question, the coef are the corresponding point estimates (in the original log-hazard scale), and cov is the coefficient covariance matrix. The solvet() function is a version of the standard solve() function, adapted in the Hmisc package to allow for adjusting the tolerance for singularity. The chisq is the chi-square statistic.

This is straightforward for an unpenalized model: the Wald test is on that chi-square statistic, with length(idx) degrees of freedom. It also works with robust coefficient covariance matrices. Frank Harrell's rms package uses the difference between each chi-square statistic and its degrees of freedom (the expected value under the null hypothesis) as a convenient summary of how much each predictor (including its nonlinear and interaction terms) contributes to the model.
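
As a minimal sketch of that manual calculation for an unpenalized chunk (here the dummy coefficients of a multi-level predictor in the packaged lung data, with the base solve() standing in for solvet()):

library(survival)

fit <- coxph(Surv(time, status) ~ factor(ph.ecog) + sex + strata(inst), data = lung)

b <- coef(fit)
V <- vcov(fit)

## indices of all coefficients belonging to the predictor of interest
idx <- grep("ph.ecog", names(b))

## Wald chi-square over that block of coefficients
chisq <- drop(b[idx] %*% solve(V[idx, idx], b[idx]))
df    <- length(idx)

c(chisq = chisq, df = df, p = pchisq(chisq, df, lower.tail = FALSE))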

You need to make some choices with a penalized model, for example one that includes pspline() terms. As Therneau and Grambsch explain in Section 5.8, there are two approaches to estimating the coefficient covariance matrix, resulting in two separate columns of standard errors in the summary() of such a coxph object, se(coef) and se2; the former is more conservative. The vcov() function on such a model returns the more conservative matrix from the var slot of the coxph object, which the survival package uses for significance tests. I think that the rms package effectively uses the var2 matrix, as proposed by Gray. Choose which coefficient covariance matrix to use. You then need to use the effective degrees of freedom from the model, corrected for the penalization, rather than the number of associated coefficients, in the chi-square test.
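
A sketch of the penalized case (the var and var2 components are stored on penalized coxph fits; how the effective degrees of freedom are stored can vary, so read them off your own fit):

library(survival)

fit <- coxph(Surv(time, status) ~ pspline(age, df = 4) + sex, data = lung)

## the two coefficient covariance estimates (Therneau & Grambsch, section 5.8)
V_conservative <- fit$var    # what vcov(fit) returns; used by survival's tests
V_gray         <- fit$var2   # Gray's estimate, roughly what rms uses

## Wald chi-square over the spline's coefficients
b   <- coef(fit)
idx <- grep("age", names(b))   # all basis coefficients for the age spline
chisq <- drop(b[idx] %*% solve(V_conservative[idx, idx], b[idx]))

print(fit)    # reports the effective df for the pspline term

## use the *effective* df, not length(idx), in the test
edf <- 4      # hypothetical value; substitute the effective df your fit reports
pchisq(chisq, df = edf, lower.tail = FALSE)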

You can use the effective degrees of freedom for likelihood-ratio tests comparing nested models when such tests are appropriate. They are not appropriate when the model uses robust covariance matrices. Robust models use the same point estimates based on maximizing partial likelihood and then adjust the covariance matrix to account for things like clustering. Likelihood-based comparisons thus wouldn't take that extra uncertainty into account. If your model involves robust covariance matrices then you need to use Wald tests instead. As Harrell says in Section 9.6 of his RMS textbook, such "adjustments for heterogeneity and correlated observations are not available for score and LR statistics."
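
The chunk Wald calculation carries over unchanged to the robust case, since vcov() on a robust fit returns the sandwich covariance matrix. A sketch, using a hypothetical cluster() term to trigger robust variances:

library(survival)

## robust (sandwich) covariance via a hypothetical clustering on inst;
## robust = TRUE in the coxph() call works similarly
fitr <- coxph(Surv(time, status) ~ factor(ph.ecog) + sex + cluster(inst),
              data = lung)

b <- coef(fitr)
V <- vcov(fitr)   # the robust covariance matrix for this fit
idx <- grep("ph.ecog", names(b))

chisq <- drop(b[idx] %*% solve(V[idx, idx], b[idx]))
pchisq(chisq, df = length(idx), lower.tail = FALSE)  # Wald test, not a LRT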

EdM
  • Thanks a million Ed. So manual calculation is the way to go. I'm implementing the code now but one problem I foresee: pspline() with df=4 gives me 4 degrees of freedom in the DF column (as expected) of the summary of the cox.zph output, however there are 12 individual coefficients for these variables, so using length(idx) will give me different degrees of freedom. Why is the summary not counting all of the individual coefficients and instead returning the df used in pspline()? – JED HK Sep 16 '22 at 16:45
  • Also Ed, could I not achieve the same goal here by using anova(full_model, model_minus_one_variable) and doing a likelihood ratio test, and seeing if the difference is statistically significant? This seems like an adequate solution. – JED HK Sep 16 '22 at 16:51
  • @JEDHK sorry, I forgot that you were working with penalized splines. Then the effective df is different from the number of coefficients. That affects likelihood-ratio tests, too. What I wrote applies strictly to unpenalized coefficients (e.g., the set of dummy coefficients associated with a multi-level categorical predictor, or the unpenalized coefficients of a restricted cubic spline). I'll edit to cover penalization later, when I get a chance. – EdM Sep 16 '22 at 18:51
  • I am truly grateful for your help Ed. Take your time. Finally though, I'd much appreciate clarification on the question of whether I can just run a full and a reduced model, and then calculate the appropriate statistics manually (difference in likelihood, difference in df, chi-squared) and calculate significance for my model variables that way. If that doesn't work, why not? – JED HK Sep 16 '22 at 18:58
  • It seems my df will either be that returned by cox.zph or that from the variance-covariance matrix. Whichever one of these we agree that it is, do I not then have all the appropriate quantities (likelihood in full and reduced model, df in both) to calculate the relevant statistics (i.e., a p-value for each predictor)? – JED HK Sep 17 '22 at 05:46
  • @JEDHK if your model involves robust estimates then likelihood-based tests aren't appropriate; the likelihoods don't take that extra source of variability into account. Otherwise you can use the effective degrees of freedom reported for the coxph models (should be the same as those reported by cox.zph) for comparisons of nested models. I expanded the answer to explain issues related to penalization and robust estimates. – EdM Sep 17 '22 at 15:08
  • I redid the analysis with robust = FALSE after a brief chat with Professor Harrell (https://stats.stackexchange.com/questions/588757/robust-standard-errors-in-cox-survival-analysis?noredirect=1#comment1088576_588757). However, I'm still not sure what the best route here is. Without pspline() (i.e., in my initial model runs in the script), p-values hardly change with robust as true or false. However, the models further on in the script with pspline() show drastic changes. I would love to be able to understand what this difference tells me. Maybe worth opening a new question? – JED HK Sep 18 '22 at 06:35
  • @JEDHK this comes down to whether you need robust standard errors (SE). They are needed in some form if you have multiple events per individual or other clustering. They are important if a Cox model is mis-specified; see Lin and Wei. Differences between standard model and robust SE might indicate mis-specification. If you use robust standard errors then you can't use likelihood-ratio tests; the chunk Wald tests then can evaluate what different predictors contribute overall to the model. – EdM Sep 18 '22 at 14:31
  • Yeah, I did quite some reading on robust SE but it is still not very clear to me whether my situation demands it. As I say, without splines the results are basically unchanged between robust/non-robust; after adding pspline(), the results diverge a lot more... I read Lin and Wei a couple of times; it's a bit hard for me and I still saw nothing in there that suggested to me I should use robust SE (but that could be due to my own ignorance...) – JED HK Sep 19 '22 at 05:58
  • Oh, and additionally Ed, I am now attempting to do LR tests (as here) for each variable in the model for which I have a coefficient estimate. Those I used as strata terms I will not do LRTs for (though they will remain in both the full and reduced models as terms). I think this is correct but confirmation would be nice. – JED HK Sep 19 '22 at 06:33
  • @JEDHK that’s OK if you are using regular standard errors. LRT are NOT OK if you are using robust standard errors. If you are reporting robust standard errors you must use Wald tests with the corresponding effective degrees of freedom. – EdM Sep 19 '22 at 10:58
  • Got it Ed! Thanks a lot. I might post another question about the comment above my previous one in this thread. Really curious to know what's going on there. – JED HK Sep 19 '22 at 11:05