After fitting a coxmodel it is possible to make predictions and retrieve the relative risk of new data. What I don't understand is how the relative risk is computed for an individual and what is it relative to (i.e. the average of the population)? Any recommendations for resources to help understand (I not very advanced in survival analysis so the simpler the better)?
1 Answers
Edit: the following description applies to survival versions 3.2-8 and below. Starting with version 3.2-9, the default behavior of predict.coxph() changes with respect to treating 0/1 (dummy indicator) variables. See NEWS.
predict.coxph() computes the hazard ratio relative to the sample average for all $p$ predictor variables. Factors are converted to dummy predictors as usual whose average can be calculated. Recall that the Cox PH model is a linear model for the log-hazard $\ln h(t)$:
$$ \ln h(t) = \ln h_{0}(t) + \beta_{1} X_{1} + \dots + \beta_{p} X_{p} = \ln h_{0}(t) + \bf{X} \bf{\beta} $$
Where $h_{0}(t)$ is the unspecified baseline hazard. Equivalently, the hazard $h(t)$ is modeled as $h(t) = h_{0}(t) \cdot e^{\beta_{1} X_{1} + \dots + \beta_{p} X_{p}} = h_{0}(t) \cdot e^{\bf{X} \bf{\beta}}$. The hazard ratio between two persons $i$ and $i'$ with predictor values $\bf{X}_{i}$ and $\bf{X}_{i'}$ is thus independent of the baseline hazard and independent of time $t$:
$$ \frac{h_{i}(t)}{h_{i'}(t)} = \frac{h_{0}(t) \cdot e^{\bf{X}_{i} \bf{\beta}}}{h_{0}(t) \cdot e^{\bf{X}_{i'} \bf{\beta}}} = \frac{e^{\bf{X}_{i} \bf{\beta}}}{e^{\bf{X}_{i'} \bf{\beta}}} $$
For the estimated hazard ratio between persons $i$ and $i'$, we just plug in the coefficient estimates $b_{1}, \ldots, b_{p}$ for the $\beta_{1}, \ldots, \beta_{p}$, giving $e^{\bf{X}_{i} \bf{b}}$ and $e^{\bf{X}_{i'} \bf{b}}$.
As an example in R, I use the data from John Fox' appendix on the Cox-PH model which provides a very nice introductory text. First, we fetch the data and build a simple Cox-PH model for the time-to-arrest of released prisoners (fin: factor - received financial aid with dummy coding "no" -> 0, "yes" -> 1, age: age at the time of release, prio: number of prior convictions):
> URL <- "https://socialsciences.mcmaster.ca/jfox/Books/Companion/data/Rossi.txt"
> Rossi <- read.table(URL, header=TRUE) # our data
> Rossi[1:3, c("week", "arrest", "fin", "age", "prio")] # looks like this
week arrest fin age prio
1 20 1 no 27 3
2 17 1 no 18 8
3 25 1 no 19 13
> library(survival) # for coxph()
> fitCPH <- coxph(Surv(week, arrest) ~ fin + age + prio, data=Rossi) # Cox-PH model
> (coefCPH <- coef(fitCPH)) # estimated coefficients
finyes age prio
-0.34695446 -0.06710533 0.09689320
Now we plug in the sample averages for our predictors into the $e^{\bf{X} \bf{b}}$ formula:
meanFin <- mean(as.numeric(Rossi$fin) - 1) # average of financial aid dummy
meanAge <- mean(Rossi$age) # average age
meanPrio <- mean(Rossi$prio) # average number of prior convictions
rMean <- exp(coefCPH["finyes"]*meanFin # e^Xb
+ coefCPH["age"] *meanAge
+ coefCPH["prio"] *meanPrio)
Now we plug in the predictor values of the first 4 persons into the $e^{\bf{X} \bf{b}}$ formula.
r1234 <- exp(coefCPH["finyes"]*(as.numeric(Rossi[1:4, "fin"])-1)
+ coefCPH["age"] *Rossi[1:4, "age"]
+ coefCPH["prio"] *Rossi[1:4, "prio"])
Now calculate the relative risk for the first 4 persons against the sample average and compare to the output from predict.coxph().
> r1234 / rMean
[1] 1.0139038 3.0108488 4.5703176 0.7722002
> relRisk <- predict(fitCPH, Rossi, type="risk") # relative risk
> relRisk[1:4]
1 2 3 4
1.0139038 3.0108488 4.5703176 0.7722002
If you have a stratified model, the comparison in predict.coxph() is against the strata-averages, this can be controlled via the reference option that is explained in the help page.
- 145,122
- 12,009
meanFin <- mean(as.numeric(Rossi$fin) - 1)does not make much sense, sincefinis categorical. Don't you need tomodeFin <- get_Mode(Rossi$fin)in this case? – Zhubarb May 30 '14 at 14:53finis binary, so the numerical representation of the factor just has the values 1 and 2. Subtracting 1 gives us the dummy-coded variable with values 0 and 1 that also appears in the design matrix. Note that this won't work for factors with more than 2 levels. It is certainly debatable whether averaging dummy variables makes sense, but that's whatpredict.coxph()does. – caracal Jun 02 '14 at 07:03When using a linear model, I can use the formula s.e=unname(rowSums((Xp V) * Xp)), where v=covariance matrix, Xp = covariates to predict for, to get the same standard errors given by predict function. When using the same formula for s.e of a fitted value from a cox model, they do not match up.
I assume this is because predict function gives the standard error of the ratio which you highlight above.
– AP30 Feb 14 '18 at 17:56survival::predict.coxph()at https://github.com/cran/survival/blob/master/R/predict.coxph.R – caracal Feb 15 '18 at 09:39