Taking the veteran dataset of a two-treatment, randomized trial for lung cancer in the R package survival as an example, where
- time is the survival time in days
- status is the censoring status (0 for surviving and 1 for dead)
- trt is the treatment type (1 or 2)
- celltype: 1=squamous, 2=small cell, 3=adeno, 4=large
- age: age in years
(There are more covariates but for simplicity I only included 3)
Adding in the above covariates/factors, the Cox regression can be run as follows:
library(survival)
cox <- coxph(Surv(time, status) ~ trt + celltype + age, data = veteran)
summary(cox)
Call:
coxph(formula = Surv(time, status) ~ trt + celltype + age, data = veteran)
n= 137, number of events= 128
coef exp(coef) se(coef) z Pr(>|z|)
trt 0.179011 1.196034 0.201404 0.889 0.374
celltypesmallcell 1.080310 2.945592 0.274647 3.933 8.37e-05 ***
celltypeadeno 1.170470 3.223506 0.294727 3.971 7.15e-05 ***
celltypelarge 0.292624 1.339939 0.285504 1.025 0.305
age 0.004097 1.004106 0.009581 0.428 0.669
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
trt 1.196 0.8361 0.8059 1.775
celltypesmallcell 2.946 0.3395 1.7195 5.046
celltypeadeno 3.224 0.3102 1.8091 5.744
celltypelarge 1.340 0.7463 0.7657 2.345
age 1.004 0.9959 0.9854 1.023
Concordance= 0.619 (se = 0.028 )
Likelihood ratio test= 26.04 on 5 df, p=9e-05
Wald test = 25.01 on 5 df, p=1e-04
Score (logrank) test = 26.51 on 5 df, p=7e-05
Now, these coefficients allow me to compute the hazard ratio between any two groups of patients, each defined by its values of trt, celltype, and age. For example, if
- group A has trt = 1, celltype = squamous, and age = 49, and
- group B has trt = 2, celltype = smallcell, and age = 50,
then we can say that at any time point, the hazard of death in group B (the instantaneous death rate among subjects still alive) should be $1.196\times2.946\times1.004=3.538$ times that in group A. But this does not tell us the proportion surviving at any time point for either group A or B; we only know the ratio of the hazards.
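To spell out where the 3.538 comes from: assuming the usual proportional hazards form $h(t \mid x) = h_0(t)\exp(\beta^\top x)$ (the symbols $x$, $\beta$, $x_A$, and $x_B$ are my own notation, not part of the output above), the baseline hazard cancels in the ratio, so only the covariate differences between the two groups matter:

$$
\frac{h(t \mid x_B)}{h(t \mid x_A)}
= \frac{h_0(t)\exp(\beta^\top x_B)}{h_0(t)\exp(\beta^\top x_A)}
= \exp\bigl(\beta_{\text{trt}}(2-1) + \beta_{\text{smallcell}}(1-0) + \beta_{\text{age}}(50-49)\bigr)
$$

$$
= \exp(0.179011 + 1.080310 + 0.004097) = \exp(1.263418) \approx 3.54
$$

This agrees with the product of the rounded exp(coef) values up to rounding. It also shows why $h_0(t)$ never enters the calculation.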
More broadly, is it possible to tell, at any given time point, the proportion surviving for a group of patients with known trt, celltype, and age? If all the terms in the model were factors, we could just subset the target population from the dataset and generate the Kaplan-Meier curve, but age, a continuous covariate, is also in the model. Maybe a relevant question is: what is the baseline hazard function $h_0(t)$ in this case?
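One possible way to approach this, offered as a sketch rather than a definitive answer: survfit() in the survival package accepts a fitted coxph model together with a newdata argument and returns model-based survival curves for the supplied covariate profiles, and basehaz() returns the estimated cumulative baseline hazard (not $h_0(t)$ itself). The data frame name profiles and the time points 30, 90, and 180 days are my own illustrative choices.

```r
library(survival)

# Refit the model from the question
cox <- coxph(Surv(time, status) ~ trt + celltype + age, data = veteran)

# Hypothetical covariate profiles for groups A and B (names are illustrative)
profiles <- data.frame(
  trt      = c(1, 2),
  celltype = factor(c("squamous", "smallcell"),
                    levels = levels(veteran$celltype)),
  age      = c(49, 50),
  row.names = c("A", "B")
)

# Model-based survival curves, one column per row of `profiles`
fit <- survfit(cox, newdata = profiles)

# Estimated survival proportions at a few arbitrary time points (days)
summary(fit, times = c(30, 90, 180))

# Estimated cumulative baseline hazard; centered = FALSE evaluates it at
# trt = 0, age = 0 and the reference cell type (squamous)
H0 <- basehaz(cox, centered = FALSE)
head(H0)
```

Note that these curves rely on the proportional hazards assumption holding, and that the uncentered baseline refers to a patient with trt = 0 and age = 0, which lies outside the observed data.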