
I am using coxph in the survival package to fit Andersen-Gill models. I am interested in getting interval-specific survival rates for my data. For instance, I have 26 time intervals and would like to know the probability of survival within each interval. My basic code for a null model is as follows:

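library(survival)
coxph <- coxph(Surv(start, stop, death.time) ~ 1, data = mydata)   # null (intercept-only) model
# survfit() plots the estimated survival proportion S(t), not mortality
plot(survfit(coxph), ylim = c(0, 1), xlab = "Weeks", ylab = "Survival probability")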


This model is obviously just a null model for the sake of this example, but is there any way to get interval-specific survival rates, i.e., weekly survival, or even just the overall survival probability across all 26 periods?

  • Are you asking how such rates can be calculated for a specific interval in general (e.g., mathematically), or are you asking for R code? Note that the latter is off topic here. – gung - Reinstate Monica Jan 13 '16 at 16:38
  • I'm asking how such rates can be calculated for the specific time intervals in general. Thanks. – user3634037 Jan 13 '16 at 17:01
  • OK, thanks. It may help some potential answerers if they have access to the actual data (e.g., for demonstration purposes), although some don't use R. Can you provide it? – gung - Reinstate Monica Jan 13 '16 at 17:09

2 Answers

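library(survival)
my_cox_model <- coxph(....)   # your model formula and data go here
sf <- survfit(my_cox_model)   # predicted survival curve from the fitted model
summary(sf)                   # survival estimates at each event time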

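This prints the estimated survival probability at each event time, along with the number at risk, the number of events, standard errors and confidence limits. For more things you can do with it, check out ?survfit.coxph.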
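
One way to turn that curve into interval-specific rates, sketched here under stated assumptions, is to read the survival function S(t) at the interval boundaries using the times argument of summary() for survfit objects (extend = TRUE keeps times beyond the last event). The probability of surviving interval k, given alive at its start, is then S(t_k) / S(t_(k-1)), and the product of those ratios telescopes to the overall 26-week survival S(26). The data frame toy is hypothetical, simulated only so the code runs; substitute your own data and model.

```r
library(survival)

# Hypothetical toy data: 200 subjects followed for up to 26 weeks,
# one (start, stop] row each; status 1 = died, 0 = censored at week 26.
set.seed(1)
n <- 200
death_week <- ceiling(rexp(n, rate = 0.03))
toy <- data.frame(
  start  = 0,
  stop   = pmin(death_week, 26),
  status = as.integer(death_week <= 26)
)

fit <- coxph(Surv(start, stop, status) ~ 1, data = toy)  # null model
sf  <- survfit(fit)

# Survival at the end of each weekly interval, t = 0, 1, ..., 26.
# extend = TRUE returns a value even for times past the last event.
s <- summary(sf, times = 0:26, extend = TRUE)$surv

# Conditional survival for each interval: S(t_k) / S(t_(k-1))
interval_surv <- s[-1] / s[-length(s)]
names(interval_surv) <- paste0("week_", 1:26)

round(interval_surv, 3)   # weekly (interval-specific) survival probabilities
prod(interval_surv)       # telescopes to S(26) / S(0), i.e. the overall 26-week survival
```

Because the toy times are whole weeks, the boundaries 0:26 line up with the intervals; with irregular visit times you would pass your own vector of interval boundaries to times instead.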

P.S. It is pretty bad practice to name your model coxph, since that is already the name of the function coxph(...). I would name it differently.

Theodor

An "Andersen-Gill model" handles multiple recurrent events of the same type with a single baseline hazard function. If your outcome is "death" then it's not really an Andersen-Gill model, as each individual only has one event. See Section 3.2 of the main R survival vignette.
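
For contrast, here is a sketch of a genuine Andersen-Gill fit on the bladder2 data set that ships with the survival package (bladder cancer recurrences in counting-process (start, stop] form, one row per at-risk interval). Counting events per subject shows the difference quickly: here some subjects contribute several events, whereas with death as the outcome every subject has at most one. Running the same tapply() check on your own data assumes a subject identifier column, which does not appear in the question's code.

```r
library(survival)

# bladder2: recurrent-event data in counting-process (start, stop] format,
# one row per at-risk interval, subject identifier in `id`.
events_per_id <- tapply(bladder2$event, bladder2$id, sum)
table(events_per_id)   # distribution of events per subject

# Andersen-Gill model: one shared baseline hazard for all recurrences;
# cluster(id) requests robust (sandwich) standard errors that account
# for correlation between events within the same subject.
ag_fit <- coxph(Surv(start, stop, event) ~ rx + number + size + cluster(id),
                data = bladder2)
summary(ag_fit)
```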

What you have seems instead to be interval-censored panel data, with death as the outcome. You know that death occurred sometime within the interval, not exactly when. The coding with (startTime, stopTime, status) might be the same, but the analysis is different. As Section 2.8.1 of the survival vignette says: "Such data requires further assumptions about the transition process in order to model the outcomes and has a more complex likelihood." The vignette recommends the msm package as an alternative.

With your data having only about two dozen distinct time intervals and your stated interest in interval-specific death probabilities, a discrete-time survival analysis might be simpler. That's essentially a binomial regression model (logistic regression, or with a complementary log-log link if you have proportional hazards) that treats time as a fixed effect. Then you can just get predictions for any particular time period directly by specifying the time and covariate values of interest.
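
A minimal sketch of that discrete-time approach, using hypothetical simulated data (300 subjects, a made-up binary covariate x, deaths recorded by week) rather than the asker's data. The data are expanded to one row per subject per week at risk, with event = 1 only in the week of death; factor(week) gives every interval its own baseline hazard, and the complementary log-log link is the grouped-time analogue of proportional hazards. The predicted probabilities are then interval-specific death probabilities h_k, and survival to the end of week k is the running product of (1 - h_k).

```r
# Hypothetical simulated data: 300 subjects followed weekly for up to 26 weeks.
set.seed(42)
n_id <- 300
x    <- rbinom(n_id, 1, 0.5)               # made-up binary covariate
haz  <- 1 - exp(-exp(-3.5 + 0.5 * x))      # true weekly death probability (inverse cloglog)
death <- sapply(haz, function(h) {
  w <- which(runif(26) < h)                # weeks in which a death "draw" occurs
  if (length(w) > 0) w[1] else NA          # first such week; NA = survived all 26
})

# Expand to person-period format: one row per subject per week at risk
pp <- do.call(rbind, lapply(seq_len(n_id), function(i) {
  last <- if (is.na(death[i])) 26 else death[i]
  data.frame(id = i, week = seq_len(last), x = x[i],
             event = as.integer(!is.na(death[i]) & seq_len(last) == last))
}))

# Discrete-time hazard model: week as a factor (one baseline per interval)
fit <- glm(event ~ factor(week) + x, family = binomial(link = "cloglog"), data = pp)

# Interval-specific death probabilities and survival for x = 0
new0 <- data.frame(week = 1:26, x = 0)
h0 <- predict(fit, newdata = new0, type = "response")  # P(die in week k | alive at start of k)
s0 <- cumprod(1 - h0)                                   # survival to the end of week k
round(cbind(week = 1:26, death_prob = h0, survival = s0), 3)
```

The last value of s0 is the overall 26-week survival for x = 0; setting x = 1 in new0 gives the curve for the other group. Using link = "logit" instead gives the logistic-regression version, which is a proportional-odds rather than a proportional-hazards model.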

EdM