
I am trying to analyze my data using survival analysis in R, and I am unsure how to classify my data.

My research is an 11-month longitudinal study, and my colleagues and I collected data roughly every 2 months (waves 1-7). Only waves 5 and 6 were 1 month apart; all other consecutive waves were 2 months apart.

As I understand it, if the data are not collected continuously, the researcher can use survival analysis for interval-censored data, and if the observations share the same time intervals, the researcher can use survival analysis for discrete-time data. I have also heard that for discrete-time data, all time intervals must be equal.

My questions are:

  1. Even if my data are not collected at the same interval, can I still use survival analysis for discrete-time data? Or is it more appropriate to use interval-censored survival analysis?

  2. If I can use the method for discrete-time data, then is it okay to set the time interval to 1 month? Or are 2-month intervals more appropriate?

Han
    It is not true that all intervals must have the same length for discrete time event history analysis (DTEHA). For example, sometimes DTEHA is used to answer whether and when an event occurs in terms of known stages in some process, and different stages may take different calendar time. – Alexis Dec 20 '23 at 17:31

1 Answer


If you are assuming proportional hazards, there isn't necessarily a difference in this situation between a discrete-time and an interval-censored analysis. See this page and its links.

A discrete-time survival model is a set of binomial regressions. If you use a complementary log-log link for the binomial regression instead of the usual default logit link, you have what's termed a "grouped proportional hazards model" and will get corresponding estimates of regression coefficients for covariate associations with outcome.
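To make the "set of binomial regressions" concrete, here is a minimal sketch of the person-period layout such a model is usually fit to: one row per subject per interval at risk, with a 0/1 outcome. The wave times are an assumption inferred from the question (7 waves over 11 months, with waves 5 and 6 one month apart), and `to_person_period` and the two example subjects are hypothetical.

```python
# Assumed wave times in months for waves 1-7, inferred from the question.
WAVE_TIMES = [0, 2, 4, 6, 8, 9, 11]

def to_person_period(subject_id, last_interval, event):
    """Expand one subject into one row per interval at risk.

    last_interval: 1-based index of the last interval in which the subject
        was observed (the interval containing the event, or the last
        interval completed before censoring).
    event: True if the event occurred in that interval, False if censored.
    """
    rows = []
    for j in range(1, last_interval + 1):
        rows.append({
            "id": subject_id,
            "interval": j,                # modeled as a factor (one intercept per interval)
            "start": WAVE_TIMES[j - 1],   # interval start, in months
            "end": WAVE_TIMES[j],         # interval end, in months
            "y": int(event and j == last_interval),  # binomial outcome
        })
    return rows

# Hypothetical subjects: A has the event between waves 5 and 6 (the 1-month
# interval); B is censored after completing the third interval.
rows = to_person_period("A", 5, True) + to_person_period("B", 3, False)
for r in rows:
    print(r)
```

Each row is then one binomial observation, with `interval` entered as a categorical predictor alongside the covariates. The `start` and `end` columns are carried along only for mapping results back to calendar time.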

The different durations of the intervals also don't matter in the proportional hazards context. Recall that a continuous-time Cox model doesn't directly evaluate the event times, only their ordering in time. Further calculations can then estimate the cumulative baseline hazard as a function of sequential event number, which then gets mapped back to the corresponding event times.
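A small sketch of that point, using made-up data and a hand-written partial likelihood (one covariate, no tied event times, which is a simplifying assumption): stretching the time axis with any strictly increasing transformation leaves the Cox partial log-likelihood unchanged, because the risk sets depend only on the ordering.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate, assuming no tied times."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(beta * x[k])
                   for k, t_k in enumerate(times) if t_k >= t_i)
        ll += beta * x[i] - math.log(risk)
    return ll

# Hypothetical data: times, event indicators (1 = event, 0 = censored), covariate.
times = [1.0, 2.5, 3.0, 4.2, 6.0]
events = [1, 0, 1, 1, 0]
x = [1, 0, 1, 0, 0]

# A strictly increasing transformation changes the spacing but not the order.
warped = [t ** 3 + 10 for t in times]

a = cox_partial_loglik(0.7, times, events, x)
b = cox_partial_loglik(0.7, warped, events, x)
print(a, b)  # the two values agree
```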

Similarly, the fixed-time coefficient for each time interval in a discrete-time "grouped proportional hazards model" is related to the difference in baseline cumulative hazard between the end and the beginning of the interval. There's no requirement that each time interval has the same duration. Similarly to a continuous-time Cox model, after you have fit such a discrete-time model you can reconstruct the baseline cumulative hazard from the model coefficients.
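A numerical sketch of this relationship, under stated assumptions: the baseline cumulative hazard `baseline_cumhaz`, the log hazard ratio `BETA`, and the wave times (inferred from the question) are all hypothetical. Under proportional hazards, the complementary log-log of the conditional event probability in an interval equals the log of the baseline cumulative-hazard increment over that interval plus the covariate effect, whatever the interval's width.

```python
import math

WAVE_TIMES = [0, 2, 4, 6, 8, 9, 11]  # assumed wave times in months
BETA = 0.5                           # hypothetical log hazard ratio for x = 1 vs x = 0

def baseline_cumhaz(t):
    # Hypothetical Weibull-type baseline cumulative hazard.
    return 0.02 * t ** 1.5

def cloglog(p):
    return math.log(-math.log(1.0 - p))

def interval_event_prob(start, end, x):
    # P(event in (start, end] | event-free at start, covariate x) under PH.
    delta = baseline_cumhaz(end) - baseline_cumhaz(start)
    return 1.0 - math.exp(-delta * math.exp(BETA * x))

results = []
for start, end in zip(WAVE_TIMES[:-1], WAVE_TIMES[1:]):
    # Interval-specific intercept on the cloglog scale (x = 0).
    alpha = cloglog(interval_event_prob(start, end, 0))
    # Covariate shift on the cloglog scale; should equal BETA in every interval.
    shift = cloglog(interval_event_prob(start, end, 1)) - alpha
    log_delta = math.log(baseline_cumhaz(end) - baseline_cumhaz(start))
    results.append((start, end, alpha, log_delta, shift))
    print(f"({start}, {end}]  alpha={alpha:.4f}  "
          f"log increment={log_delta:.4f}  shift={shift:.4f}")

# Reconstruct the baseline cumulative hazard at the last wave from the intercepts.
reconstructed = sum(math.exp(alpha) for _, _, alpha, _, _ in results)
print(reconstructed, baseline_cumhaz(WAVE_TIMES[-1]))
```

The 1-month interval between waves 5 and 6 simply gets a smaller intercept than its 2-month neighbors. The covariate shift is the same in every interval.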

If you are trying to fit a parametric survival model, or a model that doesn't assume proportional hazards, or you are using discrete-time survival software that doesn't use a complementary log-log link, then the above simplifications won't hold. But if you would be fitting a Cox model if you had continuous-time data, the discrete-time survival model discussed above will be fine.

EdM