I was wondering if there are any sufficient conditions under which I can remove censored data from the model and work with just the failure times? (Without creating bias and ruining my model, of course.) Some independence conditions, etc.
Welcome to Cross Validated! With so many well vetted tools for handling censored values, why would you want to remove censored failure times at all? Please add that information by editing the question, as comments are easy to overlook and can be deleted. – EdM Aug 30 '22 at 13:20
1 Answer
The review by Leung et al. is still a good reference for the issues involved in censoring. With respect to your question, they discuss two possibilities.
The first possibility is your suggestion to restrict the analysis to uncensored survival times, called "complete-data analysis." Leung et al. note substantial disadvantages to that approach (page 90):
(a) Loss of efficiency: The loss in sample size can be considerable since it is not unusual, especially in medical or epidemiological studies, that 50% or more observations are censored. (b) Estimation bias: Inferences based on analyzing the uncensored observations only may be biased. It is a common misconception that one need not make any assumptions about the censoring mechanism when performing a complete-data analysis. In reality, such an analysis requires a strong assumption regarding the censoring mechanism: As in the incomplete data situations, complete-data analysis produces unbiased estimates only if the missing (censored observations) are missing (censored) completely at random.
The missing completely at random (MCAR) assumption is a very strict and typically untenable assumption, discussed extensively in the broader field of analysis with missing data.
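A small simulation can make the bias concrete. This is only a sketch under assumptions that are not from Leung et al.: event times and censoring times are drawn independently from exponential distributions, the rates are arbitrary, and the function name `simulate` is hypothetical. Even with censoring times independent of event times, longer event times are more likely to be censored, so the mean of the uncensored times underestimates the true mean. A method that uses the censored observations (here the exponential maximum-likelihood estimate, total observed time divided by the number of events) does not have that problem.

```python
import random

def simulate(n=20000, event_rate=0.1, censor_rate=0.1, seed=0):
    """Draw independent exponential event and censoring times (illustrative assumption)."""
    rng = random.Random(seed)
    observed, events = [], []
    for _ in range(n):
        t = rng.expovariate(event_rate)   # true event time
        c = rng.expovariate(censor_rate)  # independent censoring time
        observed.append(min(t, c))        # what is actually recorded
        events.append(t <= c)             # True if the event was observed
    return observed, events

observed, events = simulate()
true_mean = 1 / 0.1  # mean event time implied by event_rate

# Complete-data analysis: drop censored cases, average the rest.
uncensored = [y for y, d in zip(observed, events) if d]
naive_mean = sum(uncensored) / len(uncensored)

# Exponential MLE using all cases: total time at risk / number of events.
mle_mean = sum(observed) / sum(events)

print(f"true mean:            {true_mean:.2f}")
print(f"uncensored-only mean: {naive_mean:.2f}")
print(f"censoring-aware MLE:  {mle_mean:.2f}")
```

In this particular setup the uncensored-only mean converges to 1/(event_rate + censor_rate), half the true mean, while the censoring-aware estimate stays close to the true value.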
A second possibility, to handle right or interval censoring, is to "[analyze] the incidence of occurrence versus nonoccurrence of the event within a fixed period of time and [disregard] the survival times." Leung et al. call that "analysis based on dichotomized data." That approach throws away the time-to-event information in survival analysis, doesn't distinguish causes of censoring, and doesn't allow for incorporation of time-varying covariates. They describe the following case where it might be considered (page 91):
the approach of analyzing dichotomized data may be acceptable when the risk of failure is low, the risk periods (survival times) are long, and the covariates are associated with preventing the event rather than with prolonging the survival time. Such situations are common in many epidemiologic studies
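As a rough sketch of what dichotomizing involves, the hypothetical helper below (the name `dichotomize` and the example `horizon` value are illustrative assumptions) turns a (time, event) pair into event/no event within a fixed period. It also shows a practical catch: a case censored before the end of that period has unknown status, so it cannot be coded as either outcome without further assumptions.

```python
def dichotomize(time, event, horizon):
    """Code a survival observation as 1 (event by horizon), 0 (no event by horizon),
    or None (censored before horizon, so the status at horizon is unknown)."""
    if event and time <= horizon:
        return 1      # event observed within the fixed period
    if time >= horizon:
        return 0      # followed to the horizon or beyond without an event by then
    return None       # censored early: outcome at the horizon is not known

# (time, event_observed) pairs; values are made up for illustration
cases = [(3, True), (7, True), (6, False), (2, False)]
coded = [dichotomize(t, d, horizon=5) for t, d in cases]
print(coded)
```

The last case, censored at time 2 with a horizon of 5, comes back as `None`; dropping such cases brings back the same missing-data assumptions as the complete-data analysis.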
With the tools widely available for handling censored survival times, there is no advantage and many risks in removing cases with censored survival times.
S. van Buuren. "Flexible Imputation of Missing Data," Second Edition. Chapman and Hall, 2018.