Let us consider the case of survival analysis with one event. Let $X$ represent a set of covariates about each unit. Let $T_E$ be the (latent) event time of the unit, and let $T_C$ be the (latent) censoring time of the unit. Let $T = \min(T_E, T_C)$ be the observed last time of observation, and let $E = \mathbf{1}[T_E < T_C]$ be the event indicator variable.

Survival analysis often makes the assumption of "non-informative censoring", but I'm struggling to find a formal statistical definition for the term. I understand that the term "independent censoring" can be taken to mean:

$$T_E \perp T_C$$

And that the term "conditionally independent censoring" can be taken to mean:

$$T_E \perp T_C | X$$
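To make the difference between these two conditions concrete, here is a toy simulation. It is only a sketch: the binary covariate, the exponential distributions, the scale values and the `simulate` helper are all made up for illustration. Both latent times depend on $X$ but are drawn independently given $X$, so they are conditionally independent while being marginally dependent.

```python
import numpy as np

def simulate(n=200_000, seed=0):
    """Toy data: T_E and T_C are independent given X, but both depend on X."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)        # binary covariate X (assumed)
    scale = np.where(x == 1, 1.0, 5.0)    # assumed: X = 1 shortens both times
    t_e = rng.exponential(scale)          # latent event time T_E
    t_c = rng.exponential(scale)          # latent censoring time T_C
    t = np.minimum(t_e, t_c)              # observed time T = min(T_E, T_C)
    e = (t_e < t_c).astype(int)           # event indicator E = 1[T_E < T_C]
    return x, t_e, t_c, t, e

x, t_e, t_c, t, e = simulate()

# Correlation ignoring X: clearly positive, since X drives both times.
marginal_corr = np.corrcoef(t_e, t_c)[0, 1]

# Correlation within each level of X: close to zero.
within_corr = [np.corrcoef(t_e[x == k], t_c[x == k])[0, 1] for k in (0, 1)]

print(marginal_corr, within_corr)
```

Zero correlation does not prove independence in general; here the conditional independence holds by construction, and the correlations are just a quick way to see it.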

But what - if any - is the definition of "informative censoring" and "non-informative censoring" within the context of this setup?

As one additional note - in this thread, I came across the definition that "non-informative censoring occurs if the distribution of survival times ($T_E$) provides no information about the distribution of censorship times ($T_C$), and vice versa." But how would this differ from the case of independent censoring that I discussed above? Isn't this just a fancy re-stating of statistical independence?

Thank you so much in advance for the help!!

1 Answer

Lagakos provided a formal definition of "noninformative censoring" in Biometrics 35: 139-156, 1979. It's the most general case in which the standard formulation of (partial) likelihood for survival models is correct under right censoring. "Non-prognostic" censoring, when "a censored observation at time u provides only the information that true survival time exceeds u" is a proper subset of noninformative censoring, and truly independent censoring is in turn a proper subset of "non-prognostic" censoring.

Lagakos notes (page 144):

"Noninformative censoring... is the least easily interpreted condition of the three in terms of the physical process, yet it is the most general one mathematically. Accordingly, its primary role is likely to be in those situations where one cannot confidently identify the censoring mechanism as being independent or nonprognostic, and hence must resort to statistical tests to assess the effects of the censoring mechanism."

The more general "noninformative" censoring is based on local-in-time distributions of event and censoring probabilities, and was called a "constant-sum model" in an earlier paper by Williams and Lagakos, Biometrika 64: 215-224, 1977. In Section 2 they consider a small neighborhood of time $N_{u}$ around observation time $u$, and define two functions to describe observed event and censoring probabilities.

For events, where $a =1$ represents an observed event and $T$ is the actual event time, $a(u) = \text{pr}(a = 1| T \in N_u)$, the probability that the event is observed given that it happens during that time neighborhood.

For censoring, they define $dB(u)=\text{pr}(C_u | T \ge u)$, "the probability that a patient who lives at least u units is censored at time u," with $C_u$ the event of being censored during the time neighborhood $N_u$. $B(u)$ is its integral from time 0, $B(u)=\int_0^u dB(t)$.

Noninformative censoring then is equivalent to the class of "constant sum" models, for which

$$a(u)+B(u)=1 $$

almost everywhere with respect to the underlying survival-time distribution. In Section 3, Williams and Lagakos show an example of a joint, non-independent distribution of event and censoring times that is nevertheless noninformative (constant-sum).
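As an illustration that is not taken from either paper, the constant-sum condition can be checked numerically in the simplest case of independent censoring. With independent continuous times, $a(u)$ is the censoring survival function at $u$ and $B(u)$ is the censoring distribution function at $u$, so the two should sum to 1. The sketch below is my own discretized reading of the definitions above: the exponential distributions, their scales, the bin width and the `constant_sum_estimates` helper are all assumptions, and it uses the latent times directly, which is only possible in a simulation.

```python
import numpy as np

def constant_sum_estimates(t_e, t_c, edges):
    """Binned estimates of a(u) and B(u) from latent event/censoring times."""
    a_hat, dB = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # a(u): pr(event is observed | event time falls in the bin N_u)
        in_bin = (t_e >= lo) & (t_e < hi)
        a_hat.append(np.mean(t_c[in_bin] > t_e[in_bin]))
        # dB(u): pr(censored within the bin | event time >= start of the bin)
        at_risk = t_e >= lo
        censored_here = (t_c >= lo) & (t_c < hi) & (t_c < t_e)
        dB.append(np.sum(censored_here & at_risk) / np.sum(at_risk))
    a_hat, dB = np.array(a_hat), np.array(dB)
    # Cumulate dB and evaluate B at each bin midpoint to line up with a_hat.
    B_hat = np.cumsum(dB) - dB / 2
    return a_hat, B_hat

rng = np.random.default_rng(1)
n = 1_000_000
t_e = rng.exponential(1.0, size=n)   # latent event times (assumed distribution)
t_c = rng.exponential(2.0, size=n)   # latent censoring times, independent of t_e

edges = np.linspace(0.0, 1.5, 31)    # 30 bins of width 0.05
a_hat, B_hat = constant_sum_estimates(t_e, t_c, edges)
total = a_hat + B_hat

print(np.round(total[::5], 3))       # values should sit close to 1
```

The sum is only approximately 1 here because of the finite bin width and sampling noise. A dependent but constant-sum joint distribution, like the one in Section 3 of Williams and Lagakos, would be the more interesting case to run through the same check.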

EdM