Background
I'm teaching an intro stats class in our social / health sciences department and I'm finding myself tripped up on something I'd always taken for granted: namely, the claim that survival analysis methods (from Kaplan-Meier to Cox models) "account for" censored data, and that this is one of these methods' central advantages over other approaches.
(Relevant background: in my research I use, mostly clumsily, some applied statistical and quantitative methods, and I have a good grasp of and intuition for some of the basics. So I'm an okay generalist, but I'm not a statistician or even a statistics grad student by any means.)
The Problem
As I'm preparing some PowerPoint slides for these students, though, I'm realizing that while I have a pretty good grasp on what censoring is in survival methods, I have no idea how something like a Cox model "accounts" for it. I'd never thought about it, and instead assumed that this is just what happened. (It's possible that a professor walked us through this earlier in my education, but it's also possible that I used to struggle in 8am stats lectures.)
Censoring, as you all already know, is the state in which we have some information about individual survival time, but we don't know the survival time exactly (Kleinbaum & Klein, p. 5; I've always liked this plain English explanation). In my experience, sometimes when stats people are trying to sell someone on survival methods, they'll say things like "logistic regression doesn't account for censoring, but Cox regression does!"
There are some great posts on CV about censoring; this is maybe my favorite example. In it, user @Tim gives a terrific explanation of censoring:
Intuitive example of censoring is that you ask your respondents about their age, but record it only up to some value and all the ages above this value, say 60 years, are recorded as "60+". This leads to having precise information for non-censored values and no information about censored values.
I think this is brilliant (in fact I plan on borrowing liberally from it during the lecture, with credit given of course). But it doesn't really get into how survival analyses actually deal with this, and whether it's a selling point for survival methods ("our methods can do this and yours do not") or just something that pops up when you try to ask survival-type questions ("how long will people live, on average, after being given treatment X?").
The Questions
Is it true, strictly speaking, that something like a Kaplan-Meier estimator or a Cox proportional hazards model is "accounting" for censoring?
If so, how is it doing that?
If a survival model indeed accounts for censoring, is this a "feature" of survival methods over others, or a "bug", an inevitable artifact of the sorts of questions one uses survival methods to answer?
My guesses
Well, not guesses, maybe more like very unclear intuitions I'm not too confident about:
Why is censoring a problem? I'm thinking that, if ignored, censoring is a huge potential source of bias in making survival estimates: if you don't know what happened after Mr. Smith dropped out of your study (i.e. was lost to follow-up, i.e. was censored) your estimates may be off in one direction or the other. Maybe he lived a long, long time -- or maybe he died the next day. If this is happening to lots of people in the same way, your estimates may be really, really off.
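To make that worry concrete for myself, here is a toy sketch in Python. All of the numbers are invented for illustration, and the "true" death times are something we would never actually see for people who dropped out. It compares two naive shortcuts (dropping the censored people, or pretending they died on the day they dropped out) against the truth.

```python
# Hypothetical data: six patients, time measured in months.
true_death_time = [2, 9, 4, 12, 6, 8]            # unknowable in a real study for dropouts
dropout_time = [None, 3, None, 5, None, None]    # None = followed all the way to death

# What a study would actually record: a time plus an "event observed?" flag.
observed_time = [
    t if d is None else d for t, d in zip(true_death_time, dropout_time)
]
event_observed = [d is None for d in dropout_time]


def mean(xs):
    return sum(xs) / len(xs)


# The quantity we want but cannot compute from the recorded data.
true_mean = mean(true_death_time)

# Shortcut A: throw away everyone who was censored.
mean_dropping = mean([t for t, e in zip(observed_time, event_observed) if e])

# Shortcut B: pretend each censored person died at their dropout time.
mean_as_deaths = mean(observed_time)

print(f"true mean survival:       {true_mean:.2f}")
print(f"dropping censored people: {mean_dropping:.2f}")
print(f"censoring time as death:  {mean_as_deaths:.2f}")
```

In this made-up example both shortcuts come out too low, because the two people who dropped out happened to be long survivors. With a different dropout pattern, the error from dropping people could look different.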
So maybe what survival models do is keep everyone who contributed survival time in the analysis, regardless of whether we know their eventual outcome, while other methods would simply drop all of those people's data as missing.
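Here is how I picture that guess working, as a minimal hand-rolled sketch of the Kaplan-Meier product-limit estimate in Python. The data are invented, and the function name `kaplan_meier` is my own, not any library's. The idea is that a censored person stays in the denominator (the "at risk" count) up to the time they drop out, and then leaves without ever being counted as a death.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death was observed.

    times  -- follow-up time for each person
    events -- 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)   # everyone is at risk at the start
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Only observed deaths lower the survival estimate.
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Deaths and censored people both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical data: six patients, two of them censored (event = 0).
times = [2, 3, 4, 5, 6, 8]
events = [1, 0, 1, 0, 1, 1]

km_all = kaplan_meier(times, events)

# Same estimator after simply dropping the censored patients.
kept = [(t, e) for t, e in zip(times, events) if e == 1]
km_dropped = kaplan_meier([t for t, _ in kept], [e for _, e in kept])

for t, s in km_all:
    print(f"all patients:     S({t}) = {s:.4f}")
for t, s in km_dropped:
    print(f"censored dropped: S({t}) = {s:.4f}")
```

With these toy numbers, the estimate that keeps the censored patients is about 0.833, 0.625, 0.3125 and 0 at times 2, 4, 6 and 8, while dropping them gives 0.75, 0.5, 0.25 and 0. The censored patients never count as deaths, but their time at risk still enlarges the denominators at times 2 and 4.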
Am I way, way off here?