Deriving the Kaplan-Meier estimator with right-censoring

Question

Let $\lambda_j$ be the discrete hazard at time $t_j$ (the probability of "dying" at $t_j$ given survival up to $t_j$), where $j\in\{1, 2, \ldots, J\}$, and let $d_j$, $c_j$, and $n_j$ denote the number of "deaths", the number censored, and the number at risk at time $t_j$, respectively. This is essentially the notation used in http://myweb.uiowa.edu/pbreheny/7210/f15/notes/9-8.pdf.

Then, using the maximum likelihood approach, we have

$L(\{\lambda_j\}) = \prod^{J}_{j=1}\{\lambda_j^{d_j}\left(\prod^{j-1}_{k=1}(1-\lambda_k)^{d_j}\right)\prod^{j}_{k=1}(1-\lambda_k)^{c_j}\}$

which then is said to be equal to

$L(\{\lambda_j\}) = \prod^{J}_{j=1}\left(\lambda_j^{d_j}(1-\lambda_j)^{n_j-d_j}\right)$

which I simply cannot wrap my head around. Where did the $(1-\lambda_1), (1-\lambda_2), \ldots, (1-\lambda_{j-1})$ terms go? By some magic, the terms from the product over $k$ have all been combined into the single $(1-\lambda_j)^{n_j-d_j}$ term. The derivation I have linked to does not explain how this step works, nor do any of the other derivations I have looked at. I hope this question and the notation I am using are clear enough to answer.

EDIT:

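I figured it out. Starting from the top equation, multiply and divide by $(1-\lambda_j)^{d_j}$ so that both inner products run over $k = 1, \ldots, j$:

$L(\{\lambda_j\}) = \prod_{j=1}^{J}\left\{\lambda_j^{d_j}(1-\lambda_j)^{-d_j}\right\}\cdot\prod_{j=1}^{J}\prod_{k=1}^{j}(1-\lambda_k)^{d_j+c_j}$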

The key to breaking down the double product is to reverse the order of the products. This is like switching the order of integration; in fact, this double product is akin to $\int^{\infty}_{0}\int^{y}_{0}dxdy$, which is equivalent to $\int^{\infty}_{0}\int^{\infty}_{x}dydx$. From this, it is not hard to see that

$\prod_{j=1}^{J}\prod_{k=1}^{j}(1-\lambda_k)^{d_j+c_j} = \prod_{k=1}^{J}\prod_{j=k}^{J}(1-\lambda_k)^{d_j+c_j}=\prod_{k=1}^{J}(1-\lambda_k)^{\sum^{J}_{j=k}(d_j+c_j)}$

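But $\sum^{J}_{j=k}(d_j+c_j) = n_k$, which is the math way of saying "everyone still at risk at time $t_k$ will either 'die' or be 'censored' at $t_k$ or later." (This counts subjects censored at $t_k$ as still at risk at $t_k$, consistent with the $\prod^{j}_{k=1}(1-\lambda_k)^{c_j}$ factor in the top equation.)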

So the upshot is that $\prod_{j=1}^{J}\prod_{k=1}^{j}(1-\lambda_k)^{d_j+c_j} = \prod_{k=1}^{J}(1-\lambda_k)^{n_k}$
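As a sanity check (my own worked case, not taken from the linked notes), writing out $J = 2$ makes the regrouping concrete. The $j=1$ factor contributes only $(1-\lambda_1)$, while the $j=2$ factor contributes both $(1-\lambda_1)$ and $(1-\lambda_2)$:

$\prod_{j=1}^{2}\prod_{k=1}^{j}(1-\lambda_k)^{d_j+c_j} = (1-\lambda_1)^{d_1+c_1}\cdot(1-\lambda_1)^{d_2+c_2}(1-\lambda_2)^{d_2+c_2}$

$= (1-\lambda_1)^{(d_1+c_1)+(d_2+c_2)}(1-\lambda_2)^{d_2+c_2} = (1-\lambda_1)^{n_1}(1-\lambda_2)^{n_2}$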

and finally we get

$L(\{\lambda_j\}) = \prod^{J}_{j=1}\left(\lambda_j^{d_j}(1-\lambda_j)^{n_j-d_j}\right)$

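Since $k$ is a dummy index, we can rename it $j$; multiplying $(1-\lambda_j)^{n_j}$ by the factor $(1-\lambda_j)^{-d_j}$ from the first product gives the exponent $n_j-d_j$, and we are done.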
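For anyone who wants to convince themselves numerically, here is a minimal sketch in Python that evaluates both forms of the likelihood on toy data. The counts `d`, `c` and the hazards `lam` are made-up values, and the names `lik_original` and `lik_simplified` are just labels I chose. It assumes nobody is left at risk after $t_J$, so that $n_j = \sum_{i \ge j}(d_i + c_i)$.

```python
import math

# Hypothetical toy data (not from the linked notes): deaths and censorings
# at J = 4 distinct event times, plus arbitrary hazards in (0, 1).
d = [2, 1, 0, 3]
c = [1, 0, 2, 1]
lam = [0.1, 0.25, 0.4, 0.6]
J = len(d)

# Number at risk at t_j: everyone who dies or is censored at t_j or later.
n = [sum(d[i] + c[i] for i in range(j, J)) for j in range(J)]


def lik_original(lam):
    """Likelihood written as the product over deaths and censorings."""
    total = 1.0
    for j in range(J):
        surv_before = math.prod(1 - lam[k] for k in range(j))       # k = 1..j-1
        surv_through = math.prod(1 - lam[k] for k in range(j + 1))  # k = 1..j
        total *= lam[j] ** d[j] * surv_before ** d[j] * surv_through ** c[j]
    return total


def lik_simplified(lam):
    """Likelihood in the binomial-like form with exponents n_j - d_j."""
    return math.prod(
        lam[j] ** d[j] * (1 - lam[j]) ** (n[j] - d[j]) for j in range(J)
    )


# The two forms should agree up to floating-point rounding.
assert math.isclose(lik_original(lam), lik_simplified(lam), rel_tol=1e-10)
```

Changing `lam` to any other values in $(0, 1)$ should leave the assertion intact, since the identity holds for all hazards and not just at the maximum.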
