
I am reading Tutz & Schmid "Modeling Discrete Time-to-Event Data" (2016) chapter 4 Evaluation and Model Choice section 4.2 Residuals and Goodness-of-Fit. I got stuck on p. 78:

[Image: excerpt from p. 78 of the book]

The first hurdle is the third paragraph ("In discrete survival one considers..."). What is a response? How come a single person $i$ generates a $k$-long random vector? (I suspect $k=q$.) It would make some sense if we had encoded the survival time $T=t$ by a $t$-long binary vector $(0,\dots,0,1)$ where $0$ denotes survival and $1$ denotes failure, but the authors use a different notation $(T_{i1},\dots,T_{ik})$ for such an encoding elsewhere in the book. Also, then $n_i$ would only be equal to 1 if the individual failed in the first time period, but this is not what is implied in the last sentence of the paragraph.

The second hurdle is "Moreover, let $p_{it}$ denote the proportion of observations in period $t$ in subpopulation $i$." What is subpopulation $i$? What is the denominator of the proportion? Initially I would guess $p_{it}$ were the estimand of $\hat\pi_{it}$ (and thus a population quantity), but that would make $\chi_P^2$ and $r_{P,i}^2$ unobservable, while I think they should be observable. So then $p_{it}$ should be an observed quantity.

I am really confused.

Richard Hardy

1 Answer


In the paragraph starting with "In discrete survival one considers...," the $i$ values don't necessarily index individuals.* They index groups of individuals (also called "subpopulations") all of whom have the same set of covariate values; group $i$ has $n_i$ individuals sharing a set of covariate values $x_i$ (evidently constant over time in this situation) and there are $N$ such groups in total.

What they call "responses" are perhaps more commonly called "events." So, of the $n_i$ individuals sharing covariate values $x_i$, $T_{is}$ is the number having the event during time period $s$.
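To make the grouping concrete, here is a minimal sketch with made-up data (the covariate labels, the event periods, and the value of $k$ are all assumptions for illustration, and censoring is ignored). It pools individuals who share a covariate value and counts, for each group, how many had the event in each period, which is the role $T_{is}$ plays above.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one (covariate value, event period) pair per individual.
individuals = [
    ("a", 1), ("a", 2), ("a", 2), ("a", 3),
    ("b", 1), ("b", 1), ("b", 3),
]
k = 3  # assumed number of time periods


def group_event_counts(records, k):
    """Return {x: [T_1, ..., T_k]}, where T_s counts events in period s."""
    counts = defaultdict(Counter)
    for x, period in records:
        counts[x][period] += 1
    # A Counter returns 0 for periods in which no event was observed.
    return {x: [c[s] for s in range(1, k + 1)] for x, c in counts.items()}


T = group_event_counts(individuals, k)       # event counts per group and period
n = {x: sum(row) for x, row in T.items()}    # group sizes n_i
print(T)
print(n)
```

Here each distinct covariate value defines one group, so $N = 2$ in this toy example.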

The probability model is fit by combining information across all groups and their associated covariate values. A $p_{it}$ value is the observed fraction of individuals in group/subpopulation $i$ who had the event during time period $t$, while the corresponding $\hat\pi_{it}$ value is the proportion estimated by the model.
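As a rough illustration of how the observed and fitted proportions are compared, the sketch below computes $p_{it} = T_{it}/n_i$ and a Pearson-type contribution $n_i \sum_t (p_{it}-\hat\pi_{it})^2/\hat\pi_{it}$ for each group. This is the standard multinomial Pearson form and is only assumed to match the book's definition of $r_{P,i}^2$; the counts and the fitted probabilities `pi_hat` are invented for the example rather than taken from a fitted model.

```python
def observed_proportions(counts):
    """p_it = T_it / n_i for one group, given its per-period event counts."""
    n_i = sum(counts)
    return [c / n_i for c in counts]


def pearson_contribution(counts, pi_hat):
    """n_i * sum_t (p_it - pi_hat_it)^2 / pi_hat_it for one group."""
    n_i = sum(counts)
    p = observed_proportions(counts)
    return n_i * sum((p_t - pi_t) ** 2 / pi_t for p_t, pi_t in zip(p, pi_hat))


# Hypothetical event counts per period and hypothetical fitted probabilities.
counts = {"g1": [1, 2, 1], "g2": [2, 0, 1]}
pi_hat = {"g1": [0.25, 0.5, 0.25], "g2": [0.5, 0.25, 0.25]}

r2 = {g: pearson_contribution(counts[g], pi_hat[g]) for g in counts}
chi2 = sum(r2.values())  # Pearson-type statistic summed over groups
print(r2, chi2)
```

Everything entering the statistic is either observed (the counts) or estimated (the fitted probabilities), which is why it is computable from the data.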


*I think that the first sentence of Section 4.2.1 uses its symbols with different meanings than the text that follows. I share your sense that $q$ in the first sentence is equivalent to $k$ later on. Furthermore, $T$ in that sentence represents time (versus event counts later on) and $i$ in that sentence is the index for times (versus for "groups/subpopulations" later on).

EdM
  • Thank you, this seems logical. I think the editor(s) just skipped this page when preparing the book for publication; it really stands out in terms of lack of clarity and inconsistent notation. – Richard Hardy Aug 10 '22 at 06:14
  • I have posted several more questions on related topics; perhaps you could take a look: this, this and (less pressing) this. Thank you! – Richard Hardy Aug 15 '22 at 07:22