I am reading Tutz & Schmid, "Modeling Discrete Time-to-Event Data" (2016), Chapter 4 "Evaluation and Model Choice", Section 4.2 "Residuals and Goodness-of-Fit". A goodness-of-fit statistic called the deviance is defined as $$ D = 2 \sum_{i=1}^N n_i \sum_{t=1}^k p_{it} \log\left( \frac{p_{it}}{\hat\pi_{it}} \right) $$ where $\hat\pi_{it}$ is the estimated probability that a person belonging to group $i$ (of size $n_i$, among $N$ groups in total) experiences an event in time period $t$ (from 1 to $k$), and $p_{it}$ is the corresponding observed proportion of persons in group $i$ who actually experienced the event in time period $t$.
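A minimal Python sketch of the displayed formula, assuming the group sizes, observed proportions, and fitted probabilities are already available as plain lists. The function name `deviance` and the numbers are hypothetical, not from the book. Cells with $p_{it}=0$ contribute 0, the usual convention since $x\log x \to 0$ as $x \to 0$.

```python
import math

def deviance(n, p, pi_hat):
    """D = 2 * sum_i n_i * sum_t p_it * log(p_it / pi_hat_it).

    n      : list of group sizes n_i
    p      : p[i][t], observed proportion of group i with the event in period t
    pi_hat : pi_hat[i][t], fitted probability for the same cell
    Cells with p_it == 0 contribute 0 (limit of x*log(x) as x -> 0).
    """
    total = 0.0
    for n_i, p_i, pi_i in zip(n, p, pi_hat):
        for p_it, pi_it in zip(p_i, pi_i):
            if p_it > 0:
                total += n_i * p_it * math.log(p_it / pi_it)
    return 2.0 * total

# Two hypothetical groups, k = 3 periods; each row of p and pi_hat sums to 1.
n = [50, 40]
p = [[0.2, 0.3, 0.5], [0.25, 0.25, 0.5]]
pi_hat = [[0.22, 0.28, 0.5], [0.22, 0.28, 0.5]]
print(deviance(n, p, pi_hat))
```

If the fitted probabilities reproduced the observed proportions exactly, every logarithm would be 0 and so would $D$.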

A well-fitting model will produce values $\hat\pi_{it}$ that are close to the observed proportions $p_{it}$, though there will be discrepancies due to sampling variability inherent in $p_{it}$. This will yield $\frac{p_{it}}{\hat\pi_{it}}\approx 1$ and thus $\log\left( \frac{p_{it}}{\hat\pi_{it}} \right)\approx 0$ and thus $D\approx 0$. (Correct me if I am wrong.)

On the other hand, there is a later statement on the same page that asymptotically $D\sim\chi^2(N(k-1)-p)$. Now, I know $\chi^2(N(k-1)-p)$ can be obtained as a sum of squares of $N(k-1)-p$ independent N(0,1) random variables or as a sum of squares of a different number of dependent ones. However, I do not get the intuition why $p_{it} \log\left( \frac{p_{it}}{\hat\pi_{it}} \right)$ should behave like a square of N(0,1). Perhaps it should not? More generally, how is the asymptotic $\chi^2(N(k-1)-p)$ distribution obtained? (Intuition is welcome.)

Moreover, it seems to me that $\log\left( \frac{p_{it}}{\hat\pi_{it}} \right)$, and thus $p_{it} \log\left( \frac{p_{it}}{\hat\pi_{it}} \right)$, can be either less than or greater than zero, so $D$ could end up being negative. That is incompatible with a $\chi^2(N(k-1)-p)$ distribution. Am I getting this wrong?

Richard Hardy

This section of the Tutz and Schmid book assumes that there is no censoring, so this result is just the application of maximum-likelihood analysis to a multinomial distribution of events over the $k$ time periods.

Following Section 1.5.1 of the second edition of Agresti's Categorical Data Analysis, the multinomial log-likelihood function for a model with category probabilities $\pi$, given $n_j$ observations in category $j$ (total $N$ observations; here $N$ is the size of a single multinomial sample, not the number of groups as in the question), is:

$$ L(\pi)=\sum_j n_j \log(\pi_j)$$

where $\pi_j$ is the modeled probability of category $j$. The multinomial coefficient is omitted because it does not depend on $\pi$.
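A small sketch of this log-likelihood kernel, with made-up counts. It also checks numerically that the observed proportions $n_j/N$ give a larger value than another probability vector, which is the property of the saturated model used below.

```python
import math

def multinomial_loglik(counts, probs):
    """Kernel of the multinomial log-likelihood: sum_j n_j * log(pi_j).

    The multinomial coefficient is omitted because it does not depend on pi.
    Categories with n_j == 0 contribute nothing.
    """
    return sum(n_j * math.log(pi_j) for n_j, pi_j in zip(counts, probs) if n_j > 0)

counts = [12, 30, 18]                      # hypothetical n_j, total N = 60
N = sum(counts)
observed = [n_j / N for n_j in counts]     # saturated-model estimates p_j = n_j / N
other = [1 / 3, 1 / 3, 1 / 3]              # some other probability vector

print(multinomial_loglik(counts, observed) >= multinomial_loglik(counts, other))  # True
```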

The displayed deviance formula follows the standard definition (e.g., Section 4.5 of Agresti): twice the difference between the log-likelihood of the saturated model, which attains the maximum possible likelihood,

$$\sum_j n_j \log(n_j/N)= \sum_j n_j \log(p_j),$$ where $p_j = n_j/N$ is the observed proportion in category $j$, and the log-likelihood of the model in question:

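$$\sum_j n_j \log(\pi_j).$$

Subtracting the two and using $n_j = Np_j$ gives $2\sum_j n_j \log(p_j/\pi_j) = 2N\sum_j p_j \log(p_j/\pi_j)$. Summing this over groups, with $N$ replaced by the group size $n_i$ and $\pi_j$ by the fitted $\hat\pi_{it}$, gives the displayed formula.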
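A quick numerical check, assuming made-up counts and fitted probabilities, that "twice the saturated-minus-model log-likelihood" and the $2N\sum_j p_j\log(p_j/\pi_j)$ form are the same number. The helper names are illustrative only.

```python
import math

def loglik(counts, probs):
    # Multinomial log-likelihood kernel: sum_j n_j * log(pi_j), skipping empty cells.
    return sum(n * math.log(q) for n, q in zip(counts, probs) if n > 0)

def deviance_from_loglik(counts, fitted):
    """Twice the gap between the saturated log-likelihood and the model's."""
    N = sum(counts)
    saturated = [n / N for n in counts]   # p_j = n_j / N maximizes the likelihood
    return 2.0 * (loglik(counts, saturated) - loglik(counts, fitted))

def deviance_direct(counts, fitted):
    """Same quantity written as 2 * N * sum_j p_j * log(p_j / pi_j)."""
    N = sum(counts)
    return 2.0 * N * sum((n / N) * math.log((n / N) / q)
                         for n, q in zip(counts, fitted) if n > 0)

counts = [12, 30, 18]          # hypothetical observed counts
fitted = [0.25, 0.45, 0.30]    # hypothetical fitted probabilities
print(deviance_from_loglik(counts, fitted), deviance_direct(counts, fitted))
```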

Because the saturated model attains the maximum possible likelihood for the data, the likelihood under any other model cannot exceed it. So the sum of the terms in the displayed deviance formula is necessarily non-negative, even though individual terms can be negative.
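A numerical illustration with made-up proportions. The non-negativity of the total relies on both vectors being probability distributions over the same categories (each sums to 1), which is what Gibbs' inequality requires; within a group, both the observed proportions and the model's probabilities cover all $k$ periods.

```python
import math

def kl_terms(p, q):
    """Per-category terms p_j * log(p_j / q_j); empty cells contribute 0."""
    return [pj * math.log(pj / qj) if pj > 0 else 0.0 for pj, qj in zip(p, q)]

p = [0.2, 0.5, 0.3]       # hypothetical observed proportions, sum to 1
pi = [0.25, 0.45, 0.30]   # hypothetical fitted probabilities, sum to 1
terms = kl_terms(p, pi)

print(terms[0] < 0)       # True: p_1 < pi_1, so this term is negative
print(sum(terms) >= 0)    # True: the total is still non-negative
```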

The asymptotic chi-square distribution comes from Wilks's theorem. Agresti provides a proof specific to the multinomial likelihood-ratio statistic in Section 14.3.4. This page and its links provide some intuition about Wilks's theorem. For degrees of freedom, I suppose you can think about this as $(k-1)$ independent time periods (no censoring, so the results for $k-1$ periods determine that for the $k^{th}$ period) for $N$ groups, less the $p$ fitted parameters.
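A rough Monte Carlo sketch of the asymptotic result, using only the standard library. It assumes the simplest case: one group and no fitted parameters ($p=0$), so the reference distribution is $\chi^2(k-1)$ with mean $k-1$. Counts are drawn from a known multinomial and the deviance is computed against the true probabilities. As a standard piece of intuition for the "squares": a second-order Taylor expansion of $p_t\log(p_t/\pi_t)$ around $p_t=\pi_t$, together with $\sum_t (p_t-\pi_t)=0$, gives $2n\sum_t p_t\log(p_t/\pi_t)\approx\sum_t n(p_t-\pi_t)^2/\pi_t$, the Pearson statistic, which is a sum of squared, approximately normal standardized deviations with one linear constraint. The probabilities, sample size, and seed are arbitrary illustrative choices.

```python
import math
import random

def deviance(counts, probs):
    """2 * sum_t c_t * log(c_t / (n * pi_t)) = 2n * sum_t p_t * log(p_t / pi_t)."""
    n = sum(counts)
    return 2.0 * sum(c * math.log(c / (n * q)) for c, q in zip(counts, probs) if c > 0)

def pearson(counts, probs):
    """Pearson X^2 = sum_t (c_t - n * pi_t)^2 / (n * pi_t)."""
    n = sum(counts)
    return sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, probs))

rng = random.Random(2016)
probs = [0.1, 0.2, 0.3, 0.4]   # true probabilities for k = 4 periods
n, reps = 500, 2000
devs, pears = [], []
for _ in range(reps):
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    counts = [draws.count(j) for j in range(len(probs))]
    devs.append(deviance(counts, probs))
    pears.append(pearson(counts, probs))

mean_dev = sum(devs) / reps
mean_pearson = sum(pears) / reps
print(f"mean deviance {mean_dev:.2f}, mean Pearson {mean_pearson:.2f}, "
      f"chi-square({len(probs) - 1}) mean {len(probs) - 1}")
```

With $N$ independent groups the per-group statistics add, giving $N(k-1)$ degrees of freedom before subtracting the $p$ estimated parameters.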

In terms of using the deviance residuals in survival analysis (in models that incorporate censored data), Therneau and Grambsch note in Section 4.3:

> The deviance residual was designed to improve on the martingale residual for revealing individual outliers, particularly in plotting applications. In practice it has not been as useful as anticipated.

I can't say that I've found them very useful, unlike the scaled Schoenfeld residuals for evaluating proportional hazards assumptions, or martingale residuals for evaluating functional forms.

EdM