I am trying to derive the score test for a logistic regression and I read on Wikipedia that the score test for binomial data (which is used in logistic regression) is the same as the Pearson $\chi^2$-test. Why is this the case?
1 Answer
For a helpful theoretical discussion, and a practical illustration in R, of the equivalence of the score test and the chi-squared test in logistic regression with one categorical predictor, see:
Why do my p-values differ between logistic regression output, chi-squared test, and the confidence interval for the OR?
Let's start with notation. For each of $n$ observations, we have a categorical predictor $x_i\in\{1,\ldots,J\}$ and a binary outcome $y_i\in\{0,1\}$. We can tally the $n_j$ observations in the $j$th category into the number of successes ($1$s) and failures ($0$s), which we denote $k_j$ and $n_j-k_j$, respectively. The count $k_j$ is a binomial random variable with $n_j$ trials and (unknown) success probability $p_j$.
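To make the tallying concrete, here is a minimal sketch in R with simulated data (the variable names, sample size, and probabilities below are illustrative assumptions, not taken from the linked post):

```r
# Tally the binary outcome y by the categorical predictor x into k_j and n_j.
set.seed(1)
J <- 3
x <- factor(sample(1:J, size = 200, replace = TRUE))          # categorical predictor
y <- rbinom(length(x), size = 1,
            prob = c(0.3, 0.5, 0.4)[as.integer(x)])           # binary outcome

n_j <- as.numeric(table(x))              # number of observations per category
k_j <- as.numeric(tapply(y, x, sum))     # number of successes per category
cbind(n_j, k_j, failures = n_j - k_j)
```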
First we derive the statistic $X^2$ of Pearson's chi-squared test.
We arrange the counts into a contingency table with two rows (for successes and failures) and $J$ columns (one per category). Under the null hypothesis that outcome and predictor are independent, we expect $n\hat{p}\hat{q}_j$ successes and $n(1-\hat{p})\hat{q}_j$ failures in category $j$, where $\hat{p} = \sum_jk_j/n$ and $\hat{q}_j = n_j/n$. Since $n\hat{q}_j = n_j$, these expected counts are simply $n_j\hat{p}$ and $n_j(1-\hat{p})$. $$ \begin{aligned} X^2 &= \sum_j \frac{\left(k_j - n_j\hat{p}\right)^2}{n_j\hat{p}} + \frac{\left((n_j-k_j)-n_j(1-\hat{p})\right)^2}{n_j(1-\hat{p})} \\ &= \sum_j \frac{\left(k_j-n_j\hat{p}\right)^2(1-\hat{p})}{n_j\hat{p}(1-\hat{p})} + \frac{\left(k_j-n_j\hat{p}\right)^2\hat{p}}{n_j\hat{p}(1-\hat{p})} = \sum_j \frac{\left(k_j-n_j\hat{p}\right)^2}{n_j\hat{p}(1-\hat{p})} \end{aligned} $$
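As a quick check (continuing the simulated-data sketch above), the closed form of $X^2$ agrees with what `chisq.test()` reports for the $2\times J$ table:

```r
# Pearson's X^2 from the closed form above and from chisq.test() on the 2 x J table.
p_hat <- sum(k_j) / sum(n_j)                                   # overall proportion of successes
X2 <- sum((k_j - n_j * p_hat)^2 / (n_j * p_hat * (1 - p_hat)))

tab <- rbind(successes = k_j, failures = n_j - k_j)
all.equal(X2, unname(chisq.test(tab, correct = FALSE)$statistic))   # TRUE
```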
Next we derive the score test statistic, $S$, in two steps. First, the log-likelihood and its first and second derivatives. $$ \begin{aligned} \log L(p) &= \sum_i y_{i}\log p_{j[i]} + (1-y_i)\log\left(1 - p_{j[i]}\right) \\ &= \sum_j k_{j}\log p_j + \left(n_{j}-k_j\right)\log\left(1 - p_j\right) \end{aligned} $$ where $j[i]$ denotes the category $j$ of observation $i$.
The first derivative of the log likelihood is a vector. $$ \begin{aligned} \frac{\partial}{\partial p_j}\log L(p) &= \operatorname{vec} \left[\frac{k_j}{p_j} - \frac{n_j-k_j}{1-p_j}\right] \end{aligned} $$ where $\operatorname{vec}[v_i]$ denotes a column vector with elements $v_i$.
The second derivative is a diagonal matrix. The off-diagonal entries are 0 because the $j$th element of the first derivative is a function of the probability $p_j$ only; it doesn't contain the other probabilities $p_{j^\prime}$, $j^\prime \neq j$. $$ \begin{aligned} \frac{\partial^2}{\partial p_j\partial p^\prime} \log L(p) &= \operatorname{diag} \left[-\frac{k_j}{p_j^2} - \frac{n_j-k_j}{(1-p_j)^2}\right] \end{aligned} $$ where $\operatorname{diag}[d_{ii}]$ denotes a diagonal matrix with diagonal elements $d_{ii}$.
Now we can derive the score function $U(p)$ and the Fisher information matrix $I(p)$. $$ \begin{aligned} U(p) &= \frac{\partial}{\partial p_j}\log L(p) = \operatorname{vec} \left[\frac{k_j - n_jp_j}{p_j(1-p_j)} \right] \\ I(p) &= -\operatorname{E} \left\{\frac{\partial^2}{\partial p_j\partial p^\prime} \log L(p) \right\} = \operatorname{diag} \left[\frac{\operatorname{E}\{k_j\}}{p_j^2} + \frac{\operatorname{E}\{n_j-k_j\}}{(1-p_j)^2}\right] \\ &= \operatorname{diag} \left[\frac{n_jp_j}{p_j^2} + \frac{n_j(1-p_j)}{(1-p_j)^2}\right] = \operatorname{diag}\left[\frac{n_j}{p_j(1-p_j)}\right] \end{aligned} $$
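If you want to verify the closed-form derivatives numerically (continuing the sketch above; this assumes the numDeriv package is available), compare them with numerical derivatives of the log-likelihood at an arbitrary interior point:

```r
# Numerical sanity check of the first and second derivatives of log L(p).
loglik <- function(p) sum(k_j * log(p) + (n_j - k_j) * log(1 - p))
p0 <- rep(0.45, length(n_j))                                  # any point in (0, 1)^J

grad_closed <- (k_j - n_j * p0) / (p0 * (1 - p0))             # score U(p)
hess_closed <- diag(-k_j / p0^2 - (n_j - k_j) / (1 - p0)^2)   # diagonal Hessian

all.equal(grad_closed, numDeriv::grad(loglik, p0))
all.equal(hess_closed, numDeriv::hessian(loglik, p0), tolerance = 1e-6)
```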
And finally, the score statistic at the MLE under the null hypothesis, $\hat{p}=\sum_jk_j/n$ (the overall proportion of successes). We use the fact that the inverse $D^{-1}$ of a diagonal matrix $D$ with diagonal elements $d_j$ is another diagonal matrix with elements $d_j^{-1}$. $$ \begin{aligned} U(\hat{p}) &= \operatorname{vec} \left[\frac{k_j - n_j\hat{p}}{\hat{p}(1-\hat{p})} \right] \\ I(\hat{p}) &= \operatorname{diag}\left[\frac{n_j}{\hat{p}(1-\hat{p})}\right] \\ S(\hat{p}) &= U(\hat{p})^\top I(\hat{p})^{-1} U(\hat{p}) = \sum_j \frac{\left(k_j-n_j\hat{p}\right)^2}{n_j\hat{p}(1-\hat{p})} \end{aligned} $$
We've derived the score test statistic $S$ for logistic regression with a categorical predictor and shown that it is identical to Pearson's chi-squared statistic $X^2$.
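To illustrate the equivalence numerically (continuing the sketch above), compute $S$ directly from $U(\hat{p})$ and $I(\hat{p})$ and compare it with $X^2$ and with the Rao score test that `anova.glm()` reports for the logistic regression on `x`:

```r
# Score statistic S at the null MLE, computed from U and I as derived above.
U <- (k_j - n_j * p_hat) / (p_hat * (1 - p_hat))
I <- diag(n_j / (p_hat * (1 - p_hat)))
S <- drop(t(U) %*% solve(I) %*% U)

c(S = S, X2 = X2)                          # identical up to rounding

fit <- glm(y ~ x, family = binomial)
anova(fit, test = "Rao")                   # the "Rao" entry for x matches S and X^2
```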
Thank you very much for a nice and thorough answer :) – hmsand Dec 05 '22 at 16:49