Your calculation doesn't reproduce the p-value because, by default, `logistf` fits the logistic regression by Firth's penalized maximum likelihood, and the penalty is a function of the predictors. In effect, the full model and the reduced model are penalized differently.
The penalized likelihood is:
$$
\begin{aligned}
\log L^*(\beta) = \log L(\beta) + \frac{1}{2}\log\det I(\beta)
\end{aligned}
$$
where $I(\beta)$ is the Fisher information matrix. (The second term, the penalty, is the term @Mark points out in the comment quoted at the end of this answer.)
The coefficients $\beta$ are estimated iteratively and, given the current value for the probabilities of success $\widehat{p}$, the information matrix is calculated as:
$$
\begin{aligned}
I(\widehat{\beta}) = X^\top\operatorname{diag}\left\{\widehat{p}(1-\widehat{p})\right\}X
\end{aligned}
$$
Note that the penalty is a function of the fitted probabilities $\widehat{p}$ as well as the model matrix $X$. So when you drop a predictor, you change the penalty term as well.
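To make this concrete, here is the penalty computed from scratch (a minimal sketch; the helper name `firth_penalty` is mine, and it assumes there is no offset, so the fitted probabilities can be rebuilt as `plogis(X %*% beta)`):

firth_penalty <- function(X, beta) {
  # Sketch: Firth's penalty, 0.5 * log det I(beta), for a logistic model.
  p <- as.vector(plogis(X %*% beta))   # fitted probabilities (inverse logit)
  w <- p * (1 - p)                     # binomial variance weights
  0.5 * log(det(crossprod(X, w * X)))  # 0.5 * log det(X' W X)
}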
Let's use your example to demonstrate the issue.
Note: I drop `dia` from the full model; otherwise the data are completely separated. With this change we can fit the model `case ~ oc + vic + vicl + vis + age` with and without penalization. For more about complete separation, see *How to deal with perfect separation in logistic regression?*.
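As a quick check of the separation (this snippet is a sketch of mine, not part of the original question), fitting the model with `dia` by ordinary maximum likelihood shows the usual symptom:

library(logistf)  # provides logistf() and the sex2 data

glm(case ~ oc + vic + vicl + vis + dia + age, data = sex2, family = binomial)
# With separated data, expect a warning along the lines of:
# "glm.fit: fitted probabilities numerically 0 or 1 occurred"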
# Fit the full model, the reduced model (without age), and an explicit
# null model, all with Firth's penalty
full.model <- logistf(
  case ~ oc + vic + vicl + vis + age,
  data = sex2, firth = TRUE
)
reduced.model <- logistf(
  case ~ oc + vic + vicl + vis,
  data = sex2, firth = TRUE
)
null.model <- logistf(
  case ~ 1,
  data = sex2, firth = TRUE
)
rbind(
  full.model = full.model$loglik,
  reduced.model = reduced.model$loglik,
  null.model = null.model$loglik
)
#> full null
#> full.model -136.6498 -157.3390
#> reduced.model -140.8798 -158.3699
#> null.model -162.6972 -162.6972
We get three different values for the null log-likelihood! That's because the three models (the full model, the reduced model without `age`, and the explicit null model) are penalized to different degrees by Firth's method.
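We can check this with the `firth_penalty()` helper sketched above, evaluating the penalty at each model's own estimates and model matrix (again a sketch, relying on my hypothetical helper):

X.full    <- model.matrix(case ~ oc + vic + vicl + vis + age, data = sex2)
X.reduced <- model.matrix(case ~ oc + vic + vicl + vis, data = sex2)
firth_penalty(X.full, full.model$coefficients)        # penalty for the full model
firth_penalty(X.reduced, reduced.model$coefficients)  # differs: smaller model matrix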
data.frame(
  pvalue.logistf = full.model$prob["age"],
  # likelihood ratio test "by hand": twice the difference of the
  # (penalized) log-likelihoods, compared to a chi-squared with 1 df
  pvalue.byhand = 1 - pchisq(2 * (full.model$loglik[1] - reduced.model$loglik[1]), df = 1)
)
#> pvalue.logistf pvalue.byhand
#> age 0.01055201 0.003630392
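The mismatch is expected: the by-hand statistic compares two log-likelihoods that carry different penalties. Writing it out with the notation from the start of this answer:
$$
\begin{aligned}
2\left[\log L^*_{\text{full}}(\widehat{\beta}) - \log L^*_{\text{red}}(\widetilde{\beta})\right]
= 2\left[\log L_{\text{full}}(\widehat{\beta}) - \log L_{\text{red}}(\widetilde{\beta})\right]
+ \log\frac{\det I_{\text{full}}(\widehat{\beta})}{\det I_{\text{red}}(\widetilde{\beta})}
\end{aligned}
$$
The last term involves two different model matrices, so it doesn't cancel. Only without the penalty does the statistic reduce to the standard likelihood ratio.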
Now let's turn off penalization by setting `firth = FALSE` and refit the same three models.
full.model <- logistf(
  case ~ oc + vic + vicl + vis + age,
  data = sex2, firth = FALSE
)
reduced.model <- logistf(
  case ~ oc + vic + vicl + vis,
  data = sex2, firth = FALSE
)
null.model <- logistf(
  case ~ 1,
  data = sex2, firth = FALSE
)
Reassuringly, we now get the same value for the null log-likelihood in the second column because all three models are fitted by standard, unpenalized maximum likelihood. There is no penalty.
rbind(
  full.model = full.model$loglik,
  reduced.model = reduced.model$loglik,
  null.model = null.model$loglik
)
#> full null
#> full.model -143.1338 -164.7384
#> reduced.model -146.5363 -164.7384
#> null.model -164.7384 -164.7384
And now the calculation "by hand" does reproduce the p-value for the `age` predictor.
data.frame(
  pvalue.logistf = full.model$prob["age"],
  pvalue.byhand = 1 - pchisq(2 * (full.model$loglik[1] - reduced.model$loglik[1]), df = 1)
)
#> pvalue.logistf pvalue.byhand
#> age 0.009090372 0.009090372
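Since `firth = FALSE` is plain maximum likelihood, stock `glm()` plus a likelihood ratio test should reproduce the same p-value (a quick cross-check sketch):

glm.full <- glm(case ~ oc + vic + vicl + vis + age, data = sex2, family = binomial)
glm.reduced <- glm(case ~ oc + vic + vicl + vis, data = sex2, family = binomial)
anova(glm.reduced, glm.full, test = "LRT")  # LR test for age; same statistic as above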
For reference, here is @Mark's comment mentioned above:

> `logistf.fit` gives different results if you (a) remove a column from the model matrix (as in `woage`) compared to (b) removing that element from the `col.fit` argument (as in `fit.i`). The coefficients & fitted values are practically identical, but the log-likelihoods are (a) -137.24 and (b) -136.29. This is because a term (related to the log of the determinant of the covariance matrix?) is added to the loglik, and this differs between the two approaches. – Mark Dec 14 '21 at 22:19
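To illustrate the comment, here is a sketch of the two approaches. Caveat: `logistf.fit()` is an internal workhorse whose arguments may differ across package versions, so check `?logistf.fit` first; the `logistf.control()` call and the returned `$loglik` component are assumptions on my part.

X <- model.matrix(case ~ oc + vic + vicl + vis + age, data = sex2)
# (a) drop the age column entirely; the penalty then uses the smaller model matrix
fit.woage <- logistf.fit(x = X[, colnames(X) != "age"], y = sex2$case,
                         firth = TRUE, control = logistf.control())
# (b) keep the full model matrix but exclude age from col.fit, fixing its
#     coefficient at 0; the penalty still uses the full model matrix
fit.colfit <- logistf.fit(x = X, y = sex2$case, firth = TRUE,
                          col.fit = which(colnames(X) != "age"),
                          control = logistf.control())
c(a = fit.woage$loglik, b = fit.colfit$loglik)  # differ only through the penalty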