
I'm trying to understand the profile-likelihood p-values computed by the logistf package. In the code, the p-values seem to be calculated by:

fit.i <- logistf.fit(x, y, weight=weight, 
          offset=offset, firth, 
          col.fit=(1:k)[-i], control=control)
iters <- c(iters, fit.i$iter)
fit$prob[i] <- 1-pchisq(2*(fit.full$loglik - 
                   fit.i$loglik), 1)

where fit.i$loglik is the log-likelihood with the coefficient of interest excluded.

If I try this manually, I get:

full <- logistf(case ~ age + oc + vic + vicl + 
              vis + dia, logistf::sex2)
woage <- logistf(case ~ oc + vic + vicl + vis + 
              dia, logistf::sex2)
1-pchisq(2*(full$loglik[2] - woage$loglik[2]), 1)

[1] 0.002178983

which is different from the result:

> full$prob
 (Intercept)          age           oc          vic         vicl          vis          dia 
8.020268e-01 6.143472e-03 8.751911e-01 1.678877e-06 1.237805e-05 5.449701e-02 4.951873e-03 

What am I misunderstanding?

jon

1 Answer


Your calculation doesn't reproduce the p-value because, by default, logistf fits the logistic regression with Firth's penalized maximum likelihood, and the penalty is a function of the predictors. In effect, the full model and the reduced model are penalized differently.

The penalized likelihood is: $$ \begin{aligned} \log L^*(\beta) = \log L(\beta) + \frac{1}{2}\log\det I(\beta) \end{aligned} $$ where $I(\beta)$ is the Fisher information matrix. (The second term — the penalty — is the term which @Mark points out in a comment.)

The coefficients $\beta$ are estimated iteratively and, given the current values of the fitted success probabilities $\widehat{p}$, the information matrix is calculated as: $$ \begin{aligned} I(\widehat{\beta}) = X^\top\operatorname{diag}\left\{\widehat{p}(1-\widehat{p})\right\}X \end{aligned} $$

Note that the penalty is a function of the fitted probabilities $\widehat{p}$ as well as the model matrix $X$. So when you drop a predictor, you change the penalty term as well.
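To make this concrete, here is a small base-R sketch (a toy example added for illustration, not part of the original derivation) that computes the penalty $\tfrac{1}{2}\log\det I(\beta)$ directly and shows that dropping a column of $X$ changes it:

```r
# Toy illustration: the Firth penalty 0.5 * log det I(beta) depends on both
# the model matrix X and the fitted probabilities p, so removing a predictor
# changes the penalty, not just the likelihood.
set.seed(1)
X <- cbind(intercept = 1, x1 = rnorm(20), x2 = rnorm(20))
beta <- c(0, 1, -1)                 # arbitrary coefficient values
p <- as.vector(plogis(X %*% beta))  # fitted success probabilities
W <- diag(p * (1 - p))              # diag{p(1 - p)}

penalty <- function(X, W) 0.5 * log(det(t(X) %*% W %*% X))

penalty(X, W)        # penalty with the full model matrix
penalty(X[, -3], W)  # dropping x2 gives a different penalty
```

(For simplicity this reuses the same $\widehat{p}$ for both penalties; in an actual fit, dropping a predictor changes $\widehat{p}$ as well, which changes the penalty further.)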

Let's use your example to demonstrate the issue.

Note: I drop dia from the full model; otherwise the data are completely separated. With this change, we can fit the model case ~ oc + vic + vicl + vis + age with and without penalization. For more about complete separation, see How to deal with perfect separation in logistic regression?.

full.model <- logistf(
  case ~ oc + vic + vicl + vis + age,
  data = sex2, firth = TRUE
)
reduced.model <- logistf(
  case ~ oc + vic + vicl + vis,
  data = sex2, firth = TRUE
)
null.model <- logistf(
  case ~ 1,
  data = sex2, firth = TRUE
)

rbind(
  full.model = full.model$loglik,
  reduced.model = reduced.model$loglik,
  null.model = null.model$loglik
)
#>                    full      null
#> full.model    -136.6498 -157.3390
#> reduced.model -140.8798 -158.3699
#> null.model    -162.6972 -162.6972

We get three different values for the null log-likelihood! That's because the three models — the full model, the restricted model (without age) and the explicit null model — are all penalized to a different degree by Firth's method.

data.frame(
  pvalue.logistf = full.model$prob["age"],
  pvalue.byhand = 1 - pchisq(2 * (full.model$loglik[1] - reduced.model$loglik[1]), df = 1)
)
#>     pvalue.logistf pvalue.byhand
#> age     0.01055201   0.003630392
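For completeness: what logistf actually does (per the fitting code quoted in the question) is profile the penalized likelihood. The col.fit argument constrains the age coefficient to zero while keeping the full model matrix in the penalty, which is why refitting a genuinely smaller model doesn't match. A sketch of that computation, assuming logistf.fit is exported with the argument names used in the question (the control and col.fit defaults here are assumptions, so treat this as illustrative rather than the package's exact call):

```r
library(logistf)

# Build the design matrix for the full (penalized) model by hand.
X <- model.matrix(case ~ oc + vic + vicl + vis + age, data = sex2)
y <- sex2$case
k <- ncol(X)
i <- which(colnames(X) == "age")

# Full penalized fit, and a fit with beta_age constrained to zero.
# Both use the full X in the penalty term.
ctrl <- logistf.control()
fit.full  <- logistf.fit(X, y, firth = TRUE, col.fit = 1:k,       control = ctrl)
fit.restr <- logistf.fit(X, y, firth = TRUE, col.fit = (1:k)[-i], control = ctrl)

1 - pchisq(2 * (fit.full$loglik - fit.restr$loglik), df = 1)
```

If those interface assumptions hold, this should agree with full.model$prob["age"] from the penalized fit, unlike the refit of the reduced model.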

Now let's turn off penalization by setting firth = FALSE and refit the same three models.

full.model <- logistf(
  case ~ oc + vic + vicl + vis + age,
  data = sex2, firth = FALSE
)
reduced.model <- logistf(
  case ~ oc + vic + vicl + vis,
  data = sex2, firth = FALSE
)
null.model <- logistf(
  case ~ 1,
  data = sex2, firth = FALSE
)

With no penalty, all three models are fitted by standard maximum likelihood, so, reassuringly, we now get the same value for the null log-likelihood in the second column.

rbind(
  full.model = full.model$loglik,
  reduced.model = reduced.model$loglik,
  null.model = null.model$loglik
)
#>                    full      null
#> full.model    -143.1338 -164.7384
#> reduced.model -146.5363 -164.7384
#> null.model    -164.7384 -164.7384
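As an extra sanity check (added here, not part of the original answer): with firth = FALSE the fit is plain maximum likelihood, so ordinary glm should reproduce these log-likelihoods.

```r
library(logistf)  # only for the sex2 data

# Unpenalized logistic regression via stats::glm; logLik should match the
# "full" and "null" values reported by logistf with firth = FALSE.
glm.full <- glm(case ~ oc + vic + vicl + vis + age,
                family = binomial, data = sex2)
glm.null <- glm(case ~ 1, family = binomial, data = sex2)

c(full = as.numeric(logLik(glm.full)),
  null = as.numeric(logLik(glm.null)))
```

These should match the -143.1338 and -164.7384 shown above, up to rounding.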

And now the calculation "by hand" does reproduce the p-value for the age predictor.

data.frame(
  pvalue.logistf = full.model$prob["age"],
  pvalue.byhand = 1 - pchisq(2 * (full.model$loglik[1] - reduced.model$loglik[1]), df = 1)
)
#>     pvalue.logistf pvalue.byhand
#> age    0.009090372   0.009090372
dipetkov