
I'm trying to understand the profile-likelihood p-values computed by the logistf package. In the code, the p-values seem to be calculated by:

fit.i <- logistf.fit(x, y, weight=weight, 
          offset=offset, firth, 
          col.fit=(1:k)[-i], control=control)
iters <- c(iters, fit.i$iter)
fit$prob[i] <- 1-pchisq(2*(fit.full$loglik - 
                   fit.i$loglik), 1)

where fit.i$loglik is the log-likelihood with the coefficient of interest excluded.

If I try this manually, I get:

full <- logistf(case ~ age + oc + vic + vicl + 
              vis + dia, logistf::sex2)
woage <- logistf(case ~ oc + vic + vicl + vis + 
              dia, logistf::sex2)
1-pchisq(2*(full$loglik[2] - woage$loglik[2]), 1)

[1] 0.002178983

which is different from the result:

> full$prob
 (Intercept)          age           oc          vic         vicl          vis          dia 
8.020268e-01 6.143472e-03 8.751911e-01 1.678877e-06 1.237805e-05 5.449701e-02 4.951873e-03 

What am I misunderstanding?

jon

1 Answer


Your calculation doesn't reproduce the p-value because, by default, logistf fits the logistic regression with Firth's penalized maximum likelihood, and the penalty is a function of the predictors. In effect, the full model and the reduced model are penalized differently.

The penalized likelihood is: $$ \begin{aligned} \log L^*(\beta) = \log L(\beta) + \frac{1}{2}\log\det I(\beta) \end{aligned} $$ where $I(\beta)$ is the Fisher information matrix. (The second term — the penalty — is the term which @Mark points out in a comment.)

The coefficients $\beta$ are estimated iteratively and, given the current values of the fitted success probabilities $\widehat{p}$, the information matrix is calculated as: $$ \begin{aligned} I(\widehat{\beta}) = X^\top\operatorname{diag}\left\{\widehat{p}(1-\widehat{p})\right\}X \end{aligned} $$

Note that the penalty is a function of the fitted probabilities $\widehat{p}$ as well as the model matrix $X$. So when you drop a predictor, you change the penalty term as well.
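To make this concrete, here is a small base-R sketch (a toy example added for illustration, not part of the original derivation) that computes the penalty $\tfrac{1}{2}\log\det I(\beta)$ directly and shows that dropping a column of $X$ changes it:

```r
# Toy illustration: the Firth penalty 0.5 * log det I(beta) depends on both
# the model matrix X and the fitted probabilities p, so removing a predictor
# changes the penalty, not just the likelihood.
set.seed(1)
X <- cbind(intercept = 1, x1 = rnorm(20), x2 = rnorm(20))
beta <- c(0, 1, -1)                 # arbitrary coefficient values
p <- as.vector(plogis(X %*% beta))  # fitted success probabilities
W <- diag(p * (1 - p))              # diag{p(1 - p)}

penalty <- function(X, W) 0.5 * log(det(t(X) %*% W %*% X))

penalty(X, W)        # penalty with the full model matrix
penalty(X[, -3], W)  # dropping x2 gives a different penalty
```

(For simplicity this reuses the same $\widehat{p}$ for both penalties; in an actual fit, dropping a predictor changes $\widehat{p}$ as well, which changes the penalty further.)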

Let's use your example to demonstrate the issue.

Note: I drop dia from the full model; otherwise the data are completely separated. With this change, we can fit the model case ~ oc + vic + vicl + vis + age with and without penalization. For more about complete separation, see How to deal with perfect separation in logistic regression?.

full.model <- logistf(
  case ~ oc + vic + vicl + vis + age,
  data = sex2, firth = TRUE
)
reduced.model <- logistf(
  case ~ oc + vic + vicl + vis,
  data = sex2, firth = TRUE
)
null.model <- logistf(
  case ~ 1,
  data = sex2, firth = TRUE
)

rbind(
  full.model = full.model$loglik,
  reduced.model = reduced.model$loglik,
  null.model = null.model$loglik
)
#>                    full      null
#> full.model    -136.6498 -157.3390
#> reduced.model -140.8798 -158.3699
#> null.model    -162.6972 -162.6972

We get three different values for the null log-likelihood! That's because the three models — the full model, the restricted model (without age) and the explicit null model — are all penalized to a different degree by Firth's method.

data.frame(
  pvalue.logistf = full.model$prob["age"],
  pvalue.byhand = 1 - pchisq(2 * (full.model$loglik[1] - reduced.model$loglik[1]), df = 1)
)
#>     pvalue.logistf pvalue.byhand
#> age     0.01055201   0.003630392
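For completeness: what logistf actually does (per the fitting code quoted in the question) is profile the penalized likelihood. The col.fit argument constrains the age coefficient to zero while keeping the full model matrix in the penalty, which is why refitting a genuinely smaller model doesn't match. A sketch of that computation, assuming logistf.fit is exported with the argument names used in the question (the control and col.fit defaults here are assumptions, so treat this as illustrative rather than the package's exact call):

```r
library(logistf)

# Build the design matrix for the full (penalized) model by hand.
X <- model.matrix(case ~ oc + vic + vicl + vis + age, data = sex2)
y <- sex2$case
k <- ncol(X)
i <- which(colnames(X) == "age")

# Full penalized fit, and a fit with beta_age constrained to zero.
# Both use the full X in the penalty term.
ctrl <- logistf.control()
fit.full  <- logistf.fit(X, y, firth = TRUE, col.fit = 1:k,       control = ctrl)
fit.restr <- logistf.fit(X, y, firth = TRUE, col.fit = (1:k)[-i], control = ctrl)

1 - pchisq(2 * (fit.full$loglik - fit.restr$loglik), df = 1)
```

If those interface assumptions hold, this should agree with full.model$prob["age"] from the penalized fit, unlike the refit of the reduced model.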

Now let's turn off penalization by setting firth = FALSE and refit the same three models.

full.model <- logistf(
  case ~ oc + vic + vicl + vis + age,
  data = sex2, firth = FALSE
)
reduced.model <- logistf(
  case ~ oc + vic + vicl + vis,
  data = sex2, firth = FALSE
)
null.model <- logistf(
  case ~ 1,
  data = sex2, firth = FALSE
)

With no penalty, all three models are fitted by standard maximum likelihood, so, reassuringly, we now get the same value for the null log-likelihood in the second column.

rbind(
  full.model = full.model$loglik,
  reduced.model = reduced.model$loglik,
  null.model = null.model$loglik
)
#>                    full      null
#> full.model    -143.1338 -164.7384
#> reduced.model -146.5363 -164.7384
#> null.model    -164.7384 -164.7384
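As an extra sanity check (added here, not part of the original answer): with firth = FALSE the fit is plain maximum likelihood, so ordinary glm should reproduce these log-likelihoods.

```r
library(logistf)  # only for the sex2 data

# Unpenalized logistic regression via stats::glm; logLik should match the
# "full" and "null" values reported by logistf with firth = FALSE.
glm.full <- glm(case ~ oc + vic + vicl + vis + age,
                family = binomial, data = sex2)
glm.null <- glm(case ~ 1, family = binomial, data = sex2)

c(full = as.numeric(logLik(glm.full)),
  null = as.numeric(logLik(glm.null)))
```

These should match the -143.1338 and -164.7384 shown above, up to rounding.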

And now the calculation "by hand" does reproduce the p-value for the age predictor.

data.frame(
  pvalue.logistf = full.model$prob["age"],
  pvalue.byhand = 1 - pchisq(2 * (full.model$loglik[1] - reduced.model$loglik[1]), df = 1)
)
#>     pvalue.logistf pvalue.byhand
#> age    0.009090372   0.009090372
dipetkov