
I would like to obtain the p-value of an interaction term in a logistic regression model. I have tried two methods to accomplish this.

First, I fitted a logistic model that includes the interaction term and took the p-value for that term from the model summary (SELECT_GroupGene12:CDKiY, p-value = 0.994). Here's the code and the model summary:

fit1 <- glm(CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology, tmp_Clin, family = 'binomial')
summary(fit1)

The output is as follows:

Call:
glm(formula = CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + 
    Histology, family = "binomial", data = tmp_Clin)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1448  -0.9129  -0.7868   1.3027   1.6270  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)
(Intercept)                 0.41881    1.80366   0.232    0.816
SELECT_GroupGene12         18.67230 3761.85360   0.005    0.996
CDKiY                      -0.46924    0.61670  -0.761    0.447
Patient_Dx_Age             -0.01417    0.03301  -0.429    0.668
HistologyILC              -18.03547 6522.63863  -0.003    0.998
HistologyOther            -18.19134 6522.63862  -0.003    0.998
Histologyuk                19.05416 6522.63863   0.003    0.998
SELECT_GroupGene12:CDKiY  -36.34491 4978.39200  -0.007    0.994

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 75.025  on 56  degrees of freedom
Residual deviance: 60.682  on 49  degrees of freedom

AIC: 76.682

Number of Fisher Scoring iterations: 17

Another method I tried involves fitting two logistic regression models, one with the interaction term and one without, and comparing them with the anova function, which gives a p-value of 0.0153:

fit1 <- glm(CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology, tmp_Clin, family = 'binomial')
fit2 <- glm(CDKi_Response3 ~ SELECT_Group + CDKi + Patient_Dx_Age + Histology, tmp_Clin, family = 'binomial')
anova(fit1, fit2, test = 'Chisq')

The output is as follows:

Analysis of Deviance Table

Model 1: CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology
Model 2: CDKi_Response3 ~ SELECT_Group + CDKi + Patient_Dx_Age + Histology
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1        49     60.682                       
2        50     66.564 -1  -5.8811   0.0153 *


Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Here is the distribution of the outcome variable CDKi_Response3 stratified by SELECT_Group and CDKi. Upon reviewing this distribution, it appears that there is a significant interaction between SELECT_Group and CDKi.

table(tmp_Clin$CDKi_Response3, tmp_Clin$SELECT_Group, tmp_Clin$CDKi)

The table output is as follows:

CDKi = N

    Gene1 Gene12
  0    16      0
  1    11      3

CDKi = Y

    Gene1 Gene12
  0    16      4
  1     7      0

I'm wondering why these two methods give different p-values, and which one should be used. Thank you.

1 Answer


The very high standard errors of your coefficients indicate that you have something approaching perfect separation in your combination of data and model. That means that some combination of predictors can almost completely distinguish the two outcomes in CDKi_Response3.
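
One quick way to see this (a minimal sketch, assuming the fit1 object from your first code block; the detectseparation package is my suggestion, not something used in your code) is to look at where the fitted probabilities end up and, optionally, run a formal separation check:

# Fitted probabilities piling up near 0 or 1 are a symptom of (quasi-)separation
range(fitted(fit1))

# Formal check: detect_separation() flags coefficients whose maximum-likelihood
# estimates diverge to +/- infinity (requires the detectseparation package)
library(detectseparation)
glm(CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology,
    data = tmp_Clin, family = binomial, method = "detect_separation")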

In your situation, that's exacerbated by the small number of cases relative to the number of regression coefficients you are trying to estimate. In binomial regression you are at risk of overfitting if you try to estimate more than 1 coefficient per 15 or so cases in the minority outcome class.

Your minority outcome class (CDKi_Response3 = 1) only numbers 21 cases, so you shouldn't try to estimate more than 1 or 2 coefficients. You are trying to estimate 7 coefficients beyond the intercept. So it's not surprising that some combination of predictors is highly associated with outcome in your particular data set. The danger is that your results are unlikely to apply to new data sets.

In terms of why the p-values differ so much, note that there are 3 types of significance tests associated with models like binomial regression that are fit by maximum likelihood. See this page for an explanation of the differences between the score, Wald, and likelihood-ratio tests.

The coefficient p-values are Wald-test values, based on normality of the coefficient estimates. That assumption is unlikely with these data; with (near) perfect separation, Wald tests aren't reliable. The anova() test that you performed is a likelihood-ratio test, which in general is considered more reliable with small numbers of cases. I'd be reluctant to trust any tests on such overfit models, however.
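
To see the two tests side by side (a small sketch, assuming the fit1 object from your question), you can pull the Wald p-value from the coefficient table and let drop1() carry out the same likelihood-ratio comparison as your anova(fit1, fit2) call:

# Wald p-value for the interaction, straight from the coefficient table
summary(fit1)$coefficients["SELECT_GroupGene12:CDKiY", "Pr(>|z|)"]

# Likelihood-ratio test: drop1() refits the model without the interaction term
# and compares deviances, equivalent to anova(fit1, fit2, test = "Chisq")
drop1(fit1, test = "Chisq")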

With such a small number of events, you could consider penalized regression, such as ridge regression, to minimize overfitting. That's implemented, for example, in the R glmnet package. As a comment from Ben Bolker on the first version of this answer notes, however, inference based on ridge regression is then problematic, even if predictions from the ridge model are more likely to extend to new data. See this page for some discussion.
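
A minimal sketch of that ridge approach, assuming the same data frame and variable names as in your question (the seed and the lambda.min choice are only illustrative):

library(glmnet)

# model.matrix() expands the factors and the interaction into dummy columns;
# drop its intercept column because glmnet supplies its own
X <- model.matrix(CDKi_Response3 ~ SELECT_Group * CDKi + Patient_Dx_Age + Histology,
                  data = tmp_Clin)[, -1]
y <- tmp_Clin$CDKi_Response3

# alpha = 0 requests the ridge (L2) penalty; cv.glmnet() picks the penalty
# strength by cross-validation, which is itself noisy with only 57 observations
set.seed(1)
cv_fit <- cv.glmnet(X, y, family = "binomial", alpha = 0)
coef(cv_fit, s = "lambda.min")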

EdM
  • Good answer, but note that applying penalized methods makes inference (p-values etc.) extremely challenging. – Ben Bolker May 30 '23 at 14:34
  • @BenBolker yes, thanks. I added that caution to the end of the answer. – EdM May 30 '23 at 15:09
  • 2
  • I like this (free-access) paper, which summarizes several solutions to the separation problem (https://doi.org/10.1093/aje/kwx299): "When the accuracy of risk prediction is central, methods that focus on prediction error (such as LASSO and traditional ridge regression) are useful. In contrast, when the parameters are the target, it seems more appropriate to use methods designed to minimize error in their estimation such as coefficient penalization." – dipetkov May 31 '23 at 06:47