
I have a logistic GLM with 8 variables. I ran a chi-square test in R, anova(glm.model, test='Chisq'), and 2 of the variables turn out to be predictive when ordered at the top of the test and not so much when ordered at the bottom. summary(glm.model) suggests that their coefficients are insignificant (high p-values), so it seems that the variables are not significant.

I wanted to ask which is a better test of variable significance: the coefficient significance in the model summary, or the chi-square test from anova()? Also, when is one preferable to the other?

I guess it's a broad question, but any pointers on what to consider would be appreciated.

amoeba
StreetHawk
1 Answer


In addition to @gung's answer, I'll try to provide an example of what the anova function actually tests. I hope this enables you to decide what tests are appropriate for the hypotheses you are interested in testing.

Let's assume that you have an outcome $y$ and 3 predictor variables, $x_{1}$, $x_{2}$, and $x_{3}$, and that your logistic regression model is my.mod <- glm(y~x1+x2+x3, family="binomial"). When you run anova(my.mod, test="Chisq"), the function compares the following models in sequential order. This is also called Type I ANOVA or Type I sums of squares (see this post for a comparison of the different types):

  1. glm(y~1, family="binomial") vs. glm(y~x1, family="binomial")
  2. glm(y~x1, family="binomial") vs. glm(y~x1+x2, family="binomial")
  3. glm(y~x1+x2, family="binomial") vs. glm(y~x1+x2+x3, family="binomial")

So it sequentially compares the smaller model with the next more complex model by adding one variable in each step. Each of those comparisons is done via a likelihood ratio test (LR test; see example below). To my knowledge, these hypotheses are rarely of interest, but this has to be decided by you.

Here is an example in R:

mydata      <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)

my.mod <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
summary(my.mod)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.989979   1.139951  -3.500 0.000465 ***
gre          0.002264   0.001094   2.070 0.038465 *  
gpa          0.804038   0.331819   2.423 0.015388 *  
rank2       -0.675443   0.316490  -2.134 0.032829 *  
rank3       -1.340204   0.345306  -3.881 0.000104 ***
rank4       -1.551464   0.417832  -3.713 0.000205 ***
   ---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

# The sequential analysis
anova(my.mod, test="Chisq")

Terms added sequentially (first to last)    

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                   399     499.98              
gre   1  13.9204       398     486.06 0.0001907 ***
gpa   1   5.7122       397     480.34 0.0168478 *  
rank  3  21.8265       394     458.52 7.088e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# We can make the comparisons by hand (adding a variable in each step)

  # model only the intercept
mod1 <- glm(admit ~ 1,                data = mydata, family = "binomial") 
  # model with intercept + gre
mod2 <- glm(admit ~ gre,              data = mydata, family = "binomial") 
  # model with intercept + gre + gpa
mod3 <- glm(admit ~ gre + gpa,        data = mydata, family = "binomial") 
  # model containing all variables (full model)
mod4 <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial") 

anova(mod1, mod2, test="LRT")

Model 1: admit ~ 1
Model 2: admit ~ gre
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       399     499.98                          
2       398     486.06  1    13.92 0.0001907 ***

anova(mod2, mod3, test="LRT")

Model 1: admit ~ gre
Model 2: admit ~ gre + gpa
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1       398     486.06                       
2       397     480.34  1   5.7122  0.01685 *

anova(mod3, mod4, test="LRT")

Model 1: admit ~ gre + gpa
Model 2: admit ~ gre + gpa + rank
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       397     480.34                          
2       394     458.52  3   21.826 7.088e-05 ***
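Under the hood, each of these comparisons is just a chi-square test on the drop in deviance. A minimal sketch of the first comparison (assuming the UCLA data set is still reachable at the same URL; the deviances are read off the fitted model objects, not hard-coded):

```r
# Self-contained: reload the data used throughout this answer
mydata      <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)

mod1 <- glm(admit ~ 1,   data = mydata, family = "binomial")  # intercept only
mod2 <- glm(admit ~ gre, data = mydata, family = "binomial")  # intercept + gre

# Drop in deviance, and its degrees of freedom, when adding gre
dev.diff <- deviance(mod1) - deviance(mod2)        # 499.98 - 486.06 = 13.92
df.diff  <- df.residual(mod1) - df.residual(mod2)  # 399 - 398 = 1

# LR test p-value: upper tail of a chi-square distribution
pchisq(dev.diff, df = df.diff, lower.tail = FALSE) # 0.0001907, as in the anova output
```

This is exactly the number anova reports for gre in the sequential table above.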

The $p$-values in the output of summary(my.mod) are Wald tests, which test the following hypotheses (note that, unlike the sequential tests above, these do not depend on the order of the variables):

  • For coefficient of x1: glm(y~x2+x3, family="binomial") vs. glm(y~x1+x2+x3, family="binomial")
  • For coefficient of x2: glm(y~x1+x3, family="binomial") vs. glm(y~x1+x2+x3, family="binomial")
  • For coefficient of x3: glm(y~x1+x2, family="binomial") vs. glm(y~x1+x2+x3, family="binomial")

So each coefficient is tested against the full model containing all the coefficients. Wald tests are an approximation of the likelihood ratio test. We could also do the corresponding likelihood ratio tests (LR tests). Here is how:

mod1.2 <- glm(admit ~ gre + gpa,  data = mydata, family = "binomial")
mod2.2 <- glm(admit ~ gre + rank, data = mydata, family = "binomial")
mod3.2 <- glm(admit ~ gpa + rank, data = mydata, family = "binomial")

anova(mod1.2, my.mod, test="LRT") # joint LR test for rank

Model 1: admit ~ gre + gpa
Model 2: admit ~ gre + gpa + rank
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       397     480.34                          
2       394     458.52  3   21.826 7.088e-05 ***

anova(mod2.2, my.mod, test="LRT") # LR test for gpa

Model 1: admit ~ gre + rank
Model 2: admit ~ gre + gpa + rank
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1       395     464.53                       
2       394     458.52  1   6.0143  0.01419 *

anova(mod3.2, my.mod, test="LRT") # LR test for gre

Model 1: admit ~ gpa + rank
Model 2: admit ~ gre + gpa + rank
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1       395     462.88                       
2       394     458.52  1   4.3578  0.03684 *

The $p$-values from the likelihood ratio tests are very similar to those obtained from the Wald tests in summary(my.mod) above.
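These Wald $p$-values can also be reproduced by hand: the $z$ value is simply the estimate divided by its standard error, and the $p$-value is twice the upper tail of the standard normal distribution. A sketch (same UCLA data as above):

```r
# Refit the full model so this snippet is self-contained
mydata      <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
my.mod <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")

est <- coef(summary(my.mod))[, "Estimate"]
se  <- coef(summary(my.mod))[, "Std. Error"]

z.by.hand <- est / se                    # reproduces the "z value" column
p.by.hand <- 2 * pnorm(-abs(z.by.hand))  # reproduces the "Pr(>|z|)" column

round(p.by.hand, 6)
```

For gpa, for example, this gives the Wald $p$-value 0.015388, close to but not identical to the LR $p$-value 0.01419 above.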

Note: The third model comparison of anova(my.mod, test="Chisq"), the one for rank, is the same as the comparison for rank in the example above (anova(mod1.2, my.mod, test="LRT")). Each time, the $p$-value is the same, $7.088\cdot 10^{-5}$: both are the comparison between the model without rank and the model containing it.
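As a convenience, the three "drop one variable from the full model" LR tests above can be obtained in a single call with R's drop1 function, without fitting the reduced models by hand:

```r
# Same full model as in the rest of this answer
mydata      <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
my.mod <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")

# One LR test per term, each comparing the full model against
# the model with that single term removed
drop1(my.mod, test = "LRT")
```

The resulting $p$-values (gre: 0.03684, gpa: 0.01419, rank: 7.088e-05) match the manual LR tests above.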

COOLSerdash
  • +1, this is a good, comprehensive explanation. 1 small point: I believe that when test="Chisq", you are not running a likelihood ratio test, you need to set test="LRT" for that, see ?anova.glm. – gung - Reinstate Monica May 23 '13 at 21:49
  • @gung Thanks for the compliment. test="LRT" and test="Chisq" are synonymous (it says so on the page you linked). – COOLSerdash May 23 '13 at 21:52
  • Fair enough, I should read more carefully. – gung - Reinstate Monica May 23 '13 at 21:53
  • No problem, but I think it's actually a good point. test="LRT" is better as it is immediately clear that it is a likelihood ratio test. I changed it. Thanks. – COOLSerdash May 23 '13 at 22:03
  • +1 I'm impressed with your rapid progress here in just one month and your ability to provide a well-worked, clear explanation. Thanks for your efforts! – whuber May 23 '13 at 22:27
  • @COOLSerdash Thanks for your answer. It seems you suggest that the Wald test (in the summary(model) output) is a more accurate significance indicator than anova. I gather that since anova depends on order, the Wald test is what matters more. Is it? – StreetHawk May 08 '14 at 18:38
  • @StreetHawk anova just tests something different than summary (Wald). It tests sequentially. I guess that in most cases, you want to test whether each coefficient is different from 0, which is what summary does using a Wald test. – COOLSerdash May 09 '14 at 08:05
  • @COOLSerdash Nice explanation! I have one question: What if, say, x3 is also a categorical variable, and I want to test whether there is any significance among the groups in x3? It seems like anova(glm model) only tests for the significance of the variable, but not the difference among the groups within the variable. – Jack Shi Mar 27 '16 at 22:20
  • @JackShi Thanks. If I understood you correctly, you want to compare the levels of the factor against each other? I show how that can be done in another post. – COOLSerdash Mar 28 '16 at 10:16
  • Great answer. May I ask how the p-values (7.088e-05, 0.01419, 0.03684) should be interpreted? – TheSimpliFire Jan 24 '18 at 20:19
  • @TheSimpliFire The first $p$-value provides evidence that rank improves the model in addition to gre and gpa. The second $p$-value provides evidence that gpa improves the model in addition to gre and rank, and the third assesses the evidence that gre improves the model in addition to gpa and rank. Thus, each test asks: "does this additional variable improve the model, given that the other two variables are already in the model?" – COOLSerdash Jan 24 '18 at 21:28
  • Note that there is the convenience function drop1 in R that runs the three ANOVAs in your second code example in a single call. This is also known as "Type III ANOVA". – cdalitz Sep 30 '22 at 12:56