
I'm a bit confused by the output of my model in R.

I have built a generalised estimating equation (GEE) model with geeglm, aiming to see the effect of time (here coded as timestrat) on a variable called new1804. I have controlled for a range of other variables.

library(geepack)

mf04 <- formula(new1804 ~ timestrat + urban +
    marital + ses + timeinsample + state +
    dependancestrat + didattemptquitinlastyear +
    plan2quit + agestrat + sex + weight23)
geeInd04 <- geeglm(mf04, id = uniqid,
    data = finaldf04, family = poisson,
    corstr = "independence")

My confusion comes from the output of the model. The anova() output says that timestrat is not significant. However, the coefficient and standard error in the summary() output suggest that it is. I have calculated the upper and lower bounds of the 95% confidence interval for the coefficient as shown below, getting -0.2947138 for the lower bound and -0.0513192 for the upper bound.

If both the upper and lower bounds of my 95% confidence interval for the coefficient are negative, why is anova() returning a non-significant result?

Calculation of upper and lower bounds:

lwrcoef <- estimate - 1.96*stderr
uprcoef <- estimate + 1.96*stderr
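
Concretely, pulling the estimate and standard error for timestrat1 from the summary() coefficient table should reproduce those bounds; a minimal sketch using the fit above:

est <- summary(geeInd04)$coefficients["timestrat1", "Estimate"]   # -0.17302
se  <- summary(geeInd04)$coefficients["timestrat1", "Std.err"]    #  0.06209
lwrcoef <- est - 1.96*se   # about -0.295
uprcoef <- est + 1.96*se   # about -0.051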

Summary of model:

summary(geeInd04)

Call:
geeglm(formula = mf04, family = poisson, data = finaldf04,
    id = uniqid, corstr = "independence")

 Coefficients:
                           Estimate  Std.err  Wald Pr(>|W|)    
(Intercept)                 0.70201  0.11849 35.10  3.1e-09 ***
timestrat1                 -0.17302  0.06209  7.76   0.0053 ** 
urban2                     -0.01993  0.04376  0.21   0.6488    
marital2                   -0.09071  0.08561  1.12   0.2893    
marital3                    0.01912  0.03451  0.31   0.5797    
marital4                   -0.05872  0.03356  3.06   0.0802 .  
sesL                       -0.02625  0.02529  1.08   0.2994    
timeinsample                0.09883  0.04358  5.14   0.0233 *  
state19                     0.03256  0.08025  0.16   0.6849    
state23                     0.15694  0.05939  6.98   0.0082 ** 
state27                     0.10763  0.06275  2.94   0.0863 .  
dependancestrat0           -0.00335  0.06022  0.00   0.9556    
dependancestrat1           -0.02140  0.04915  0.19   0.6633    
dependancestrat2           -0.02250  0.04619  0.24   0.6261    
dependancestrat3           -0.07338  0.04452  2.72   0.0993 .  
didattemptquitinlastyear1  -0.02050  0.03104  0.44   0.5090    
plan2quit2                  0.16451  0.06616  6.18   0.0129 *  
plan2quit3                  0.08602  0.06958  1.53   0.2163    
plan2quit4                  0.04591  0.06912  0.44   0.5066    
agestrat40-55               0.00398  0.02541  0.02   0.8756    
agestratOver 55            -0.05170  0.03324  2.42   0.1199    
agestratUnder 30            0.03823  0.03150  1.47   0.2249    
sex2                        0.07681  0.02360 10.59   0.0011 ** 
weight23                    0.04968  0.02310  4.63   0.0315 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation structure = independence
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)    0.454  0.0228

Number of clusters:   270  Maximum cluster size: 99

Anova:

anova(geeInd04)

Analysis of 'Wald statistic' Table
Model: poisson, link: log
Response: new1804
Terms added sequentially (first to last)

                         Df    X2 P(>|Chi|)    
timestrat                 1  1.79   0.18109    
urban                     1  0.21   0.64487    
marital                   3  2.98   0.39431    
ses                       1  0.82   0.36383    
timeinsample              1  3.75   0.05273 .  
state                     3  7.45   0.05885 .  
dependancestrat           4  4.41   0.35280    
didattemptquitinlastyear  1  0.02   0.89898    
plan2quit                 3 15.87   0.00120 ** 
agestrat                  3  8.80   0.03211 *  
sex                       1 10.95   0.00094 ***
weight23                  1  4.63   0.03150 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  • anova says Terms added sequentially (first to last). The other function is testing in the presence of all other variables. – user2554330 Apr 01 '22 at 16:54
  • See this thread about different types of anova(). The documentation in geepack doesn't seem very clear about just what anova() method it invokes for its models, or whether you could use something other than this default "Type I" sequential ANOVA. – EdM Apr 01 '22 at 17:16

1 Answer


The commenters have provided the answer; I'll enter it here.

The first output (the summary() table) presents a P-value for each individual coefficient in the model. Each of these can be interpreted as a test of $$H_0 \ :\ \beta_i = 0$$ where a significant finding suggests that the coefficient is not zero. Alternatively, the same test can be read as $$H_0 \ :\ \text{model}_\text{wo}\ \text{ IS AS GOOD AS }\ \text{model}_\text{w}$$ where we are asking whether the model without that one variable (but with all of the other variables) is essentially as good as the model that also includes it. Although the P-values are not exactly the same here as they would be in, say, an ordinary multiple regression model, the same general idea applies to either interpretation: a statistically significant P-value means the coefficient is not zero, or equivalently that the model improves when this variable is added to a model already containing all of the others.
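
To make that "with vs. without" comparison explicit, here is a sketch (assuming the objects defined in the question); geepack's anova() can also compare two nested geeglm fits, giving a Wald test of the term that was dropped:

library(geepack)

## Reduced model: same specification as mf04, but with timestrat dropped
mf04_red <- update(mf04, . ~ . - timestrat)
geeInd04_red <- geeglm(mf04_red, id = uniqid, data = finaldf04,
                       family = poisson, corstr = "independence")

## Wald test of timestrat in the presence of all the other covariates;
## this corresponds to the timestrat1 row of summary(geeInd04)
anova(geeInd04, geeInd04_red)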

The key point is that this is a one-at-a-time analysis for every coefficient listed (be it a single predictor or an indicator variable; note that your categorical predictors need more than one indicator variable to represent them in the model). As a consequence, the order in which the variables are entered into the model does not matter for these tests.

However, with the anova(·) function the order does matter (as the output itself indicates). In this case, when the time variable is the first term added to the null model, it does not produce a statistically significant improvement. But that is not the same comparison as the one above: the summary() output asks whether time improves the model after ALL THE OTHER variables have already gone in.

So, the curious thing about this finding is that, on its own, time does not appear to be much of a predictor. But, after controlling for the influence of the other variables, time is indeed a predictor of the remaining variability in the dependent variable.
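
You can see this within the sequential anova() itself by simply entering timestrat last (a sketch, reusing the variables from the question), so that the final row of the table tests it after all of the other terms:

## Same model, but with timestrat listed last in the formula
mf04_last <- formula(new1804 ~ urban + marital + ses + timeinsample + state +
    dependancestrat + didattemptquitinlastyear + plan2quit + agestrat +
    sex + weight23 + timestrat)
geeInd04_last <- geeglm(mf04_last, id = uniqid, data = finaldf04,
                        family = poisson, corstr = "independence")

## The bottom row of this sequential table now tests timestrat
## adjusted for everything else, mirroring the summary() Wald test
anova(geeInd04_last)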

Gregg H