1

I have a zero inflated count data, on which I have run a poisson and quasi poisson reg using glm().

The output from a poisson model is as follows:

Call:
glm(formula = sum_count ~ location * treatment, family = poisson, 
    data = dat)

Deviance Residuals: Min 1Q Median 3Q Max
-32.297 -21.360 -12.476 5.448 80.361

Coefficients: Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.75215 0.01455 395.31 <2e-16 *** locationNorth -0.30685 0.02235 -13.73 <2e-16 *** treatmentclosed -1.14865 0.03235 -35.51 <2e-16 *** treatmentday -0.32222 0.02245 -14.35 <2e-16 *** treatmentnight 0.34802 0.01901 18.31 <2e-16 *** locationNorth:treatmentclosed -0.73419 0.06081 -12.07 <2e-16 *** locationNorth:treatmentday 1.13369 0.03032 37.39 <2e-16 *** locationNorth:treatmentnight -1.37528 0.03812 -36.08 <2e-16 ***


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 67655  on 113  degrees of freedom

Residual deviance: 54990 on 106 degrees of freedom AIC: 55474

Number of Fisher Scoring iterations: 7

The output from the quasipoisson model is as follows:

Call:
glm(formula = sum_count ~ location * treatment, family = quasipoisson, 
    data = dat)

Deviance Residuals: Min 1Q Median 3Q Max
-32.297 -21.360 -12.476 5.448 80.361

Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.7521 0.3753 15.327 <2e-16 *** locationNorth -0.3068 0.5764 -0.532 0.596
treatmentclosed -1.1486 0.8343 -1.377 0.171
treatmentday -0.3222 0.5790 -0.557 0.579
treatmentnight 0.3480 0.4902 0.710 0.479
locationNorth:treatmentclosed -0.7342 1.5685 -0.468 0.641
locationNorth:treatmentday 1.1337 0.7821 1.450 0.150
locationNorth:treatmentnight -1.3753 0.9831 -1.399 0.165


Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 665.2146)

Null deviance: 67655  on 113  degrees of freedom

Residual deviance: 54990 on 106 degrees of freedom AIC: NA

Number of Fisher Scoring iterations: 7

My question is: (given my rudimentary understanding of count models), why is the over dispersion parameter for both models the same?

glm1$deviance/glm1$df.residual # estimating the dispersion
[1] 518.7702

glm1.q$deviance/glm1.q$df.residual # estimating the dispersion [1] 518.7702

Any insight is highly welcome. Thank you!

1 Answers1

2

tl;dr there's no need to check for over- (or under-)dispersion if you're already adjusting for overdispersion by fitting a quasi-likelihood model.

The deviance calculated in a quasi-Poisson model is not adjusted. The only place the quasi-Poisson model differs from the Poisson is in the output of the estimated dispersion and in the standard errors (which are adjusted based on the dispersion).

Here is the relevant code from summary.glm():

 if (is.null(dispersion)) 
        dispersion <- if (object$family$family %in% c("poisson", 
            "binomial")) 
            1
        else if (df.r > 0) {
            est.disp <- TRUE
            if (any(object$weights == 0)) 
                warning("observations with zero weight not used for calculating dispersion")
            sum((object$weights * object$residuals^2)[object$weights > 
                0])/df.r
        }

Ignoring the weights, you can see that the dispersion is defined as sum(object$residuals^2)/object$df.residual. If the stored residuals ($residuals) were deviance residuals, then their sum of squares would be exactly equal to the deviance. They're working residuals instead, which means that this computed dispersion won't be exactly the same as (deviance/df.residual), but for reasonably large data sets it should be very close. (See here for more on residual types.)

Ben Bolker
  • 43,543