Comparing levels of factors after a GLM in R

Question

Here is a little background about my situation: my data refer to the number of prey successfully eaten by a predator. As the number of prey is limited (25 available) in each trial, I had a column "Sample" representing the number of available prey (so, 25 in each trial), and another called "Count" which was the number of success (how many prey were eaten). I based my analysis on the example from the R book on proportion data (page 578). The explanatory variables are Temperature (4 levels, which I treated as factor), and Sex of the predator (obviously, male or female). So I end up with this model:

model <- glm(y ~ Temperature + Sex + Temperature*Sex, 
          data=predator, family=quasibinomial)

After getting the Analysis of Deviance table, it turns out that Temperature and Sex (but not the interaction) have a significant effect on the consumption of prey. Now, my problem: I need to know which temperatures differ, i.e., I have to compare the 4 temperatures to each other. If I had a linear model, I would use the TukeyHSD function, but as I am using a GLM I can't. I have been looking through the package MASS and trying to set up a contrast matrix but for some reason it doesn't work. Any suggestions or references?

Here's the summary I get from my model, if that helps making it clearer ...

y <- cbind(data$Count, data$Sample-data$Count)
model <- glm(y ~ Temperature + Sex + Temperature*Sex, 
             data=predator, family=quasibinomial) 
> summary(model)
Call:
glm(formula = y ~ Temperature + Sex + Temperature * Sex,
   family=quasibinomial, data=data)


Deviance Residuals:
Min       1Q   Median       3Q      Max
-3.7926  -1.4308  -0.3098   0.9438   3.6831
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)                             -1.6094     0.2672  -6.024 3.86e-08 ***
Temperature8                             0.3438     0.3594   0.957   0.3414
Temperature11                           -1.0296     0.4803  -2.144   0.0348 *
Temperature15                           -1.2669     0.5174  -2.449   0.0163 *
SexMale                                    0.3822     0.3577   1.069   0.2882
Temperature8:SexMale                    -0.2152     0.4884  -0.441   0.6606
Temperature11:SexMale                    0.4136     0.6093   0.679   0.4990
Temperature15:SexMale                    0.4370     0.6503   0.672   0.5033
---
Signif. codes:  0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 2.97372)
Null deviance: 384.54  on 95  degrees of freedom
Residual deviance: 289.45  on 88  degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5

Hi @Anne and welcome. You can try to use the glht function in the multcomp package. To perform TukeyHSD tests for temperature, use it like that glht(my.glm, mcp(Temperature="Tukey")). And btw: Your model formula can be abbreviated to: model<-glm(y ~ Temperature*Sex data=predator, family=quasibinomial). With the asterisk ($$) the interactions and* the main effects are fitted. — COOLSerdash, May 29 '13 at 14:13
Hi, thanks for your quick reply ! However I must be doing something wrong because I only get an error message... I assume that my.glm is the glm I performed earlier (therefore, "model" in the case). What does mcp refer to ? I get an error message saying that the dimensions of coefficients and covariance matrix don't match... ? — Anne, May 29 '13 at 14:22
Why did you model Temperature as a factor? Don't you have the actual numerical values? I would use them as a continuous variable & then this entire issue is moot. — gung - Reinstate Monica, May 29 '13 at 14:33
I am using Temperature as a factor because that is how I defined it in my experimental design. I guess it could be taken as a numerical value, but I would like to know how to solve my comparison problem on a factor all the same, as I might need it for further data (say, if I had "Substrate type" instead of Temperature).

COOLSerdash, I don't know if my editing is what you were looking for as a "model output"... Tell me if I got it wrong and I'll make my best to fix it. — Anne, May 29 '13 at 15:30
It's perfectly reasonable to want to know how to do this in general; your question stands. However, w/ regard to your specific situation, I would use temp as a continuous variable even if you had originally thought of it as a factor. Setting aside issues w/ multiple comparisons, modeling temp as a factor is an inefficient use of the info you have. — gung - Reinstate Monica, May 29 '13 at 15:48

COOLSerdash · Accepted Answer · 2022-12-08T09:42:56.393

Anne, I will shorty explain how to do such multiple comparisons in general. Why this doesn't work in your specific case, I don't know; I'm sorry.

Edit: Nowadays, I'd recommend using the emmeans package to do pairwise comparisons of the marginal means. Another possibility is the multcomp package and the function glht, which compares the coefficients of the main effects and not the marginal means. In the presence of an interaction, this makes less sens in my opinion. Here is an example:

mydata      <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
my.mod      <- glm(admit~gre+gpa*rank, data=mydata, family=quasibinomial)
summary(my.mod)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.985768   2.498395  -1.996   0.0467 *
gre          0.002287   0.001110   2.060   0.0400 *
gpa          1.089088   0.731319   1.489   0.1372
rank2        0.503294   2.982966   0.169   0.8661
rank3        0.450796   3.266665   0.138   0.8903
rank4       -1.508472   4.202000  -0.359   0.7198
gpa:rank2   -0.342951   0.864575  -0.397   0.6918
gpa:rank3   -0.515245   0.935922  -0.551   0.5823
gpa:rank4   -0.009246   1.220757  -0.008   0.9940
---
Signif. codes:  0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

If you wanted to calculate the pairwise comparisons of the marginal means between the ranks using Tukey's HSD with emmeans, you could do that in this way:

library(emmeans)
em <- emmeans(my.mod, "rank")
contrast(em, "pairwise", adjust = "Tukey")
contrast      estimate    SE  df z.ratio p.value
 rank1 - rank2    0.659 0.323 Inf   2.038  0.1739
 rank1 - rank3    1.296 0.358 Inf   3.620  0.0017
 rank1 - rank4    1.540 0.426 Inf   3.611  0.0017
 rank2 - rank3    0.637 0.291 Inf   2.190  0.1261
 rank2 - rank4    0.881 0.372 Inf   2.370  0.0830
 rank3 - rank4    0.244 0.401 Inf   0.608  0.9296
Results are given on the log odds ratio (not the response) scale. 
P value adjustment: tukey method for comparing a family of 4 estimates

The function emmeans prints a warning that there are interactions present which may make the results misleading. For more information, the emmeans package offers a lot of tutorials (called vignettes).

Now let's compare the coefficients of the main effect for rank with multcomp:

library(multcomp)
summary(glht(my.mod, mcp(rank="Tukey")))
Fit: glm(formula = admit ~ gre + gpa * rank, family = quasibinomial, 
    data = mydata)
Linear Hypotheses:
           Estimate Std. Error z value Pr(>|z|)
2 - 1 == 0   0.5033     2.9830   0.169    0.998
3 - 1 == 0   0.4508     3.2667   0.138    0.999
4 - 1 == 0  -1.5085     4.2020  -0.359    0.984
3 - 2 == 0  -0.0525     2.6880  -0.020    1.000
4 - 2 == 0  -2.0118     3.7540  -0.536    0.949
4 - 3 == 0  -1.9593     3.9972  -0.490    0.960
(Adjusted p values reported -- single-step method)
Warning message:
In mcp2matrix(model, linfct = linfct) :
  covariate interactions found -- default contrast might be inappropriate

All pairwise comparisons of the main effect for rank are given together with a $p$-value. Warning: These comparisons only concern the main effects. The interactions are ignored. If interactions are present, a warning will be given (as in the output above). For a more extensive tutorial on how to proceed when interactions are present, see these additional multcomp examples.

Note: As @gung noted in the comments, you should - whenever possible - include temperature as a continuous rather than a categorical variable. Concerning the interaction: you could perform a likelihood ratio test to check whether the interaction term significantly improves the model fit. In your case, the code would look something like that:

# Original model
model <- glm(y ~ Temperature+Sex+Temperature*Sex, data=predator, family=quasibinomial)
Model without an interaction
model2 <- glm(y ~ Temperature+Sex data=predator, family=quasibinomial)
Likelihood ratio test
anova(model, model2, test="LRT")

If this test is not significant, you may remove the interaction from your model. Maybe glht will work then?

Oh god, thank you SO much !! I have been able to write the command correctly this time and it worked ! Thanks again ! — Anne, May 29 '13 at 17:45
Additionnal question : is there a way to get multiple comparisons on the interaction ? I've got similar data, where the interaction (from the initial question, that would be Temperature*Sex) is significant, and I was wondering if it is possible to compare those together... — Anne, Nov 01 '13 at 12:10
Do you mean multiple comparison for each level of the interaction? If yes, you might find this site interesting (the last paragraph shows how to test all possible pairwise combinations). — COOLSerdash, Nov 01 '13 at 21:20
you can create a variable which corresponds to the interactions for a variable and use this variable to carry out the mcp. You do it like this. mydata$gparank <- interaction(mydata$gpa, mydata$rank) — Notquitesure, Sep 09 '14 at 08:33
@COOLSerdash, that link no longer works - any idea where it is now? — Nova, Aug 29 '17 at 14:15
@Nova which link do you mean? The one in the comments? Here is the new link to that site. — COOLSerdash, Aug 29 '17 at 15:46

Comparing levels of factors after a GLM in R

Call:

glm(formula = y ~ Temperature + Sex + Temperature * Sex,

Deviance Residuals:

Min 1Q Median 3Q Max

-3.7926 -1.4308 -0.3098 0.9438 3.6831

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -1.6094 0.2672 -6.024 3.86e-08 ***

Temperature8 0.3438 0.3594 0.957 0.3414

Temperature11 -1.0296 0.4803 -2.144 0.0348 *

Temperature15 -1.2669 0.5174 -2.449 0.0163 *

SexMale 0.3822 0.3577 1.069 0.2882

Temperature8:SexMale -0.2152 0.4884 -0.441 0.6606

Temperature11:SexMale 0.4136 0.6093 0.679 0.4990

Temperature15:SexMale 0.4370 0.6503 0.672 0.5033

---

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 2.97372)

Null deviance: 384.54 on 95 degrees of freedom

Residual deviance: 289.45 on 88 degrees of freedom

AIC: NA

Number of Fisher Scoring iterations: 5

1 Answers1

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -4.985768 2.498395 -1.996 0.0467 *

gre 0.002287 0.001110 2.060 0.0400 *

gpa 1.089088 0.731319 1.489 0.1372

rank2 0.503294 2.982966 0.169 0.8661

rank3 0.450796 3.266665 0.138 0.8903

rank4 -1.508472 4.202000 -0.359 0.7198

gpa:rank2 -0.342951 0.864575 -0.397 0.6918

gpa:rank3 -0.515245 0.935922 -0.551 0.5823

gpa:rank4 -0.009246 1.220757 -0.008 0.9940

---

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

Model without an interaction

Likelihood ratio test

Linked