I have been learning how to draw model predictions on a scatter plot, and I noticed a somewhat counter-intuitive result. It would help me greatly if you could kindly explain where I am mistaken here.
Let me explain my confusion using the grouseticks data set included in the lme4 package. I fitted a GLMM and a GLM to the same data set. The only difference between the two models is the inclusion/exclusion of the random effect, BROOD.
# GLMM fitting
# Poisson distribution with log-link, BROOD as the random effect
library(lme4)
fitm <- glmer(TICKS ~ cHEIGHT + (1|BROOD), family=poisson(link=log), data=grouseticks)
# GLM fitting
# Poisson distribution with log-link
fit <- glm(TICKS ~ cHEIGHT, family=poisson(link=log), data=grouseticks)
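A quick way to see the quantity that drives everything below is the estimated BROOD intercept spread. A minimal sketch, assuming the two fits above have been run:

```r
# Sketch (assumes fitm and fit from above): inspect the estimates.
# The BROOD standard deviation reported by VarCorr() is on the
# log (link) scale and is what separates the two models.
VarCorr(fitm)   # random-intercept SD for BROOD
fixef(fitm)     # GLMM fixed effects
coef(fit)       # GLM coefficients
```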
Then I calculated Nakagawa & Schielzeth's (2013) marginal and conditional R-squared values (R2m and R2c), which measure the variance explained by the fixed effects alone and by both the fixed and random effects, respectively.
# R-squares for GLMM
library(MuMIn)
r.squaredGLMM(fitm)
The output is:
               R2m    R2c
delta       0.3055 0.9363
lognormal   0.3070 0.9409
trigamma    0.3038 0.9310
which indicates that the addition of the random effect "BROOD" substantially improves the predictive power of the model. Am I right?
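In Nakagawa & Schielzeth's framework, R2m = var_f / (var_f + var_a + var_e) and R2c = (var_f + var_a) / (var_f + var_a + var_e), where var_f, var_a and var_e are the fixed-effect, random-effect and distribution-specific variances on the link scale. A base-R toy calculation with made-up variance components (not the actual grouseticks fit), chosen only to mimic the pattern above:

```r
# Made-up variance components (NOT from the grouseticks fit),
# picked so that R2m is about 0.3 and R2c about 0.93
var_fixed  <- 0.9   # variance explained by cHEIGHT (var_f)
var_random <- 1.9   # BROOD intercept variance (var_a)
var_resid  <- 0.2   # distribution-specific variance (var_e)
tot <- var_fixed + var_random + var_resid
R2m <- var_fixed / tot                  # fixed effects only
R2c <- (var_fixed + var_random) / tot   # fixed + random effects
c(R2m, R2c)
```

A large gap between R2m and R2c, as here, means most of the explained variance comes from the between-brood differences, not from cHEIGHT.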
OK, here's the thing. I tried to confirm the above result visually. What I have done is:
# scatter plot
plot(TICKS ~ cHEIGHT, data=grouseticks)
# prepare newdata for the predict() function
nd <- data.frame(cHEIGHT=-60:60)
# obtain GLM predictions and draw the prediction curve
# (re.form is a glmer argument and is ignored by predict.glm;
# lines() adds to the existing plot, so par(new=T) is not needed)
pr <- predict(fit, newdata=nd, type="response")
lines(nd$cHEIGHT, pr, lwd=2, col="black")
# obtain GLMM predictions and draw the prediction curve
# (re.form=NA sets all random effects to zero)
prm <- predict(fitm, newdata=nd, re.form=NA, type="response")
lines(nd$cHEIGHT, prm, lwd=2, col="red")
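Note that the red curve drawn with re.form=NA is the population-level prediction, with every brood's random intercept set to zero; the improvement that R2c reports lives in the brood-specific (conditional) predictions instead. A minimal sketch, assuming the scatter plot and fitm above, to overlay those conditional fitted values:

```r
# Sketch (assumes the scatter plot and fitm from above).
# fitted() on a glmer fit uses re.form = NULL by default, i.e.
# fixed effects *plus* each brood's estimated (BLUP) intercept.
points(grouseticks$cHEIGHT, fitted(fitm), pch = 16, col = "blue")
legend("topright", pch = c(NA, NA, 16), lty = c(1, 1, NA),
       col = c("black", "red", "blue"),
       legend = c("GLM", "GLMM (re.form = NA)", "GLMM conditional"))
```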
I don't see such a big improvement in fit from the GLM (black) to the GLMM (red). Rather, it seems to me that the GLM predicts the number of TICKS better than the GLMM when cHEIGHT is low.
I suspected that I might be using the predict() function in the wrong way, so I tried using the estimated parameters from the GLM and GLMM to predict TICKS directly, i.e.
pr <- exp(1.5446 - 0.0231*nd$cHEIGHT)   # GLM coefficients
prm <- exp(0.5684 - 0.0252*nd$cHEIGHT)  # GLMM fixed effects
The results were identical to those obtained from predict(). Am I performing the prediction correctly? If not, please tell me where I went wrong, thanks.
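One reason the red curve sits below the black one (rather than fitting better) is that exponentiating the fixed effects alone does not give the population mean when there is a random intercept on the log scale: if b ~ N(0, sigma^2), then E[exp(beta0 + b)] = exp(beta0 + sigma^2/2), which is larger than exp(beta0). A base-R sketch using the posted GLMM intercept and an illustrative (not fitted) sigma:

```r
# Illustrative only: beta0 is the GLMM intercept quoted in the post,
# but sigma is a made-up random-intercept SD, not the fitted value.
set.seed(1)
beta0 <- 0.5684
sigma <- 1.4
b <- rnorm(1e6, 0, sigma)              # simulated brood intercepts
sim_mean <- mean(exp(beta0 + b))       # simulated population mean
analytic <- exp(beta0 + sigma^2 / 2)   # lognormal mean
naive    <- exp(beta0)                 # what re.form = NA uses
c(sim_mean, analytic, naive)
```

So the GLMM curve with the random effect set to zero systematically under-predicts the average count, which is why it can fall below the GLM curve even though the GLMM fits each brood far better.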
I also calculated simple correlation coefficients between the observed TICKS and the model predictions. The GLM predictions correlated slightly better with the observed data (r=0.3454) than the GLMM predictions (r=0.3442), although the difference was trivial. I realize this is probably not a standard way to compare goodness-of-fit, but the result is still counter-intuitive to me.
Thus, I don't see how the addition of the random effect BROOD improved the model fit, as indicated by Nakagawa & Schielzeth's (2013) R2m and R2c.
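For comparison, correlating the observed counts with the conditional GLMM predictions (fixed effects plus the BLUP intercepts), rather than with the population-level curve, should show the improvement that R2c describes. A sketch, assuming fit and fitm from above:

```r
# Sketch (assumes fit and fitm from above); fitted(fitm) includes
# the estimated BROOD intercepts, so it is a conditional prediction
r_glm  <- cor(grouseticks$TICKS, fitted(fit))   # marginal GLM
r_glmm <- cor(grouseticks$TICKS, fitted(fitm))  # conditional GLMM
c(r_glm, r_glmm)
```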
Thank you very much for your kind help!


I wrote "R2m << R2c indicates that the addition of the random effect substantially improves the predictive power of the model." That sounds like what happens when an additional fixed effect improves the model's fit to the raw data (I admit that's how I understood it). But in fact R2m << R2c means that the GLMM's fit to the TICKS data, after accounting for the random effect, is better than the GLM's fit to the raw TICKS data. Is that what you suggested?
– bbKZO Jul 30 '19 at 06:59