Summary: I ran a mixed-effects logistic regression (using glmer) on binary data (more details below) and got completely unrealistic estimates. The effect I am interested in (a difference between two experimental conditions, with each participant exposed to both conditions) is clearly visible to the naked eye, but the fixed-effect estimates (intercept and slope) point in the opposite direction of what the data suggest. However, if I add the specification nAGQ = 0, or use the GLMMadaptive package, I get results that make sense, but with a substantially higher AIC.
More details
Data: 120 participants were each exposed to two different conditions, each instantiated in 8 different ways (this is a linguistics experiment, so the same condition can be instantiated with different lexical items). Each participant thus got 16 trials and gave a binary answer (yes or no) to each.
For the first condition, about 68% of the answers were "yes", while for the second condition about 97% were, so a very clear difference (each condition was seen 960 times, 8*120). Looking at individual participants, 51 said "yes" less often in the first condition than in the second, while only 10 did the reverse; likewise, 40 participants said "no" at least half of the time in the first condition, while only 4 did so in the second. All of this suggests that the two conditions give rise to different patterns of answers.
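For concreteness, here is roughly how I computed those descriptive summaries (a minimal sketch; Resp, Cond and Id are the variable names used in the model below, and I am assuming Resp is coded 0/1 and that the first column of the per-participant table corresponds to the first condition):

# Proportion of "yes" per condition
with(DataSomeLiteralSecondary, tapply(Resp, Cond, mean))

# Per-participant proportion of "yes" in each condition
p_by_id <- with(DataSomeLiteralSecondary, tapply(Resp, list(Id, Cond), mean))

# Participants who said "yes" less often in the first condition than in the second
sum(p_by_id[, 1] < p_by_id[, 2])

# Participants who said "no" at least half of the time in the first condition
sum(p_by_id[, 1] <= 0.5)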
Now I ran the following model (maximal random-effect structure), which converged (Id = participant, Item = specific trial):
glmer(Resp ~ Cond + (Cond | Id) + (1 | Item), family = binomial, data = DataSomeLiteralSecondary)
And I got the following:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: Resp ~ Cond + (Cond | Id) + (1 | Item)
Data: DataSomeLiteralSecondary
AIC BIC logLik deviance df.resid
730.0 763.3 -359.0 718.0 1914
Scaled residuals:
Min 1Q Median 3Q Max
-3.00786 0.01239 0.01321 0.01539 2.62209
Random effects:
Groups Name Variance Std.Dev. Corr
Id (Intercept) 152.12945 12.334
CondSecondary 195.74841 13.991 -0.91
Item (Intercept) 0.03841 0.196
Number of obs: 1920, groups: Id, 120; Item, 16
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.5354 0.9570 8.919 <2e-16 ***
CondSecondary -0.2646 1.4834 -0.178 0.858
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
CondSecndry -0.616
One can see that the fixed-effect estimates are completely off by comparing them with the values from a plain glm, which correspond exactly to the log-odds ratio computed from the observed frequencies.
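This is the kind of sanity check I mean (a sketch; with 68% and 97% "yes", the glm coefficients should come out very close to the values computed directly from the observed proportions):

# Fixed-effects-only fit, for comparison with the observed proportions
glm(Resp ~ Cond, family = binomial, data = DataSomeLiteralSecondary)

# Log-odds implied by the raw proportions
qlogis(0.68)                 # intercept, roughly 0.75
qlogis(0.97) - qlogis(0.68)  # slope (log odds ratio), roughly 2.72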
The estimates returned by the model above translate into a predicted probability of "yes" of 0.9998036 for the first condition (vs. 0.68 observed) and essentially the same for the second condition, in fact slightly below it (because the fixed-slope coefficient is negative).
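To spell out that translation (a sketch using the fixed-effect estimates printed above):

# Fixed-effect estimates from the glmer fit above, on the probability scale
plogis(8.5354)            # first condition: ~0.9998, vs. 0.68 observed
plogis(8.5354 - 0.2646)   # second condition: ~0.9997, i.e. slightly lower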
If I use glmmTMB instead of glmer, the results are identical. Things do not change in any tangible way if I remove the random intercept for items, and playing with different optimizers does not change anything at all.
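The glmmTMB fit I tried is the direct equivalent, something like the following (assuming the same data frame):

library(glmmTMB)
# Same maximal random-effects structure, Laplace approximation of the likelihood
glmmTMB(Resp ~ Cond + (Cond | Id) + (1 | Item),
        family = binomial, data = DataSomeLiteralSecondary)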
However, if I add nAGQ = 0, things become much more sensible (though then I need to remove the random intercept for items, due to lack of convergence), and the same holds if I use GLMMadaptive (again with no random intercept per item). Finally, as is to be expected, if I drop the random slope from the model, the model again makes much more intuitive sense.
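Concretely, the alternative fits were along these lines (a sketch; I am not showing the exact control settings I used, and GLMMadaptive only handles a single grouping factor, which is why the item intercept is dropped there):

# nAGQ = 0: cruder approximation of the integral; item intercept dropped for convergence
glmer(Resp ~ Cond + (Cond | Id),
      family = binomial, data = DataSomeLiteralSecondary, nAGQ = 0)

library(GLMMadaptive)
# Adaptive Gauss-Hermite quadrature over the (Cond | Id) random effects
mixed_model(fixed = Resp ~ Cond, random = ~ Cond | Id,
            family = binomial(), data = DataSomeLiteralSecondary)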
However, all these alternative models are post hoc (and go against what I preregistered), and furthermore they all give rise to a higher AIC and BIC than the model above. I am a bit lost. If I only report the analysis I preregistered (what I actually preregistered is a comparison with a model that includes only the intercept as a fixed effect but has the same random effects, and I planned to report the p-value for the LRT obtained from this comparison), then I have to conclude that there is no evidence in the data that the second condition gives rise to more "yes" answers than the first one; but this really seems to go against common sense if we simply look at the data. On the other hand, what justification do I have for moving to other models that give more realistic estimates, especially given that they have a higher AIC and BIC and hence do not seem to fit the data better? Is there any justification for using nAGQ = 0, for instance?
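For completeness, the preregistered analysis is the usual likelihood-ratio test between the intercept-only model and the full model, i.e. something like this sketch:

m_full <- glmer(Resp ~ Cond + (Cond | Id) + (1 | Item),
                family = binomial, data = DataSomeLiteralSecondary)
m_null <- glmer(Resp ~ 1 + (Cond | Id) + (1 | Item),
                family = binomial, data = DataSomeLiteralSecondary)
# Likelihood-ratio test; this is the p-value I preregistered to report
anova(m_null, m_full)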
I am aware that my question is related to the one linked below. Ben Bolker suggested using "Gauss-Hermite quadrature", which I think corresponds to nAGQ = 0 (or to using the GLMMadaptive package) and which would solve my problem; but what justification can I give for this, given that it is a departure from the preregistration? Is it arguable, for instance, that because the second condition gives rise to a proportion so close to 1, glmer is not reliable?
Related discussion at: https://stats.stackexchange.com/questions/393477/estimates-radically-change-when-including-random-slopes-in-multiple-logistic-reg
Thanks a lot in advance for enlightening me!
nAGQ=0. I will get back to this/dig in deeper when I get a chance. You should probably use GLMMadaptive (sic): I don't remember if lme4 can handle AGHQ for vector-valued random effects such as (Cond|Id)... comparing AICs for different algorithm settings (i.e. number of quadrature points) is not meaningful/sensible ... – Ben Bolker Mar 18 '24 at 01:36