
I'm running a mixed-effects logistic regression model (the glmer function from the lme4 package) in RStudio with two random intercepts and three predictors as fixed effects. All the independent variables are categorical. The model shows no convergence problems, but something strange is going on. One of the categorical predictors (Age) has three levels (7, 8, 20). When this predictor is at level 20, all the values of the dependent variable are 1. I've compared contrasts, and while there is a significant difference between 7 and 8, no difference appears between 7 and 20, between 8 and 20, or between 7+8 and 20. This seems strange: when Age=7 there are both 0 and 1 outcomes (due to the other predictors), and likewise when Age=8, so the differences involving level 20 should be bigger than the one between 7 and 8. Is there a command I can add to my model to solve the problem?

I'm just using the optimizer control = glmerControl(optimizer = "bobyqa"). I think this is a case of near-perfect (quasi-complete) separation, since whenever one of the predictors takes a specific level, the dependent variable is always 1.

What are the possible solutions? I have read about Firth penalization, but I am not sure it can be applied to mixed models.
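For reference, a quick cross-tabulation confirms the pattern (assuming the data frame and column names from my glmer call, i.e. data, age and answer):

```r
# Cross-tabulate Age against the binary outcome; the Age = 20 row
# should show no zeros if the separation is real.
with(data, table(age, answer))
```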

Katherine
  • I'm not sure what's going on, but 1) if you are very new to the world of statistics, I suggest doing much simpler things for quite a while before tackling this sort of thing, and 2) if the DV is always 1 when age = 20, weird things are likely. – Peter Flom Mar 20 '24 at 09:37
  • I need this kind of model, so I cannot do different things. And yes, the DV is always 1 when age = 20. What can I do? – Katherine Mar 20 '24 at 16:59
  • If you need help with modeling your data or interpreting results from statistical models, you should ask for help at [stats.se] instead. You are likely to get better help there. This is not really a specific programming question that's appropriate for Stack Overflow. – MrFlick Mar 20 '24 at 19:14
  • Welcome to Cross Validated! This might well be related to perfect separation in logistic regression, but it's hard to know without seeing some details. Please edit the question to explain more about the data and your hypothesis, show the package that you used, the function you called, and the summary of the model results. Enter the function call and model summary as text via the "code" tool, rather than as an image. – EdM Mar 20 '24 at 22:27
  • I'm using the glmer function from the package lme4, and it is exactly a case of near-perfect separation. The fact is that when my categorical variable Age (which is a fixed effect) assumes the value "adult", all the responses (dependent variable) are 1. But this is okay in my theoretical framework. However, the model shows that the relevant contrast is insignificant (although the variable Age is significant in the Analysis of Deviance Table, Type III Wald chi-square tests). I was thinking about adding a Firth correction, but maybe it is okay only for non-mixed models (I'm not sure about this). – Katherine Mar 20 '24 at 22:46
  • And, just to be clear, Age=adult constitutes a control group, so it is okay for the DV to be always 1. I've read about using blme for these kinds of problems. However, I wanted to know if they could be solved in a frequentist framework. – Katherine Mar 20 '24 at 22:47
  • Here's another approach to it: https://stats.stackexchange.com/a/639550/32477 – Stefan Mar 20 '24 at 23:53
  • Or here: https://stats.stackexchange.com/a/376386/32477 – Stefan Mar 21 '24 at 00:02
  • @Stefan sorry for the probably stupid question, but is it possible to go with Firth or bias-reduced logistic in a mixed model? I've read about the solution to adopt a Bayesian model, but I am not really confident with Bayesian statistics. Of course, I can learn about it (if it is the only possible solution), but if it is possible to adopt a "simpler" solution it could be better. My model is a mixed effect logistic regression model run with the function glmer from the package lme4 – Katherine Mar 21 '24 at 08:31
  • Here is an answer about that, maybe that helps? https://stats.stackexchange.com/a/568285/409566 – Finn Lübber Mar 21 '24 at 08:48
  • It would be helpful to give the full model being fitted in math notation. Also, Bayesian logistic regression tends to handle random effects better. This is implemented in the R brms package for which there are many vignettes. – Frank Harrell Mar 21 '24 at 11:50
  • @FrankHarrell The mixed-effects logistic regression model is the following: model <- glmer(answer ~ 1 + age + realization * type + (1|item) + (1|part), data = data, family = binomial(link = "logit"), control = glmerControl(optimizer = "bobyqa")) – Katherine Mar 21 '24 at 12:05
  • If you're willing to include a random effect for one grouping factor (not both item and part), then GLMMadaptive can fit a penalized mixed effects logistic regression; see here for an example (with a Poisson model): https://drizopoulos.github.io/GLMMadaptive/articles/GLMMadaptive.html#penalized-mixed-effects-poisson-regression – Dimitris Rizopoulos Mar 21 '24 at 14:14

2 Answers


You may have the following issue with your data. The influence of age=20 (versus age=7 or age=8) is huge, because all cases have Y=1 for age=20. Consider this example, where "gender" has an influence similar to age=20 in your case; for gender=1 all y values are 1:

gender <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
y      <- c(0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1)

model <- glm(y ~ gender, family = binomial)
summary(model)

The output is:

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.4055     0.6455  -0.628    0.530
gender        22.9715  4819.6144   0.005    0.996

The standard error of gender is very large relative to its estimate of 22.9715. A logit difference of 22.9715 between the genders is actually huge, but it comes out INSIGNIFICANT because of the large standard error. If you run a likelihood ratio test instead, you see that the influence of gender is highly significant:

library(car)
Anova(model, type=3)

With output:

Analysis of Deviance Table (Type III tests)

Response: y
       LR Chisq Df Pr(>Chisq)
gender   33.111  1  8.704e-09 ***

Here you see that the LR test yields a highly significant chi-squared value. This test is more reliable in this situation. For a good explanation, see here.
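As a complement to the LR test, profile-likelihood confidence intervals are also more trustworthy than Wald intervals under (near-)separation. This is a minimal sketch with the same toy data; confint() on a glm fit profiles the likelihood (via MASS), so, unlike the Wald interval, the lower limit for gender stays bounded away from zero even though the Wald standard error explodes:

```r
gender <- c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
y      <- c(0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
model  <- glm(y ~ gender, family = binomial)

# Profile-likelihood CIs; compare with the Wald CIs from
# confint.default(model), which are uninformative for gender here.
confint(model)
```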

BenP
  • This is exactly my case: Age is indeed significant with a likelihood ratio test. However, what I need is to compute the contrast between adult vs. age7+age8 (in the output of the model) and, through the emmeans package, the other contrasts, namely adult vs. age7 and adult vs. age8. All these contrasts appear insignificant because of what you have explained. But how can I obtain a significant p-value? (I'm pretty sure the difference is significant, since the one between age7 and age8 is.) – Katherine Mar 20 '24 at 18:06
  • @Katherine, you can do likelihood ratio tests with Anova(model, type=3) from the car package. But you would then get a chi-square test for all three age categories together instead of 7 versus 20, 8 vs. 20, etc. To do such pairwise tests, you could make dummies for age yourself, instead of using it as a factor. If you then run Anova( ) for the model with the age dummies, you would get significant p-values (if that is the case). This would not yet give you differences between emmeans of the age categories. I posted a question on CV about LR tests with emmeans, but no answer yet... – BenP Mar 21 '24 at 15:17
  • @Katherine, if nothing else works, a "dirty" solution would be to change one value of the dependent variable into zero for some person with age=20. That would probably make everything significant. I would choose a person who falls in the modal category for the other independent variables. The effects of the other independent variables would probably change very little. For the comparisons of age=7 and age=8 with age=20 you could then report that the significances shown are actually closer to zero (more significant) than shown. Of course this should be briefly documented, e.g. in a footnote. – BenP Mar 22 '24 at 12:35

Modeling Age as a continuous predictor instead of treating it as a 3-level category is likely to solve this problem. Categorizing continuous predictors into bins is generally not a good idea, anyway. A restricted cubic spline is a good choice for modeling a continuous predictor flexibly. You would get a continuous estimate of the association between Age and outcome that would nicely handle the consistent values of 1 for outcome in your "adult" category of age. You could then make post-modeling comparisons of the model estimates between any Age values that you want.
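A hedged sketch of this idea, reusing the model call quoted in the comments (the variable names answer, age, realization, type, item and part come from there; splines::ns() fits a natural, i.e. restricted, cubic spline, and the choice of 2 df is an assumption you may want to adjust):

```r
library(lme4)
library(splines)  # ns() provides a natural (restricted) cubic spline basis

# Model age as a numeric predictor with a flexible spline term
# instead of a 3-level factor.
model_spline <- glmer(
  answer ~ ns(age, df = 2) + realization * type + (1 | item) + (1 | part),
  data = data, family = binomial(link = "logit"),
  control = glmerControl(optimizer = "bobyqa")
)

# Post-modeling comparisons at chosen ages, e.g. with emmeans:
# library(emmeans)
# emmeans(model_spline, ~ age, at = list(age = c(7, 8, 20)), type = "response")
```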

EdM