I've read many posts about singular fit issues, as well as this FAQ entry: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#singular-models-random-effect-variances-estimated-as-zero-or-correlations-estimated-as---1 . However, most of the advice revolves around simplifying the random effects. In my case I have just one random intercept, which I believe is necessary given the experimental design, so I'm not sure what the best way to proceed is. I am modeling success on a task where the outcome is binary (1 = solved, 0 = not solved) and looking at how other variables contribute to success. I have 13 subjects, each with 10-12 trials, so I believe it's important to include the individual as a random intercept when modeling success. The other variables are summarized here:
- Sex- categorical, 2 levels
- Zoo- categorical, 2 different zoos
- Age- continuous
- Persistence- continuous, proportion of total time engaged with task per trial
- Neophilia- continuous, latency to approach task in very first trial
- Diversity- continuous, total number of actions employed per trial.
My full model is newdmod <- glmer(Solved ~ Sex + Zoo + Age + Persistence + Neophilia + Diversity + (1 | Individual), data = door, family = binomial) (output below). I get a singular fit message with this model, and I can see that the variance of my random intercept is essentially 0. As I understand it, this would indicate that my random intercept should be removed, but because of my experimental design I think it's important to keep it in. I've also checked the full model for collinearity, and none of the fixed effects have high VIFs.
Generalized linear mixed model fit by maximum likelihood
(Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula:
Solved ~ Sex + Zoo + Age + Persistence + Neophilia + Diversity +
(1 | Individual)
Data: door
AIC BIC logLik deviance df.resid
99.7 123.7 -41.8 83.7 141
Scaled residuals:
Min 1Q Median 3Q Max
-4.6817 0.0171 0.0910 0.2962 3.8144
Random effects:
Groups Name Variance Std.Dev.
Individual (Intercept) 2.774e-18 1.665e-09
Number of obs: 149, groups: Individual, 13
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.94544 1.04467 1.862 0.062568 .
SexM 1.68561 0.72139 2.337 0.019459 *
ZooRGZ -1.04102 0.59511 -1.749 0.080240 .
Age -0.04028 0.02218 -1.816 0.069426 .
Persistence 8.19243 1.60335 5.110 3.23e-07 ***
Neophilia -0.03757 0.01167 -3.220 0.001283 **
Diversity -0.36305 0.10136 -3.582 0.000341 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) SexM ZooRGZ Age Prsstn Neophl
SexM 0.053
ZooRGZ -0.275 -0.095
Age -0.754 -0.219 -0.075
Persistence 0.108 0.356 -0.169 -0.291
Neophilia -0.557 -0.564 -0.081 0.689 -0.443
Diversity -0.508 -0.241 0.260 0.314 -0.648 0.329
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
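For reference, the singular-fit message can also be checked programmatically. This is a minimal sketch on simulated data, not my real data: the data frame toy and its values are hypothetical, and the outcome is generated with no between-individual variation at all, so the estimated variance may well land on the boundary (though that is not guaranteed for any particular seed).

```r
library(lme4)

set.seed(42)
# Hypothetical stand-in for the 'door' data: 13 individuals x 11 trials each
toy <- data.frame(
  Individual  = factor(rep(seq_len(13), each = 11)),
  Persistence = runif(13 * 11)
)
# Outcome simulated WITHOUT any true individual-level variation
toy$Solved <- rbinom(nrow(toy), 1, plogis(-1 + 3 * toy$Persistence))

fit <- glmer(Solved ~ Persistence + (1 | Individual),
             data = toy, family = binomial)

isSingular(fit, tol = 1e-4)  # TRUE when the variance estimate sits on the boundary
VarCorr(fit)                 # estimated SD of the random intercept
getME(fit, "theta")          # scaled random-effect SD; a value of 0 means a boundary fit
```

The same three calls applied to newdmod should reproduce what the summary above already shows.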
Interestingly, if I remove one of the fixed effects (Zoo, the one with the highest p-value), the singularity problem disappears and the random intercept has a non-zero variance again.
Generalized linear mixed model fit by maximum likelihood
(Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula:
Solved ~ Sex + Age + Persistence + Neophilia + Diversity + (1 |
Individual)
Data: door
AIC BIC logLik deviance df.resid
100.4 121.5 -43.2 86.4 142
Scaled residuals:
Min 1Q Median 3Q Max
-5.5278 0.0165 0.0816 0.2958 3.6741
Random effects:
Groups Name Variance Std.Dev.
Individual (Intercept) 0.3006 0.5483
Number of obs: 149, groups: Individual, 13
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.60015 1.16986 1.368 0.171372
SexM 1.64549 0.80376 2.047 0.040634 *
Age -0.04468 0.02510 -1.780 0.075003 .
Persistence 7.89091 1.63103 4.838 1.31e-06 ***
Neophilia -0.04018 0.01314 -3.058 0.002230 **
Diversity -0.34100 0.10304 -3.309 0.000935 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) SexM Age Prsstn Neophl
SexM -0.008
Age -0.804 -0.199
Persistence 0.023 0.331 -0.284
Neophilia -0.602 -0.525 0.680 -0.435
Diversity -0.466 -0.223 0.323 -0.564 0.336
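As a rough cross-check on how much Zoo contributes beyond its Wald p-value, the two fits can be compared with a likelihood-ratio test. The sketch below uses only the rounded deviances printed in the two summaries above (83.7 with Zoo, 86.4 without), so it is approximate. Calling anova() on the two fitted models would give the same comparison on unrounded values.

```r
# Deviances as printed in the two glmer summaries (rounded to one decimal)
dev_full    <- 83.7   # Solved ~ Sex + Zoo + Age + Persistence + Neophilia + Diversity + (1 | Individual)
dev_reduced <- 86.4   # same model without Zoo

# Likelihood-ratio statistic: the models differ by one fixed-effect parameter
lr_stat <- dev_reduced - dev_full
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

lr_stat   # about 2.7
p_value   # about 0.10, in the same range as the Wald p-value for Zoo (0.08)
```

Both models were fitted by maximum likelihood to the same 149 observations, which is what makes this comparison meaningful.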
So my question is: is it legitimate to simplify the full model by removing a fixed effect based on its p-value, with the justification that this variable contributes least to variation in success? This is tempting because it resolves the singularity and I would like to keep the random effect. Or does the result from the full model provide enough support to argue that there is too little individual variation for me to need to account for repeated measures? Any and all help would be appreciated! I'm also happy to provide more information if it is helpful!
Edit: Adding GLM output for glm(Solved ~ Sex + Zoo + Age + Persistence + Neophilia + Diversity + Individual, family=binomial)
Call:
glm(formula = Solved ~ Sex + Zoo + Age + Persistence + Neophilia +
Diversity + Individual, family = binomial, data = door)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.42591 0.00001 0.05371 0.34488 2.72956
Coefficients: (4 not defined because of singularities)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 60.7768 114.3250 0.532 0.59499
SexM -43.7212 92.9102 -0.471 0.63794
ZooRGZ -63.5269 123.3433 -0.515 0.60652
Age 0.1188 0.3874 0.307 0.75917
Persistence 7.8394 1.9450 4.031 5.56e-05 ***
Neophilia -0.6940 1.2860 -0.540 0.58942
Diversity -0.3727 0.1173 -3.178 0.00148 **
IndividualAjay 113.6391 229.9300 0.494 0.62114
IndividualBamboo -65.2868 130.9487 -0.499 0.61808
IndividualBatu 97.5497 195.7502 0.498 0.61825
IndividualChandra -61.2992 121.6868 -0.504 0.61444
IndividualDoc 105.7723 207.0135 0.511 0.60939
IndividualKandula -15.1359 24.6599 -0.614 0.53936
IndividualKirina 20.5416 2647.9499 0.008 0.99381
IndividualMali 20.6377 2600.2857 0.008 0.99367
IndividualRex NA NA NA NA
IndividualRomani NA NA NA NA
IndividualSiri NA NA NA NA
IndividualTarga NA NA NA NA
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 157.574 on 148 degrees of freedom
Residual deviance: 69.857 on 134 degrees of freedom
AIC: 99.857
Number of Fisher Scoring iterations: 18
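A possible reading of the "(4 not defined because of singularities)" line: if Sex, Zoo, Age and Neophilia each take a single value per individual (an assumption on my part, though the variable descriptions suggest it), then they are exact linear combinations of the Individual dummies, and four coefficients cannot be estimated. The sketch below illustrates this with made-up data in base R; the data frame toy and all its values are hypothetical.

```r
set.seed(1)
n_ind  <- 13
trials <- 11

# Hypothetical individual-level covariates: one value per individual
ind <- data.frame(
  Individual = factor(sprintf("id%02d", seq_len(n_ind))),
  Sex        = factor(rep(c("F", "M"), length.out = n_ind)),
  Zoo        = factor(rep(c("A", "B"), times = c(7, 6))),
  Age        = round(runif(n_ind, 5, 50)),
  Neophilia  = round(runif(n_ind, 1, 120))
)

# Repeat each individual's row once per trial, then add trial-level covariates
toy <- ind[rep(seq_len(n_ind), each = trials), ]
toy$Persistence <- runif(nrow(toy))
toy$Diversity   <- rpois(nrow(toy), 4)

# Design matrix of the fixed-effects-only model with Individual as a factor
X <- model.matrix(~ Sex + Zoo + Age + Persistence + Neophilia + Diversity +
                    Individual, data = toy)

n_cols   <- ncol(X)       # intercept + 6 covariates + 12 Individual dummies
x_rank   <- qr(X)$rank    # number of linearly independent columns
n_alias  <- n_cols - x_rank
c(columns = n_cols, rank = x_rank, aliased = n_alias)
```

In this toy setup the design matrix has 19 columns but rank 15, so 4 coefficients are aliased. That lines up with the GLM output above, where the residual degrees of freedom are 134 = 149 - 15.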
Raw data here: https://github.com/sjacobson1112/Problem-solving/blob/main/doorsuccess.csv
glm(Solved ~ Sex + Zoo + Age + Persistence + Neophilia + Diversity + Individual, family=binomial) Also, please explain what all the variables are. – Robert Long Mar 04 '21 at 20:11
Zoo variable ? Also, you didn't include Individual in the glm model. – Robert Long Mar 04 '21 at 20:32