
I've read many posts about singular fit issues, as well as this FAQ entry: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#singular-models-random-effect-variances-estimated-as-zero-or-correlations-estimated-as---1 . However, most of the advice revolves around simplifying the random-effects structure. In my case I have just one random intercept, which I believe is necessary given the experimental design, so I'm not sure how best to proceed. I am looking at success on a task, where the outcome is binary (1 = solved, 0 = not solved), and at how other variables contribute to success. I have 13 subjects with 10-12 trials each, so I believe it's important to include the individual as a random intercept when modeling success. The other variables are summarized here:

  • Sex: categorical, 2 levels
  • Zoo: categorical, 2 different zoos
  • Age: continuous
  • Persistence: continuous, proportion of total time engaged with the task per trial
  • Neophilia: continuous, latency to approach the task in the very first trial
  • Diversity: continuous, total number of actions employed per trial

My full model is newdmod <- glmer(Solved ~ Sex + Zoo + Age + Persistence + Neophilia + Diversity + (1 | Individual), data = door, family = binomial) (output below). I get a singular fit message with this model, and I can see that the variance of my random intercept is almost 0. As I understand it, this would indicate that the random intercept should be removed, but due to my experimental design I think it's important to keep it in. I've also checked for collinearity in this full model and none of the fixed effects have high VIFs.
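For reference, here is a minimal sketch of the fit and the two checks described above, assuming the data frame is called door as in the output below; the VIF check uses car::vif() on a fixed-effects-only glm, which is just one common way to do it:

    library(lme4)
    library(car)

    ## Full model: binary outcome, random intercept for Individual
    newdmod <- glmer(Solved ~ Sex + Zoo + Age + Persistence + Neophilia +
                       Diversity + (1 | Individual),
                     data = door, family = binomial)
    summary(newdmod)
    isSingular(newdmod)   # TRUE here: the random-intercept variance is essentially 0

    ## Collinearity check on the fixed effects via a fixed-effects-only glm
    vif(glm(Solved ~ Sex + Zoo + Age + Persistence + Neophilia + Diversity,
            data = door, family = binomial))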

Generalized linear mixed model fit by maximum likelihood
  (Laplace Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: 
Solved ~ Sex + Zoo + Age + Persistence + Neophilia + Diversity +  
    (1 | Individual)
   Data: door
 AIC      BIC   logLik deviance df.resid 
99.7    123.7    -41.8     83.7      141 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.6817  0.0171  0.0910  0.2962  3.8144 

Random effects:
 Groups     Name        Variance  Std.Dev. 
 Individual (Intercept) 2.774e-18 1.665e-09
Number of obs: 149, groups:  Individual, 13

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.94544    1.04467   1.862 0.062568 .  
SexM          1.68561    0.72139   2.337 0.019459 *  
ZooRGZ       -1.04102    0.59511  -1.749 0.080240 .  
Age          -0.04028    0.02218  -1.816 0.069426 .  
Persistence   8.19243    1.60335   5.110 3.23e-07 ***
Neophilia    -0.03757    0.01167  -3.220 0.001283 ** 
Diversity    -0.36305    0.10136  -3.582 0.000341 ***


Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) SexM   ZooRGZ Age    Prsstn Neophl
SexM         0.053                                   
ZooRGZ      -0.275 -0.095                            
Age         -0.754 -0.219 -0.075                     
Persistence  0.108  0.356 -0.169 -0.291              
Neophilia   -0.557 -0.564 -0.081  0.689 -0.443       
Diversity   -0.508 -0.241  0.260  0.314 -0.648  0.329
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular

Interestingly, if I remove one of the fixed variables (Zoo, the one with the highest p value), the singularity problem disappears and the random intercept has non-zero variance again.
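A minimal sketch of that reduced fit (the object name is mine):

    ## Drop Zoo from the full model and re-check the singularity
    nozoomod <- update(newdmod, . ~ . - Zoo)
    isSingular(nozoomod)   # FALSE once Zoo is removed
    summary(nozoomod)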

Generalized linear mixed model fit by maximum likelihood
  (Laplace Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: 
Solved ~ Sex + Age + Persistence + Neophilia + Diversity + (1 |  
    Individual)
   Data: door
     AIC      BIC   logLik deviance df.resid 
   100.4    121.5    -43.2     86.4      142 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-5.5278  0.0165  0.0816  0.2958  3.6741 

Random effects:
 Groups     Name        Variance Std.Dev.
 Individual (Intercept) 0.3006   0.5483  
Number of obs: 149, groups: Individual, 13

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.60015    1.16986   1.368 0.171372    
SexM          1.64549    0.80376   2.047 0.040634 *  
Age          -0.04468    0.02510  -1.780 0.075003 .  
Persistence   7.89091    1.63103   4.838 1.31e-06 ***
Neophilia    -0.04018    0.01314  -3.058 0.002230 ** 
Diversity    -0.34100    0.10304  -3.309 0.000935 ***


Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) SexM   Age    Prsstn Neophl
SexM        -0.008                            
Age         -0.804 -0.199                     
Persistence  0.023  0.331 -0.284              
Neophilia   -0.602 -0.525  0.680 -0.435       
Diversity   -0.466 -0.223  0.323 -0.564  0.336

So my question is: is it legitimate to simplify the full model by removing a fixed effect based on its p value, with the justification that this variable contributes least to variation in success? Especially since this fixes the singularity issue, and given that I would like to keep the random effect. Or could my result with the full model provide enough support to argue that there is not enough individual variation, so that I don't have to account for repeated measures? Any and all help would be appreciated! I'm also happy to provide more information if that would be helpful!

Edit: adding the GLM output for glm(Solved ~ Sex + Zoo + Age + Persistence + Neophilia + Diversity + Individual, data = door, family = binomial):

Call:
glm(formula = Solved ~ Sex + Zoo + Age + Persistence + Neophilia + 
    Diversity + Individual, family = binomial, data = door)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.42591   0.00001   0.05371   0.34488   2.72956  

Coefficients: (4 not defined because of singularities)
                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)         60.7768   114.3250   0.532  0.59499    
SexM               -43.7212    92.9102  -0.471  0.63794    
ZooRGZ             -63.5269   123.3433  -0.515  0.60652    
Age                  0.1188     0.3874   0.307  0.75917    
Persistence          7.8394     1.9450   4.031 5.56e-05 ***
Neophilia           -0.6940     1.2860  -0.540  0.58942    
Diversity           -0.3727     0.1173  -3.178  0.00148 ** 
IndividualAjay     113.6391   229.9300   0.494  0.62114    
IndividualBamboo   -65.2868   130.9487  -0.499  0.61808    
IndividualBatu      97.5497   195.7502   0.498  0.61825    
IndividualChandra  -61.2992   121.6868  -0.504  0.61444    
IndividualDoc      105.7723   207.0135   0.511  0.60939    
IndividualKandula  -15.1359    24.6599  -0.614  0.53936    
IndividualKirina    20.5416  2647.9499   0.008  0.99381    
IndividualMali      20.6377  2600.2857   0.008  0.99367    
IndividualRex            NA         NA      NA       NA    
IndividualRomani         NA         NA      NA       NA    
IndividualSiri           NA         NA      NA       NA    
IndividualTarga          NA         NA      NA       NA    


Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 157.574  on 148  degrees of freedom
Residual deviance:  69.857  on 134  degrees of freedom
AIC: 99.857

Number of Fisher Scoring iterations: 18

Raw data here: https://github.com/sjacobson1112/Problem-solving/blob/main/doorsuccess.csv

1 Answer


You might be trying to push these data a bit too far.

First, unpenalized logistic regression models typically need about 15 cases in the least-prevalent outcome class per predictor to avoid overfitting. With only 33 trials showing Solved = 0, you should be wary of evaluating more than 2 or 3 predictors. If you count the random-intercept effect as a single predictor, you have 7 predictors in your model. Also, of the 13 Individuals, 4 account for 23 of those 33 trials, 2 have no trials with Solved = 0, and 4 more have only 1 trial each with Solved = 0. Be cautious.
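A quick way to see those counts in the posted data (a sketch; it assumes the CSV linked at the end of the question has been downloaded locally and that the column names match the model output):

    ## Raw data linked in the question
    door <- read.csv("doorsuccess.csv")

    ## Solved = 0 is the least-prevalent outcome class
    table(door$Solved)

    ## How the Solved = 0 trials are spread across Individuals
    table(door$Individual, door$Solved)

    ## Rough events-per-variable guide: ~15 least-prevalent cases per predictor
    sum(door$Solved == 0) / 15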

Second, each Individual has a single value of each of Sex, Age, and Neophilia, in addition to Zoo, for all trials.* Removing any one of those predictors from the original full mixed model removes the singularity problem. I can't say precisely why that is the case, but it presumably has to do with the ratio of the number of such predictors (4) to the number (13) of separate individuals. Perhaps once you have identified the associations of those 4 fixed predictors with outcome, there isn't much left to further associate with the Individuals as random effects.
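One way to confirm that these four predictors don't vary within Individual (again just a sketch against the linked data):

    ## Maximum number of distinct values each predictor takes within an
    ## Individual; 1 means the predictor is constant across that
    ## Individual's trials, i.e. purely between-subject
    sapply(c("Sex", "Zoo", "Age", "Neophilia"),
           function(v) max(tapply(door[[v]], door$Individual,
                                  function(x) length(unique(x)))))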

The implications for your questions:

whether it is legitimate to simplify the full model by removing a fixed effect based on the p value ...

You should be evaluating fewer predictors in any event. What you are proposing is similar to backward stepwise selection of predictors, probably the least objectionable of stepwise approaches. It would be better to use your knowledge of the subject matter to choose the most crucial fixed-effect predictors to include from the beginning.

Or could my result with the full model provide enough support to argue that there is not enough individual variation so I don't have to account for repeated measures?

If you fit the no-Zoo model both with the Individual random effect and with Individual ignored completely (in a glm), you get very different results. The magnitudes of the fixed-effect coefficients are about 10 times larger in the no-Zoo mixed model than in a no-Zoo glm model that omits Individual. That raises red flags.

I suspect that this has to do with overfitting. For example, the fixed-effect coefficients of models restricted to Persistence, Neophilia and Diversity are quite similar whether Individual is ignored (with glm) or included as a random-effect intercept (with glmer).
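A sketch of both comparisons against the linked data (the object names are mine):

    library(lme4)

    ## No-Zoo model: random intercept for Individual vs. ignoring Individual
    m_mixed <- glmer(Solved ~ Sex + Age + Persistence + Neophilia + Diversity +
                       (1 | Individual), data = door, family = binomial)
    m_glm   <- glm(Solved ~ Sex + Age + Persistence + Neophilia + Diversity,
                   data = door, family = binomial)
    cbind(glmer = fixef(m_mixed), glm = coef(m_glm))   # compare the magnitudes

    ## Restricted to the three within-trial predictors: the estimates now agree
    r_mixed <- glmer(Solved ~ Persistence + Neophilia + Diversity +
                       (1 | Individual), data = door, family = binomial)
    r_glm   <- glm(Solved ~ Persistence + Neophilia + Diversity,
                   data = door, family = binomial)
    cbind(glmer = fixef(r_mixed), glm = coef(r_glm))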

Look carefully at your data and apply your knowledge of the subject matter to decide how best to proceed.


*That's why the last 4 coefficients for levels of Individual in the glm model without random effects are NA. Had you listed Individual as the first predictor in the model formula, the coefficients for those 4 between-individual predictors would have been the ones reported as NA instead.
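A sketch of that term-order check (object names are mine; the data are the linked CSV):

    ## Individual listed last: its last 4 dummy coefficients are aliased (NA)
    fit_last  <- glm(Solved ~ Sex + Zoo + Age + Persistence + Neophilia +
                       Diversity + Individual, data = door, family = binomial)
    names(coef(fit_last))[is.na(coef(fit_last))]

    ## Individual listed first: the between-individual predictors get the NAs
    fit_first <- glm(Solved ~ Individual + Sex + Zoo + Age + Persistence +
                       Neophilia + Diversity, data = door, family = binomial)
    names(coef(fit_first))[is.na(coef(fit_first))]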

EdM
  • Thank you so much @EdM! This makes a lot of sense. Based on your feedback I will only include the crucial variables of interest for my hypotheses and take out the variables that I wanted to "control for" (sex, age, and zoo). Do you think it's best to state in a manuscript that I pared down the predictor variables due to singularity issues? Or something like "in order to avoid overfitting the logistic regression model, I eliminated these variables to focus on those of interest"? It's a bit strange because I included all variables in another part of the analysis where the outcome wasn't binomial. – Sjacobson1112 Mar 09 '21 at 14:52
  • @Sjacobson1112 the usual rule of thumb is less stringent for continuous outcomes, about 15 total cases per predictor. See section 4.4 of Harrell's Regression Modeling Strategies book or course notes. Depending on details of your other analyses, simply citing the risk of overfitting should thus be OK. If your other analyses had lower case/predictor ratios, however, they might also be overfit and you should evaluate, for example, by bootstrapping. The Harrell references point the way. – EdM Mar 09 '21 at 15:55
  • Really appreciate your help @EdM and thank you for the links to the resources! – Sjacobson1112 Mar 09 '21 at 16:36
  • (+1) Nice answer! @Sjacobson1112 if this answers your question, please consider marking it as the accepted answer (and upvoting)... it's how this site works best. If not, then please let us know why :) – Robert Long Mar 09 '21 at 22:13