
I am trying to fit a GLMM to binary data on whether bee colonies perform mass flight or not. The fixed effects are the time at which the mass flight was performed, temperature, the location of the hive and the species of bee; Hive ID is a random effect. I had a total of 8 hives: two of each species at each location.

'data.frame':   2182 obs. of  7 variables:
 $ Temp        : num  29 29.1 29.8 29 29 29.1 29 30 30 28.3 ...
 $ Zenith_angle: num  74.9 86.9 59.4 76 76 ...
 $ Mass_flight : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
 $ Hive        : Factor w/ 8 levels "JC1","JC2","JM1",..: 6 5 6 7 8 7 6 6 7 5 ...
 $ Site        : Factor w/ 2 levels "Jammu","Trivandrum": 2 2 2 2 2 2 2 2 2 2 ...
 $ Species     : Factor w/ 2 levels "Apis cerana",..: 1 1 1 2 2 2 1 1 2 1 ...
 $ Time        : 'times' num  16:50:00 17:40:00 15:45:00 16:35:00 16:54:00 ...
  ..- attr(*, "format")= chr "h:m:s"
 - attr(*, "na.action")= 'omit' Named int  7 22 23 30 69 378 379 380 381 382 ...
  ..- attr(*, "names")= chr  "7" "22" "23" "30" ...

But glmer reports a singular fit ("boundary (singular) fit: see ?isSingular"). How do I correct it?

library(lme4)
m4 <- glmer(Mass_flight ~ Time + Temp*Site + Species + (1|Hive), data=data, family = binomial)
summary(m4)
 Family: binomial  ( logit )
Formula: Mass_flight ~ Time + Temp * Site + Species + (1 | Hive)
   Data: data

     AIC      BIC   logLik deviance df.resid
   837.8    877.6   -411.9    823.8     2175

Scaled residuals:
    Min      1Q  Median      3Q     Max
-0.5508 -0.2874 -0.1804 -0.1084  9.4977

Random effects:
 Groups Name        Variance  Std.Dev.
 Hive   (Intercept) 2.012e-14 1.419e-07
Number of obs: 2182, groups:  Hive, 8

Fixed effects:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)           -11.64885    1.46926  -7.928 2.22e-15 ***
Time                    8.30280    1.16666   7.117 1.11e-12 ***
Temp                    0.16538    0.04572   3.617 0.000298 ***
SiteTrivandrum          6.43244    2.79463   2.302 0.021351 *
SpeciesApis mellifera  -0.47795    0.21000  -2.276 0.022849 *
Temp:SiteTrivandrum    -0.25244    0.09400  -2.685 0.007243 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) Time   Temp   StTrvn SpcsAm
Time        -0.569
Temp        -0.868  0.110
SiteTrvndrm -0.230 -0.229  0.408
SpcsApsmllf  0.152 -0.058 -0.266 -0.088
Tmp:StTrvnd  0.283  0.193 -0.461 -0.995  0.123
convergence code: 0
boundary (singular) fit: see ?isSingular

Robert Long
Awanti
  • Try using the Nelder-Mead solver. I think it's an option in glmerControl. Call ?glmer and read the documentation. – Demetri Pananos Feb 18 '20 at 16:25
  • Related https://stats.stackexchange.com/questions/35071/what-is-rank-deficiency-and-how-to-deal-with-it/151116#151116 – AdamO Feb 18 '20 at 16:28
  • I would also suggest running the model with one of the many other GLMM packages, including glmmTMB and GLMMadaptive. – Erik Ruzek Feb 18 '20 at 18:15

2 Answers


It is worth stepping back a little and thinking about what might be going on here.

There are two possibilities:

  1. The variance of the random intercept for Hive is effectively zero

  2. The variance of the random intercept for Hive is different from zero in a meaningful way.

In case 1, there is no need to fit random intercepts at all. You can follow the advice in @AdamO's answer, and you will probably find that the estimates for the fixed effects you are interested in are the same as in your mixed model.
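A minimal sketch of checking for case 1, assuming lme4 is installed: isSingular() and VarCorr() show whether the Hive variance is estimated at (or near) zero, and the fixed effects can be put side by side with those of an ordinary glm. It runs on simulated data with no between-hive variation at all, built to mimic the question's design (8 hives, two of each species at each site); the hive labels H1 to H8, the variable ranges and the coefficients are hypothetical.

```r
# Minimal sketch on SIMULATED data (hypothetical hive labels, ranges, coefficients).
# Design mimics the question: 2 sites x 2 species x 2 hives = 8 hives.
library(lme4)

set.seed(42)
hive_info <- data.frame(
  Hive    = factor(paste0("H", 1:8)),
  Site    = factor(rep(c("Jammu", "Trivandrum"), each = 4)),
  Species = factor(rep(rep(c("Apis cerana", "Apis mellifera"), each = 2), 2))
)
n   <- 2182
dat <- hive_info[sample(8, n, replace = TRUE), ]
rownames(dat) <- NULL
dat$Temp <- runif(n, 25, 35)      # temperature (assumed range)
dat$Time <- runif(n, 0.55, 0.75)  # time of day as a fraction of 24 h
# Linear predictor with NO hive-level term: the true between-hive variance is 0
eta <- -8 + 6 * dat$Time + 0.1 * dat$Temp
dat$Mass_flight <- rbinom(n, 1, plogis(eta))

m_glmm <- glmer(Mass_flight ~ Time + Temp * Site + Species + (1 | Hive),
                data = dat, family = binomial)
m_glm  <- glm(Mass_flight ~ Time + Temp * Site + Species,
              data = dat, family = binomial)

print(isSingular(m_glmm))  # TRUE if the fit is on the boundary (variance ~ 0)
print(VarCorr(m_glmm))     # estimated Hive intercept variance / SD
# Fixed effects side by side: they should nearly coincide when the
# Hive variance is estimated at (or very close to) zero
print(round(cbind(glmm = fixef(m_glmm), glm = coef(m_glm)), 4))
```

On the real data you would use m4 and the question's data frame in place of the simulated objects.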

In case 2, you can either follow the same advice and fit a glm with fixed effects for Hive, or try to diagnose why you are getting a singular fit.

Since we don't know a priori whether we are in case 1 or case 2, I would suggest diagnosing the problem with the following procedure:

  1. In lme4, change the optimizer and the optimization options in glmerControl (the glmer counterpart of lmerControl). If this does not solve the problem, then
  2. Use other packages, in particular GLMMadaptive and rstanarm. If this does not solve the problem, then
  3. Since the clusters are large, split the data into smaller random samples. I would suggest starting with 5 random samples, each of size $\frac{1}{5}$ of the original sample. If you still obtain a singular fit, or the random intercept variance is low, then you can conclude that there really is no correlation within Hive and just fit a glm both with and without fixed effects for Hive. If the inference is the same in both models, choose the model without Hive or do a likelihood ratio test. If you find that you are getting meaningful estimates for the random intercept variance with the sub-sampled datasets, then you may have found a very interesting dataset, and your best way forward is to fit a glm with fixed effects for Hive.
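Steps 1 and 3 can be sketched roughly as follows, again assuming lme4 and using simulated data with a hypothetical design that mirrors the question's. The optimizer names and the maxfun budget are one reasonable choice among several. Step 3 is implemented here as a random partition into 5 disjoint subsamples, which is one reading of "5 random samples, each of size 1/5". Step 2 (GLMMadaptive, rstanarm) is not shown.

```r
# Sketch of diagnostic steps 1 and 3 on SIMULATED data (hypothetical design,
# same structure as the question). Step 2 (other packages) is not shown.
library(lme4)

set.seed(7)
hive_info <- data.frame(
  Hive    = factor(paste0("H", 1:8)),
  Site    = factor(rep(c("Jammu", "Trivandrum"), each = 4)),
  Species = factor(rep(rep(c("Apis cerana", "Apis mellifera"), each = 2), 2))
)
n   <- 2182
dat <- hive_info[sample(8, n, replace = TRUE), ]
rownames(dat) <- NULL
dat$Temp <- runif(n, 25, 35)
dat$Time <- runif(n, 0.55, 0.75)
dat$Mass_flight <- rbinom(n, 1, plogis(-8 + 6 * dat$Time + 0.1 * dat$Temp))

form <- Mass_flight ~ Time + Temp * Site + Species + (1 | Hive)

# Step 1: refit with different optimizers and a larger evaluation budget
fit_bobyqa <- glmer(form, data = dat, family = binomial,
                    control = glmerControl(optimizer = "bobyqa",
                                           optCtrl = list(maxfun = 2e5)))
fit_nm     <- glmer(form, data = dat, family = binomial,
                    control = glmerControl(optimizer = "Nelder_Mead",
                                           optCtrl = list(maxfun = 2e5)))
print(c(bobyqa = isSingular(fit_bobyqa), Nelder_Mead = isSingular(fit_nm)))

# Step 3: randomly partition the rows into 5 disjoint subsamples (~1/5 each)
fold     <- sample(rep(1:5, length.out = nrow(dat)))
sub_fits <- lapply(split(dat, fold),
                   function(d) glmer(form, data = d, family = binomial))
diag_tab <- data.frame(
  singular = sapply(sub_fits, isSingular),
  hive_var = sapply(sub_fits, function(f) as.numeric(VarCorr(f)$Hive))
)
print(diag_tab)
```

If every subsample is singular or shows a negligible hive_var, that points towards case 1.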
Robert Long
  • Even after changing the optimization control, I still get a singular fit. The previous model, glmer(Mass_flight ~ Time_n + Temp*Site + Species + (1 | Hive)), the GLMMadaptive model mixed_model(Mass_flight ~ Time_n + Temp*Site + Species, ~ 1 | Hive, data=data, family=binomial, control = list(iter_EM=0)), and, following @AdamO, the glms excluding Hive, glm(Mass_flight ~ Time_n + Temp*Site + Species, data=data, family = binomial), and including Hive all give almost equal estimates for the fixed effects. Is it better to go with the glm excluding Hive in that case? It has a lower AIC than the glm with Hive. Thanks – Awanti Feb 20 '20 at 05:25
  • @Awanti Yes, it would seem that there is no variation between hives, so a glm without the fixed effect of Hive is more parsimonious. If this is going in a paper, then you should outline this procedure. Also, did you try splitting the dataset? – Robert Long Feb 20 '20 at 08:02
  • Yes, sorry I didn't mention it earlier. As you suggested, I drew 5 random samples from the dataset, each 1/5 of its size, and got a singular fit for all of them. – Awanti Feb 20 '20 at 08:54

With 2,182 observations but only 8 levels of hive, hive will not behave well as a random intercept, so you should not use a mixed model. Instead, adjust for hive as a fixed effect in a GLM. This theoretically achieves the same thing and is far more stable.
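A sketch of the fixed-effects approach with base R's glm(), on simulated data with a hypothetical design mirroring the question's (hive labels, ranges and coefficients are made up). Because each hive belongs to exactly one site and one species, Site and Species are linear combinations of the Hive indicators, so glm() reports some Hive coefficients as NA (aliased). The likelihood ratio test and AIC comparison correspond to the with/without-Hive comparison discussed under the other answer.

```r
# Sketch on SIMULATED data (hypothetical hive labels, ranges, coefficients).
set.seed(11)
hive_info <- data.frame(
  Hive    = factor(paste0("H", 1:8)),
  Site    = factor(rep(c("Jammu", "Trivandrum"), each = 4)),
  Species = factor(rep(rep(c("Apis cerana", "Apis mellifera"), each = 2), 2))
)
n   <- 2182
dat <- hive_info[sample(8, n, replace = TRUE), ]
rownames(dat) <- NULL
dat$Temp <- runif(n, 25, 35)
dat$Time <- runif(n, 0.55, 0.75)
dat$Mass_flight <- rbinom(n, 1, plogis(-8 + 6 * dat$Time + 0.1 * dat$Temp))

m_nohive <- glm(Mass_flight ~ Time + Temp * Site + Species,
                data = dat, family = binomial)
m_hive   <- glm(Mass_flight ~ Time + Temp * Site + Species + Hive,
                data = dat, family = binomial)

# Site and Species are constant within each hive, so some Hive dummies are
# aliased with them; glm() returns NA for those coefficients
print(names(coef(m_hive))[is.na(coef(m_hive))])

# Likelihood ratio test for adding Hive, plus an AIC comparison
lrt <- anova(m_nohive, m_hive, test = "Chisq")
print(lrt)
print(AIC(m_nohive, m_hive))
```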

AdamO