
I am running a Poisson model using glmer to look at the effect of a treatment on the fat scores (a scale from 1-5, hence the Poisson) of an animal. Fat scores were taken at multiple timepoints, so, since there are repeated measures, I added a random effect of Animal ID to the model.

The model looks something like this:

glmer(Fat ~ Treatment + Sex + Timepoint + Sex:Treatment + Timepoint:Treatment +
        (1 | AnimalID),
      family = poisson, data = parentdata)

However, every single time I try to run the model, it gives me a singular fit warning: essentially, it says the random effect's variance is 0. Since I have repeated measures, though, I do have to keep the random effect in the model, correct? I tried multiple ways to fix the singularity (taking out individuals whose fat score is exactly the same throughout the experiment, removing scores that appear to be from one timepoint, etc.), but nothing fixes it. The model still runs, but it says the variance is 0, which is puzzling to me. Will anything fix this, or should I just keep the model as is, with the singular random effect?

Edit for clarity: I have three different timepoints: one before treatment and two after treatment. There are 2 treatment groups, with samples from animals in each treatment group taken at all 3 timepoints (all individuals were sampled 3 times). I have the Timepoint:Treatment interaction in there because I hypothesize that the effect of treatment will be different at each timepoint (i.e., there should not be an effect at the first timepoint, but there could be if I happened to have put leaner animals in one treatment vs. the other). This is the output for this model:

> summary(resultsfat)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: as.integer(FatScore) ~ Timepoint + Sex + Treatment + Sex:Treatment +
    Timepoint:Treatment + (1 | AnimalID)
   Data: parentdata

     AIC      BIC   logLik deviance df.resid 
   421.7    447.6   -201.9    403.7      122 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.3137 -0.6158 -0.2114  0.4213  2.0945 

Random effects:
 Groups   Name        Variance  Std.Dev. 
 AnimalID (Intercept) 1.738e-16 1.318e-08
Number of obs: 131, groups:  AnimalID, 41

Fixed effects:
                          Estimate Std. Error z value Pr(>|z|)   
(Intercept)                 0.5495     0.1831   3.002  0.00268 **
TimepointBtrt1              0.1642     0.2156   0.761  0.44639   
TimepointCtrt2              0.5557     0.1993   2.787  0.00531 **
SexM                        0.1287     0.1596   0.807  0.41990   
TreatmentS                  0.1545     0.2495   0.619  0.53584   
SexM:TreatmentS            -0.1987     0.2370  -0.838  0.40183   
TimepointBtrt1:TreatmentS  -0.3985     0.3133  -1.272  0.20334   
TimepointCtrt2:TreatmentS  -0.3202     0.2849  -1.124  0.26106   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) TmpnB1 TmpnC2 SexM   TrtmnS SxM:TS TB1:TS
TimpntBtrt1 -0.658                                          
TimpntCtrt2 -0.712  0.597                                   
SexM        -0.485  0.017  0.019                            
TreatmentS  -0.734  0.483  0.522  0.356                     
SxM:TrtmntS  0.326 -0.012 -0.013 -0.673 -0.479              
TmpntBt1:TS  0.453 -0.688 -0.411 -0.012 -0.613 -0.001       
TmpntCt2:TS  0.498 -0.418 -0.700 -0.013 -0.678  0.010  0.536
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

  • I'll post this as a comment because I hope someone gives you a more elaborate answer, but in my understanding, no, you don't have to keep the multilevel model if your only random intercept is zero; you can move to a single-level model (a regular Poisson regression). Another option would be to fit the model with the glmmTMB function, which is capable of estimating very small random effects. – Sointu Oct 11 '23 at 06:40
  • EdM's answer here may also be helpful. – Sointu Oct 11 '23 at 07:06
  • Have you considered a Bayesian model with a prior on what one believes about the size of the between-animal SD (possibly a bit more uncertain than that, aka "a weakly informative prior")? If you did that, you would not have to worry about the particular set of data being such that the maximum likelihood estimate of that SD is 0. Also, this is not really an ideal setting for a Poisson model. A scale from 1-5 misfits that distribution in several ways: 1) the scale is bounded (Poisson is on $[0, \infty)$) and 2) 0 is not included. If this is just 1, 2, 3, 4, 5, then why not ordinal logistic regression? – Björn Oct 11 '23 at 07:45
  • How many degrees of freedom does your model have? If it is low, then the random effects become difficult to compute. Is the Poisson distribution a good model? If it is bad, then you get large error terms and the random effects are negligible. Also, the Poisson model assumes a certain variance that may not be realistic (as in this question: https://stats.stackexchange.com/questions/628067/). – Sextus Empiricus Oct 11 '23 at 09:26
  • Apart from the issue already highlighted (the response is ordinal, so Poisson regression is not appropriate), your data are longitudinal: "There are multiple timepoints in which fat scores were taken." The choice to model the within-animal dependence as (1|AnimalID) is probably also sub-optimal. This has been explained quite a few times here by Prof. Harrell; might as well start with this writeup by him: Longitudinal Data: Think Serial Correlation First, Random Effects Second. – dipetkov Oct 11 '23 at 13:48
  • @dipetkov random effects also account for serial correlation, especially if you use random slopes. – Dimitris Rizopoulos Oct 11 '23 at 15:18
  • @Dimitris Rizopoulos Well, for one, the OP is not using random slopes. And two, depending on how the time points are sampled and what the relationship with time actually is, a random slope might be a very strong assumption. I think we simply don't know enough based on the information the OP has decided to share. – dipetkov Oct 11 '23 at 15:24
  • Thank you all for your input! As a biologist, I do not understand the detailed math behind the models, but it does seem that the Poisson distribution may not be the best way to run this model. I have never used Bayesian models, but from my understanding, they are prediction models and not necessarily for analyzing differences between treatment groups (but again, as someone who is not a math person, I could be wrong). – bluebird8 Oct 12 '23 at 02:58
  • To clarify on the timepoints, they were taken based on breeding stage (i.e., not at a fixed time for every animal). The first timepoint is prior to treatment, and the rest are after treatment (thus I must keep those in the model for consideration). I do expect the effect of treatment to differ across timepoints. – bluebird8 Oct 12 '23 at 03:00
  • @bluebird8 How is the timepoint used in the model, as a factor or as a continuous variable? Could you add the output to the question? ––– The problem with using time as a factor is that, if every animal has different time points, this overfits the data and leaves the random effect with little influence. – Sextus Empiricus Oct 12 '23 at 05:12
  • You have baseline fat scores as well. Since this score is taken pre-treatment, it cannot be affected by treatment differences (if any); there may be advantages to using the baseline as a covariate. At this point it seems to me that you've omitted lots of relevant information from your question. You'd have a better chance of getting a detailed, to-the-point answer if you make edits. (Or even just ask a new question.) – dipetkov Oct 12 '23 at 10:55
  • I apologize for the confusion; I have updated the question for clarity and included the output. Timepoint is a factor, and the Timepoint:Treatment interaction allows the effect of treatment to differ across timepoints. I do need to check that I did not happen to pick leaner animals for one treatment or the other, so I believe I need to keep it that way to test my hypotheses. Thank you all again! – bluebird8 Oct 12 '23 at 14:03

2 Answers


By default, glmer() uses the Laplace approximation to calculate the model's likelihood, which in some cases can result in numerical issues like this. It would be best to switch to the adaptive Gaussian quadrature, with which you can tune the accuracy of the approximation. For more information, check this post.

You can invoke the adaptive Gaussian quadrature in glmer() via the nAGQ argument, setting it to, e.g., 11 or 15. Another package with this capability is GLMMadaptive.
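As a minimal sketch, reusing the formula and data name from the output posted in the question (11 quadrature points is an arbitrary starting value):

library(lme4)

# Refit with adaptive Gauss-Hermite quadrature instead of the default
# Laplace approximation (nAGQ = 1); nAGQ > 1 is available here because
# the model has a single scalar random effect, (1 | AnimalID)
fit_agq <- glmer(
  as.integer(FatScore) ~ Timepoint + Sex + Treatment + Sex:Treatment +
    Timepoint:Treatment + (1 | AnimalID),
  family = poisson, data = parentdata,
  nAGQ = 11
)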

Dimitris Rizopoulos
  • Thank you for your insight! I am not entirely sure that I understand tuning the approximation, but I will look more into it. Thanks again! – bluebird8 Oct 12 '23 at 03:03

One cause for the singular fit may be the decision to model an ordinal variable with Poisson regression.

As @Björn and @SextusEmpiricus point out in the comments, the outcome — fat scores on a scale from 1 to 5 — is an ordinal variable: the scores are ordered categories, from least fat to most fat. It's appropriate to model these scores with ordinal logistic regression. (We use Poisson regression to model counts — number of occurrences — of an event in a fixed time interval.)

Moreover, there are two post-treatment measurements (plus a baseline measurement taken before the treatment is applied) from each animal, so the observations are clustered. This suggests using a mixed-effects model for ordinal outcomes.

Here are two options to fit such a model.

  • Continuation ratio model with the GLMMadaptive package by @DimitrisRizopoulos. The ordinal outcome first has to be expanded into binary "continuation" indicators, which the package's cr_setup() helper does; the code to specify the model would be something like this:

library(GLMMadaptive)
# Expand the ordinal score into binary continuation indicators
cr <- cr_setup(your_data$Fat)
cr_data <- your_data[cr$subs, ]
cr_data$y_cont <- cr$y
cr_data$cohort <- cr$cohort
fit <- mixed_model(
  y_cont ~ cohort + Sex * Treatment + Timepoint * Treatment,
  random = ~ 1 | AnimalID,
  family = binomial(),
  data = cr_data
)

See also this tutorial: Mixed Models for Ordinal Data.

  • Bayesian proportional odds model with the rmsb package by @FrankHarrell.
library(rmsb)

# cluster(AnimalID) adds per-animal random effects to the model
fit <- blrm(
  Fat ~ Sex * Treatment + Timepoint * Treatment + cluster(AnimalID),
  data = your_data
)
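Note that because the Bayesian model places a prior on the between-animal SD, the posterior does not collapse to an estimate of exactly zero the way the maximum likelihood fit does; this is essentially @Björn's suggestion in the comments.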

See also this section of the Regression Modeling Strategies course: Bayesian Proportional Odds Random Effects Model.

Finally, I suggest treating the baseline score as a covariate (on the right side of the model formula) rather than as an outcome (on the left side). That is, write the model as:

Fat ~ factor(BaselineFat) + Sex * Treatment + Timepoint * Treatment

after appropriately re-arranging the data. The motivation is that the baseline is (likely to be) predictive of the outcome after treatment, yet is not itself affected by the treatment in a randomized experiment.
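A minimal sketch of that re-arrangement, assuming long-format data with one row per animal and timepoint, and with "A" as a hypothetical label for the pre-treatment level of Timepoint:

# Split off the pre-treatment scores and merge them back as a covariate
baseline <- subset(your_data, Timepoint == "A")[, c("AnimalID", "Fat")]
names(baseline)[2] <- "BaselineFat"
post_data <- merge(subset(your_data, Timepoint != "A"), baseline,
                   by = "AnimalID")
# post_data now has one row per animal and post-treatment timepoint,
# ready for: Fat ~ factor(BaselineFat) + Sex * Treatment + Timepoint * Treatment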

See also this thread: Baseline adjustment in mixed models.

dipetkov