
I'm relatively new to GLMMs and have so far only handled relative data. Now I'm trying to model whether the abundance of a taxon is affected by disease state (condition), while accounting for random effects such as repeated sampling (ID), day and cage effects. The abundance data (taxon) are absolute counts of the bacterium, derived by qPCR and comparison to a standard curve.

df

    ID cage day condition      taxon
1 100 4 10 healthy 291137072
2 100 4 23 healthy 49370511
3 100 4 27 disease 0
4 100 4 4 healthy 0
5 101 5 10 healthy 255823819
6 101 5 25 healthy 0
7 101 5 26 disease 0
8 101 5 4 healthy 491783963
9 102 6 10 healthy 827020812
10 102 6 29 healthy 1942357597
11 102 6 31 healthy 353989613
12 102 6 4 healthy 265950135
13 103 1 10 healthy 683672045
14 103 1 20 healthy 248822130
15 103 1 20 healthy 330402708
16 103 1 24 disease 72356773
17 103 1 24 disease 0
18 103 1 25 disease 0
19 103 1 4 healthy 758199011
20 104 2 10 healthy 391127953
21 104 2 14 healthy 121840666
22 104 2 24 healthy 428233074
23 104 2 26 healthy 0
24 104 2 26 disease 0
25 104 2 4 healthy 856582541
26 56 1 10 healthy 328956034
27 56 1 20 healthy 243055521
28 56 1 25 healthy 274206537
29 56 1 31 healthy 327404926
30 56 1 4 healthy 0
31 57 2 10 healthy 250707674
32 57 2 14 healthy 105076869
33 57 2 24 healthy 0
34 57 2 26 healthy 25117434
35 57 2 29 healthy 307745763
36 57 2 30 healthy 35323060
37 57 2 30 disease 26061862
38 57 2 31 healthy 236960027
39 57 2 4 healthy 0
40 57 2 7 healthy 548242526
41 58 3 23 healthy 132782429
42 58 3 27 healthy 53354564
43 58 3 28 healthy 109248499
44 58 3 28 disease 172993809
45 58 3 28 disease 71076639
46 58 3 29 healthy 136523472
47 58 3 31 healthy 107758937
48 58 3 4 healthy 418700327
49 59 4 10 healthy 240420771
50 59 4 23 healthy 177600588
51 59 4 27 healthy 0
52 59 4 31 healthy 71662711
53 59 4 4 healthy 0
54 81 1 10 healthy 235286007
55 81 1 19 healthy 0
56 81 1 20 disease 0
57 81 1 4 healthy 0
58 82 2 14 healthy 162675954
59 82 2 14 disease 0
60 82 2 4 healthy 434068852
61 83 3 10 healthy 0
62 83 3 28 healthy 115583049
63 83 3 31 healthy 0
64 83 3 4 healthy 0
65 84 4 10 healthy 0
66 84 4 17 healthy 0
67 84 4 21 healthy 120404542
68 84 4 4 healthy 0
69 85 5 10 healthy 121380422
70 85 5 26 healthy 398575728
71 85 5 31 healthy 49424593
72 85 5 4 healthy 0
73 86 6 10 healthy 589647622
74 86 6 29 disease 88311287
75 86 6 29 disease 131933744
76 86 6 4 healthy 0
77 87 1 10 healthy 46217816
78 87 1 25 healthy 0
79 87 1 31 healthy 319339186
80 87 1 4 healthy 0
81 90 3 10 healthy 277742799
82 90 3 28 disease 164413227
83 90 3 31 healthy 283547803
84 90 3 4 healthy 1331380313
85 91 4 10 healthy 0
86 91 4 23 healthy 0
87 91 4 27 healthy 476684613
88 91 4 31 healthy 384647198
89 91 4 4 healthy 533681749
90 92 5 10 healthy 1495897788
91 92 5 31 healthy 806317700
92 92 5 4 healthy 1700020953
93 93 6 10 healthy 632004192
94 93 6 29 healthy 708512232
95 93 6 31 healthy 2031512652
96 93 6 4 healthy 596088438
97 98 2 10 healthy 328751027
98 98 2 23 healthy 0
99 98 2 24 healthy 0
100 98 2 25 disease 0
101 98 2 26 disease 0
102 98 2 26 healthy 0
103 98 2 27 disease 0
104 98 2 27 disease 0
105 98 2 27 disease 0
106 98 2 29 disease 78682114
107 98 2 29 disease 0
108 98 2 30 disease 0
109 98 2 30 disease 69581995
110 98 2 4 healthy 1619240477
111 99 3 10 healthy 188962800
112 99 3 28 healthy 481068698
113 99 3 31 healthy 572791136
114 99 3 4 healthy 999854899

min(df$taxon) # 0
max(df$taxon) # 2031512652

The range is very large, from 0 counts up to about 2 billion. When I run a GLMM with a Poisson or negative binomial family, as suggested for count data, I get several warnings:

fit_nb <- lme4::glmer.nb(data = df,
                         taxon ~ condition +
                           (1 | day) + (1 | cage) + (1 | ID))

fit_poisson <- lme4::glmer(data = df_for_taxon,
                           taxon ~ condition + (1 | day) + (1 | cage) + (1 | ID),
                           family = "poisson")

summary(fit_nb)

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: Negative Binomial(871275696)  ( log )
Formula: taxon ~ condition + (1 | day) + (1 | cage) + (1 | ID)
   Data: tmp

        AIC         BIC      logLik    deviance  df.resid
13191597893 13191597909 -6595798940 13191597881       108

Scaled residuals:
    Min      1Q  Median      3Q     Max
 -16507   -7242   -3110    5548   45807

Random effects:
 Groups Name        Variance Std.Dev.
 ID     (Intercept) 19.0464  4.3642
 day    (Intercept) 51.7582  7.1943
 cage   (Intercept)  0.4132  0.6428
Number of obs: 114, groups:  ID, 22; day, 17; cage, 6

Fixed effects:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)      1.562e+01  1.309e-02    1193   <2e-16 ***
conditionhealthy 2.059e+00  4.295e-05   47935   <2e-16 ***


Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr)
condtnhlthy 0.000

convergence code: 0
unable to evaluate scaled gradient
Model failed to converge: degenerate Hessian with 1 negative eigenvalues

Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
  Model failed to converge with max|grad| = 0.121249 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
3: In theta.ml(Y, mu, weights = object@resp$weights, limit = limit, :
  iteration limit reached
4: In sqrt(1/i) : NaNs produced
5: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
  Model failed to converge with max|grad| = 0.279021 (tol = 0.002, component 1)
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
7: In optTheta(g1, interval = interval, tol = tol, verbose = verbose, :
  Model failed to converge with max|grad| = 0.0490774 (tol = 0.002, component 1)
8: In optTheta(g1, interval = interval, tol = tol, verbose = verbose, :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

I think this is due to the huge magnitude of my response variable. But if I center and scale the abundance, as is often recommended, I get negative values, which are not allowed in GLMMs with a Poisson or negative binomial family.

df$taxon_scaled <- scale(df$taxon)  # scale() returns a one-column matrix of z-scores

df

    ID cage day condition      taxon taxon_scaled
1 100 4 10 healthy 291137072 0.039439167
2 100 4 23 healthy 49370511 -0.549483158
3 100 4 27 disease 0 -0.669745432
4 100 4 4 healthy 0 -0.669745432
5 101 5 10 healthy 255823819 -0.046580847
6 101 5 25 healthy 0 -0.669745432
7 101 5 26 disease 0 -0.669745432
8 101 5 4 healthy 491783963 0.528197553
9 102 6 10 healthy 827020812 1.344805374
10 102 6 29 healthy 1942357597 4.061668828
11 102 6 31 healthy 353989613 0.192542494
12 102 6 4 healthy 265950135 -0.021914021
13 103 1 10 healthy 683672045 0.995620238
14 103 1 20 healthy 248822130 -0.063636353
15 103 1 20 healthy 330402708 0.135086843
16 103 1 24 disease 72356773 -0.493490624
17 103 1 24 disease 0 -0.669745432
18 103 1 25 disease 0 -0.669745432
19 103 1 4 healthy 758199011 1.177161449
20 104 2 10 healthy 391127953 0.283008261
21 104 2 14 healthy 121840666 -0.372952160
22 104 2 24 healthy 428233074 0.373393111
23 104 2 26 healthy 0 -0.669745432
24 104 2 26 disease 0 -0.669745432
25 104 2 4 healthy 856582541 1.416815176
26 56 1 10 healthy 328956034 0.131562872
27 56 1 20 healthy 243055521 -0.077683310
28 56 1 25 healthy 274206537 -0.001802143
29 56 1 31 healthy 327404926 0.127784507
30 56 1 4 healthy 0 -0.669745432
31 57 2 10 healthy 250707674 -0.059043331
32 57 2 14 healthy 105076869 -0.413787312
33 57 2 24 healthy 0 -0.669745432
34 57 2 26 healthy 25117434 -0.608561546
35 57 2 29 healthy 307745763 0.079896496
36 57 2 30 healthy 35323060 -0.583701528
37 57 2 30 disease 26061862 -0.606261001
38 57 2 31 healthy 236960027 -0.092531405
39 57 2 4 healthy 0 -0.669745432
40 57 2 7 healthy 548242526 0.665725703
41 58 3 23 healthy 132782429 -0.346298976
42 58 3 27 healthy 53354564 -0.539778352
43 58 3 28 healthy 109248499 -0.403625584
44 58 3 28 disease 172993809 -0.248347550
45 58 3 28 disease 71076639 -0.496608917
46 58 3 29 healthy 136523472 -0.337186121
47 58 3 31 healthy 107758937 -0.407254027
48 58 3 4 healthy 418700327 0.350172169
49 59 4 10 healthy 240420771 -0.084101333
50 59 4 23 healthy 177600588 -0.237125838
51 59 4 27 healthy 0 -0.669745432
52 59 4 31 healthy 71662711 -0.495181298
53 59 4 4 healthy 0 -0.669745432
54 81 1 10 healthy 235286007 -0.096609171
55 81 1 19 healthy 0 -0.669745432
56 81 1 20 disease 0 -0.669745432
57 81 1 4 healthy 0 -0.669745432
58 82 2 14 healthy 162675954 -0.273480948
59 82 2 14 disease 0 -0.669745432
60 82 2 4 healthy 434068852 0.387608558
61 83 3 10 healthy 0 -0.669745432
62 83 3 28 healthy 115583049 -0.388195171
63 83 3 31 healthy 0 -0.669745432
64 83 3 4 healthy 0 -0.669745432
65 84 4 10 healthy 0 -0.669745432
66 84 4 17 healthy 0 -0.669745432
67 84 4 21 healthy 120404542 -0.376450435
68 84 4 4 healthy 0 -0.669745432
69 85 5 10 healthy 121380422 -0.374073274
70 85 5 26 healthy 398575728 0.301150394
71 85 5 31 healthy 49424593 -0.549351421
72 85 5 4 healthy 0 -0.669745432
73 86 6 10 healthy 589647622 0.766584917
74 86 6 29 disease 88311287 -0.454626812
75 86 6 29 disease 131933744 -0.348366299
76 86 6 4 healthy 0 -0.669745432
77 87 1 10 healthy 46217816 -0.557162849
78 87 1 25 healthy 0 -0.669745432
79 87 1 31 healthy 319339186 0.108137066
80 87 1 4 healthy 0 -0.669745432
81 90 3 10 healthy 277742799 0.006811884
82 90 3 28 disease 164413227 -0.269249102
83 90 3 31 healthy 283547803 0.020952368
84 90 3 4 healthy 1331380313 2.573381277
85 91 4 10 healthy 0 -0.669745432
86 91 4 23 healthy 0 -0.669745432
87 91 4 27 healthy 476684613 0.491416848
88 91 4 31 healthy 384647198 0.267221706
89 91 4 4 healthy 533681749 0.630256918
90 92 5 10 healthy 1495897788 2.974131544
91 92 5 31 healthy 806317700 1.294374394
92 92 5 4 healthy 1700020953 3.471357829
93 93 6 10 healthy 632004192 0.869761840
94 93 6 29 healthy 708512232 1.056128776
95 93 6 31 healthy 2031512652 4.278842791
96 93 6 4 healthy 596088438 0.782274187
97 98 2 10 healthy 328751027 0.131063493
98 98 2 23 healthy 0 -0.669745432
99 98 2 24 healthy 0 -0.669745432
100 98 2 25 disease 0 -0.669745432
101 98 2 26 disease 0 -0.669745432
102 98 2 26 healthy 0 -0.669745432
103 98 2 27 disease 0 -0.669745432
104 98 2 27 disease 0 -0.669745432
105 98 2 27 disease 0 -0.669745432
106 98 2 29 disease 78682114 -0.478082642
107 98 2 29 disease 0 -0.669745432
108 98 2 30 disease 0 -0.669745432
109 98 2 30 disease 69581995 -0.500249740
110 98 2 4 healthy 1619240477 3.274583613
111 99 3 10 healthy 188962800 -0.209448478
112 99 3 28 healthy 481068698 0.502096099
113 99 3 31 healthy 572791136 0.725523984
114 99 3 4 healthy 999854899 1.765814186
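To illustrate the problem (a minimal check with a deliberately simplified random-effects structure): feeding the centered/scaled response into a Poisson GLMM fails outright, because the family requires non-negative responses.

try(lme4::glmer(taxon_scaled ~ condition + (1 | ID),
                data = df, family = "poisson"))
# stops with an error along the lines of:
# "negative values not allowed for the 'Poisson' family"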

Question: What is the best way to improve a model when the count data span such a large range of magnitudes?

I would appreciate any help, as I don't think I can rely on the models so far.

Thanks in advance!

AMF
  • If you centre and scale the response, then it is no longer a count, so Poisson or negbin are not appropriate. You could try a regular LMM (i.e. using lmer, not glmer), though with only 6 cages, I would probably fit fixed effects for that (see the sketch after these comments). – Robert Long Jul 05 '21 at 16:44
  • Thank you Robert for your answer! I've adjusted the model following your comments, and that solved all warnings. But just to clear things up: when I have counts of such a high order of magnitude, I need to center and scale them, right? I've also read threads where this was generally not recommended for some data sets, as it would make interpretation more difficult. But I do not see any other way than rescaling to analyse my data in a mixed-effects model. – AMF Jul 07 '21 at 10:54
  • You're welcome. Every situation is different. I wasn't recommending that you rescale. I was just saying that if you DID rescale, then Poisson or negbin would not be appropriate. Since the numbers are high, you could appeal to the normal approximation to the Poisson distribution and fit a model with lmer on the unscaled data. This may be of interest: https://stats.stackexchange.com/questions/270/poisson-regression-with-large-data-is-it-wrong-to-change-the-unit-of-measuremen – Robert Long Jul 07 '21 at 12:09
  • Thanks Robert! I had a look at the thread you recommended. I wanted to determine lambda for my data to figure out whether the normal approximation can be applied. I found that "λ is the total number of events (k) divided by the number of units (n) in the data (λ = k/n)". As I'm determining the abundance of one taxon in a population, isn't the number of events how often the taxon is found within the population, rather than the abundance of the taxon within one sample? That would be 75/114 = 0.65 (75 samples with counts of this taxon, out of 114 samples in total). Or am I mistaken here? – AMF Jul 08 '21 at 11:49
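For reference, a minimal sketch of the model the comments above point to (my reading of Robert Long's suggestion, not a tested solution): an ordinary LMM via lmer() on the unscaled counts, appealing to the normal approximation, with cage entered as a fixed effect since it has only 6 levels.

library(lme4)

# LMM on the raw (unscaled) counts, relying on the normal approximation to the
# Poisson for large counts; cage is a fixed factor (only 6 levels), while day
# and ID remain random intercepts.
fit_lmm <- lmer(taxon ~ condition + factor(cage) + (1 | day) + (1 | ID),
                data = df)
summary(fit_lmm)

Checking the residuals afterwards, e.g. with plot(fit_lmm), would show whether the normal approximation is defensible for these data.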

0 Answers