
I'm relatively new to GLMMs and have so far only handled relative data. Now I'm trying to model whether the abundance of a taxon is affected by disease state (condition), while accounting for random effects such as repeated sampling (ID), day and cage effects. The abundance data (taxon) are absolute counts of the bacterium, derived by qPCR and comparison to a standard curve.

df

    ID cage day condition      taxon
1 100 4 10 healthy 291137072
2 100 4 23 healthy 49370511
3 100 4 27 disease 0
4 100 4 4 healthy 0
5 101 5 10 healthy 255823819
6 101 5 25 healthy 0
7 101 5 26 disease 0
8 101 5 4 healthy 491783963
9 102 6 10 healthy 827020812
10 102 6 29 healthy 1942357597
11 102 6 31 healthy 353989613
12 102 6 4 healthy 265950135
13 103 1 10 healthy 683672045
14 103 1 20 healthy 248822130
15 103 1 20 healthy 330402708
16 103 1 24 disease 72356773
17 103 1 24 disease 0
18 103 1 25 disease 0
19 103 1 4 healthy 758199011
20 104 2 10 healthy 391127953
21 104 2 14 healthy 121840666
22 104 2 24 healthy 428233074
23 104 2 26 healthy 0
24 104 2 26 disease 0
25 104 2 4 healthy 856582541
26 56 1 10 healthy 328956034
27 56 1 20 healthy 243055521
28 56 1 25 healthy 274206537
29 56 1 31 healthy 327404926
30 56 1 4 healthy 0
31 57 2 10 healthy 250707674
32 57 2 14 healthy 105076869
33 57 2 24 healthy 0
34 57 2 26 healthy 25117434
35 57 2 29 healthy 307745763
36 57 2 30 healthy 35323060
37 57 2 30 disease 26061862
38 57 2 31 healthy 236960027
39 57 2 4 healthy 0
40 57 2 7 healthy 548242526
41 58 3 23 healthy 132782429
42 58 3 27 healthy 53354564
43 58 3 28 healthy 109248499
44 58 3 28 disease 172993809
45 58 3 28 disease 71076639
46 58 3 29 healthy 136523472
47 58 3 31 healthy 107758937
48 58 3 4 healthy 418700327
49 59 4 10 healthy 240420771
50 59 4 23 healthy 177600588
51 59 4 27 healthy 0
52 59 4 31 healthy 71662711
53 59 4 4 healthy 0
54 81 1 10 healthy 235286007
55 81 1 19 healthy 0
56 81 1 20 disease 0
57 81 1 4 healthy 0
58 82 2 14 healthy 162675954
59 82 2 14 disease 0
60 82 2 4 healthy 434068852
61 83 3 10 healthy 0
62 83 3 28 healthy 115583049
63 83 3 31 healthy 0
64 83 3 4 healthy 0
65 84 4 10 healthy 0
66 84 4 17 healthy 0
67 84 4 21 healthy 120404542
68 84 4 4 healthy 0
69 85 5 10 healthy 121380422
70 85 5 26 healthy 398575728
71 85 5 31 healthy 49424593
72 85 5 4 healthy 0
73 86 6 10 healthy 589647622
74 86 6 29 disease 88311287
75 86 6 29 disease 131933744
76 86 6 4 healthy 0
77 87 1 10 healthy 46217816
78 87 1 25 healthy 0
79 87 1 31 healthy 319339186
80 87 1 4 healthy 0
81 90 3 10 healthy 277742799
82 90 3 28 disease 164413227
83 90 3 31 healthy 283547803
84 90 3 4 healthy 1331380313
85 91 4 10 healthy 0
86 91 4 23 healthy 0
87 91 4 27 healthy 476684613
88 91 4 31 healthy 384647198
89 91 4 4 healthy 533681749
90 92 5 10 healthy 1495897788
91 92 5 31 healthy 806317700
92 92 5 4 healthy 1700020953
93 93 6 10 healthy 632004192
94 93 6 29 healthy 708512232
95 93 6 31 healthy 2031512652
96 93 6 4 healthy 596088438
97 98 2 10 healthy 328751027
98 98 2 23 healthy 0
99 98 2 24 healthy 0
100 98 2 25 disease 0
101 98 2 26 disease 0
102 98 2 26 healthy 0
103 98 2 27 disease 0
104 98 2 27 disease 0
105 98 2 27 disease 0
106 98 2 29 disease 78682114
107 98 2 29 disease 0
108 98 2 30 disease 0
109 98 2 30 disease 69581995
110 98 2 4 healthy 1619240477
111 99 3 10 healthy 188962800
112 99 3 28 healthy 481068698
113 99 3 31 healthy 572791136
114 99 3 4 healthy 999854899

min(df$taxon) # 0
max(df$taxon) # 2031512652

The range is very large, from 0 counts up to about 2 billion. When I run a GLMM with a Poisson or negative binomial family, as suggested for count data, I get several warnings:

fit_nb <- lme4::glmer.nb(data = df,
                         taxon ~ condition +
                           (1 | day) + (1 | cage) + (1 | ID))

fit_poisson <- lme4::glmer(data = df_for_taxon,
                           taxon ~ condition + (1 | day) + (1 | cage) + (1 | ID),
                           family = "poisson")

summary(fit_nb)

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: Negative Binomial(871275696)  ( log )
Formula: taxon ~ condition + (1 | day) + (1 | cage) + (1 | ID)
   Data: tmp

        AIC         BIC      logLik    deviance  df.resid
13191597893 13191597909 -6595798940 13191597881       108

Scaled residuals:
    Min      1Q  Median      3Q     Max
 -16507   -7242   -3110    5548   45807

Random effects:
 Groups Name        Variance Std.Dev.
 ID     (Intercept) 19.0464  4.3642
 day    (Intercept) 51.7582  7.1943
 cage   (Intercept)  0.4132  0.6428
Number of obs: 114, groups:  ID, 22; day, 17; cage, 6

Fixed effects:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)      1.562e+01  1.309e-02    1193   <2e-16 ***
conditionhealthy 2.059e+00  4.295e-05   47935   <2e-16 ***


Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr)
condtnhlthy 0.000

convergence code: 0
unable to evaluate scaled gradient
Model failed to converge: degenerate Hessian with 1 negative eigenvalues

Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
  Model failed to converge with max|grad| = 0.121249 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
3: In theta.ml(Y, mu, weights = object@resp$weights, limit = limit, :
  iteration limit reached
4: In sqrt(1/i) : NaNs produced
5: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
  Model failed to converge with max|grad| = 0.279021 (tol = 0.002, component 1)
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
7: In optTheta(g1, interval = interval, tol = tol, verbose = verbose, :
  Model failed to converge with max|grad| = 0.0490774 (tol = 0.002, component 1)
8: In optTheta(g1, interval = interval, tol = tol, verbose = verbose, :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?

I think this is due to the huge magnitude of my response variable. But if I center and scale the abundance, as is often recommended, I get negative values, which are not allowed in GLMMs with a Poisson or negative binomial family.

df$taxon_scaled <- scale(df$taxon)  # scale() returns a one-column matrix of z-scores

df

    ID cage day condition      taxon taxon_scaled
1 100 4 10 healthy 291137072 0.039439167
2 100 4 23 healthy 49370511 -0.549483158
3 100 4 27 disease 0 -0.669745432
4 100 4 4 healthy 0 -0.669745432
5 101 5 10 healthy 255823819 -0.046580847
6 101 5 25 healthy 0 -0.669745432
7 101 5 26 disease 0 -0.669745432
8 101 5 4 healthy 491783963 0.528197553
9 102 6 10 healthy 827020812 1.344805374
10 102 6 29 healthy 1942357597 4.061668828
11 102 6 31 healthy 353989613 0.192542494
12 102 6 4 healthy 265950135 -0.021914021
13 103 1 10 healthy 683672045 0.995620238
14 103 1 20 healthy 248822130 -0.063636353
15 103 1 20 healthy 330402708 0.135086843
16 103 1 24 disease 72356773 -0.493490624
17 103 1 24 disease 0 -0.669745432
18 103 1 25 disease 0 -0.669745432
19 103 1 4 healthy 758199011 1.177161449
20 104 2 10 healthy 391127953 0.283008261
21 104 2 14 healthy 121840666 -0.372952160
22 104 2 24 healthy 428233074 0.373393111
23 104 2 26 healthy 0 -0.669745432
24 104 2 26 disease 0 -0.669745432
25 104 2 4 healthy 856582541 1.416815176
26 56 1 10 healthy 328956034 0.131562872
27 56 1 20 healthy 243055521 -0.077683310
28 56 1 25 healthy 274206537 -0.001802143
29 56 1 31 healthy 327404926 0.127784507
30 56 1 4 healthy 0 -0.669745432
31 57 2 10 healthy 250707674 -0.059043331
32 57 2 14 healthy 105076869 -0.413787312
33 57 2 24 healthy 0 -0.669745432
34 57 2 26 healthy 25117434 -0.608561546
35 57 2 29 healthy 307745763 0.079896496
36 57 2 30 healthy 35323060 -0.583701528
37 57 2 30 disease 26061862 -0.606261001
38 57 2 31 healthy 236960027 -0.092531405
39 57 2 4 healthy 0 -0.669745432
40 57 2 7 healthy 548242526 0.665725703
41 58 3 23 healthy 132782429 -0.346298976
42 58 3 27 healthy 53354564 -0.539778352
43 58 3 28 healthy 109248499 -0.403625584
44 58 3 28 disease 172993809 -0.248347550
45 58 3 28 disease 71076639 -0.496608917
46 58 3 29 healthy 136523472 -0.337186121
47 58 3 31 healthy 107758937 -0.407254027
48 58 3 4 healthy 418700327 0.350172169
49 59 4 10 healthy 240420771 -0.084101333
50 59 4 23 healthy 177600588 -0.237125838
51 59 4 27 healthy 0 -0.669745432
52 59 4 31 healthy 71662711 -0.495181298
53 59 4 4 healthy 0 -0.669745432
54 81 1 10 healthy 235286007 -0.096609171
55 81 1 19 healthy 0 -0.669745432
56 81 1 20 disease 0 -0.669745432
57 81 1 4 healthy 0 -0.669745432
58 82 2 14 healthy 162675954 -0.273480948
59 82 2 14 disease 0 -0.669745432
60 82 2 4 healthy 434068852 0.387608558
61 83 3 10 healthy 0 -0.669745432
62 83 3 28 healthy 115583049 -0.388195171
63 83 3 31 healthy 0 -0.669745432
64 83 3 4 healthy 0 -0.669745432
65 84 4 10 healthy 0 -0.669745432
66 84 4 17 healthy 0 -0.669745432
67 84 4 21 healthy 120404542 -0.376450435
68 84 4 4 healthy 0 -0.669745432
69 85 5 10 healthy 121380422 -0.374073274
70 85 5 26 healthy 398575728 0.301150394
71 85 5 31 healthy 49424593 -0.549351421
72 85 5 4 healthy 0 -0.669745432
73 86 6 10 healthy 589647622 0.766584917
74 86 6 29 disease 88311287 -0.454626812
75 86 6 29 disease 131933744 -0.348366299
76 86 6 4 healthy 0 -0.669745432
77 87 1 10 healthy 46217816 -0.557162849
78 87 1 25 healthy 0 -0.669745432
79 87 1 31 healthy 319339186 0.108137066
80 87 1 4 healthy 0 -0.669745432
81 90 3 10 healthy 277742799 0.006811884
82 90 3 28 disease 164413227 -0.269249102
83 90 3 31 healthy 283547803 0.020952368
84 90 3 4 healthy 1331380313 2.573381277
85 91 4 10 healthy 0 -0.669745432
86 91 4 23 healthy 0 -0.669745432
87 91 4 27 healthy 476684613 0.491416848
88 91 4 31 healthy 384647198 0.267221706
89 91 4 4 healthy 533681749 0.630256918
90 92 5 10 healthy 1495897788 2.974131544
91 92 5 31 healthy 806317700 1.294374394
92 92 5 4 healthy 1700020953 3.471357829
93 93 6 10 healthy 632004192 0.869761840
94 93 6 29 healthy 708512232 1.056128776
95 93 6 31 healthy 2031512652 4.278842791
96 93 6 4 healthy 596088438 0.782274187
97 98 2 10 healthy 328751027 0.131063493
98 98 2 23 healthy 0 -0.669745432
99 98 2 24 healthy 0 -0.669745432
100 98 2 25 disease 0 -0.669745432
101 98 2 26 disease 0 -0.669745432
102 98 2 26 healthy 0 -0.669745432
103 98 2 27 disease 0 -0.669745432
104 98 2 27 disease 0 -0.669745432
105 98 2 27 disease 0 -0.669745432
106 98 2 29 disease 78682114 -0.478082642
107 98 2 29 disease 0 -0.669745432
108 98 2 30 disease 0 -0.669745432
109 98 2 30 disease 69581995 -0.500249740
110 98 2 4 healthy 1619240477 3.274583613
111 99 3 10 healthy 188962800 -0.209448478
112 99 3 28 healthy 481068698 0.502096099
113 99 3 31 healthy 572791136 0.725523984
114 99 3 4 healthy 999854899 1.765814186
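To illustrate the problem (a minimal check with a deliberately simplified random-effects structure): feeding the centered/scaled response into a Poisson GLMM fails outright, because the family requires non-negative responses.

try(lme4::glmer(taxon_scaled ~ condition + (1 | ID),
                data = df, family = "poisson"))
# stops with an error along the lines of:
# "negative values not allowed for the 'Poisson' family"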

Question: What is the best way to improve a model when the count data span such a large range of magnitudes?

I would appreciate any help, as I don't think I can rely on the models so far.

Thanks in advance!

AMF
  • If you centre and scale the response, then it is no longer a count, so Poisson or negbin are not appropriate. You could try a regular LMM (i.e. using lmer, not glmer), though with only 6 cages, I would probably fit fixed effects for that (see the sketch after these comments). – Robert Long Jul 05 '21 at 16:44
  • Thank you Robert for your answer! I've adjusted the model following your comments, and that solved all warnings. But just to clear things up: when I have counts of such a high order of magnitude, I need to center and scale them, right? I've also read threads where this was generally not recommended for some data sets, as it would make interpretation more difficult. But I do not see any other way than rescaling to analyse my data in a mixed-effects model. – AMF Jul 07 '21 at 10:54
  • You're welcome. Every situation is different. I wasn't recommending that you rescale. I was just saying that if you DID rescale, then Poisson or negbin would not be appropriate. Since the numbers are high, you could appeal to the normal approximation to the Poisson distribution and fit a model with lmer on the unscaled data. This may be of interest: https://stats.stackexchange.com/questions/270/poisson-regression-with-large-data-is-it-wrong-to-change-the-unit-of-measuremen – Robert Long Jul 07 '21 at 12:09
  • Thanks Robert! I had a look at the thread you recommended. I wanted to determine lambda for my data to figure out whether the normal approximation can be applied. I found that "λ is the total number of events (k) divided by the number of units (n) in the data (λ = k/n)". As I'm determining the abundance of one taxon in a population, isn't the number of events how often the taxon is found within the population, rather than the abundance of the taxon within one sample? That would be 75/114 = 0.65 (75 samples with counts of this taxon, out of 114 samples in total). Or am I mistaken here? – AMF Jul 08 '21 at 11:49
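For reference, a minimal sketch of the model the comments above point to (my reading of Robert Long's suggestion, not a tested solution): an ordinary LMM via lmer() on the unscaled counts, appealing to the normal approximation, with cage entered as a fixed effect since it has only 6 levels.

library(lme4)

# LMM on the raw (unscaled) counts, relying on the normal approximation to the
# Poisson for large counts; cage is a fixed factor (only 6 levels), while day
# and ID remain random intercepts.
fit_lmm <- lmer(taxon ~ condition + factor(cage) + (1 | day) + (1 | ID),
                data = df)
summary(fit_lmm)

Checking the residuals afterwards, e.g. with plot(fit_lmm), would show whether the normal approximation is defensible for these data.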

0 Answers