I am working with a zero-inflated negative binomial model in R, using the glmmTMB package. My main goal is to investigate whether there is a significant difference in the number of times a grassland field was visited by birds (response variable) between the stages of a mowing procedure (treatment: pre-, during or post-procedure, each stage directly following the previous one).
A simple model with only these terms (below) showed significant differences:
zinb_simple <- glmmTMB(
  response ~ treatment
    + (1 | field_id)
    + (1 | bird_id),
  data = df_for_analysis,
  family = nbinom2,
  ziformula = ~.,
  offset = offset,
  control = glmmTMBControl(
    rank_check = "adjust"
  )
)
I am now investigating whether any additional parameters influence the birds' response to the treatment. I am using three, and all of them show the behaviour described below: period_cut (when in the year the treatment happened: early, mid or late), field_size (size of the field, in m²) and rain (rain in mm during the three stages of treatment). Because the focus of the analysis is on the interaction between each parameter and the treatment, I first wrote a model with parameter : treatment:
zinb_full <- glmmTMB(
  response ~ treatment
    + period_cut:treatment
    + field_size:treatment
    + rain:treatment
    + (1 | field_id),
  data = df_for_analysis,
  family = nbinom2,
  ziformula = ~.
    + (1 | bird_id),
  offset = offset,
  control = glmmTMBControl(
    rank_check = "adjust"
  )
)
bird_id is now only present in the zero-inflation part because the model could not converge when bird_id was also in the conditional part. The variance of bird_id in the conditional part was very small (on the order of 1e-08).
This model gave me significant results for some levels of the parameters (only the period_cut terms are shown, for brevity):
Conditional model:
                                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                4.255e-01  3.867e-01   1.100 0.271205
treatmentduring_treatment                  6.215e-01  4.099e-01   1.516 0.129520
treatmentpost_treatment                    1.275e+00  4.209e-01   3.031 0.002441 **
treatmentpre_treatment:period_cutearly    -9.418e-01  6.638e-01  -1.419 0.155952
treatmentduring_treatment:period_cutearly -1.162e+00  4.043e-01  -2.873 0.004068 **
treatmentpost_treatment:period_cutearly   -1.184e+00  4.092e-01  -2.893 0.003813 **
treatmentpre_treatment:period_cutlate     -9.891e-01  1.161e+00  -0.852 0.394145
Zero-inflation model:
                                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                9.951e-01  4.884e-01   2.038  0.04159 *
treatmentduring_treatment                 -3.600e+00  8.301e-01  -4.336 1.45e-05 ***
treatmentpost_treatment                   -3.161e+00  6.779e-01  -4.663 3.12e-06 ***
treatmentpre_treatment:period_cutearly     5.187e-02  7.610e-01   0.068  0.94566
treatmentduring_treatment:period_cutearly  5.120e-01  1.643e+00   0.312  0.75531
treatmentpost_treatment:period_cutearly    5.526e-01  8.171e-01   0.676  0.49886
treatmentpre_treatment:period_cutlate      1.391e+00  1.004e+00   1.386  0.16580
However, all of this significance disappears if I use parameter * treatment instead. The only effect that remains significant is treatment on its own:
Conditional model:
                                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                4.255e-01  3.867e-01   1.100  0.27121
treatmentduring_treatment                  6.215e-01  4.099e-01   1.516  0.12952
treatmentpost_treatment                    1.275e+00  4.209e-01   3.031  0.00244 **
period_cutearly                           -9.418e-01  6.638e-01  -1.419  0.15594
period_cutlate                            -9.890e-01  1.161e+00  -0.852  0.39419
treatmentduring_treatment:period_cutearly -2.197e-01  7.286e-01  -0.302  0.76296
treatmentpost_treatment:period_cutearly   -2.422e-01  7.575e-01  -0.320  0.74913
treatmentduring_treatment:period_cutlate   1.486e+00  1.186e+00   1.253  0.21027
treatmentpost_treatment:period_cutlate     1.417e+00  1.196e+00   1.185  0.23618
Zero-inflation model:
                                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                9.951e-01  4.884e-01   2.038   0.0416 *
treatmentduring_treatment                 -3.600e+00  8.301e-01  -4.336 1.45e-05 ***
treatmentpost_treatment                   -3.161e+00  6.779e-01  -4.663 3.12e-06 ***
period_cutearly                            5.181e-02  7.610e-01   0.068   0.9457
period_cutlate                             1.391e+00  1.004e+00   1.386   0.1658
treatmentduring_treatment:period_cutearly  4.604e-01  1.721e+00   0.268   0.7890
treatmentpost_treatment:period_cutearly    5.008e-01  1.042e+00   0.481   0.6308
treatmentduring_treatment:period_cutlate   6.404e-01  1.204e+00   0.532   0.5949
treatmentpost_treatment:period_cutlate    -6.233e-01  1.159e+00  -0.538   0.5908
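One thing that can be checked directly from the two printouts is whether the estimates of one model are simple sums of the estimates of the other. The sketch below only re-uses the period_cutearly estimates from the two conditional tables above; the variable names are made up for illustration and nothing is refitted.

```r
# Estimates copied from the conditional part of the `parameter * treatment` output.
# All variable names here are illustrative, not from the original analysis.
star_early        <- -9.418e-01  # period_cutearly
star_during_early <- -2.197e-01  # treatmentduring_treatment:period_cutearly
star_post_early   <- -2.422e-01  # treatmentpost_treatment:period_cutearly

# Estimates copied from the conditional part of the `parameter : treatment` output.
colon <- c(pre    = -9.418e-01,  # treatmentpre_treatment:period_cutearly
           during = -1.162e+00,  # treatmentduring_treatment:period_cutearly
           post   = -1.184e+00)  # treatmentpost_treatment:period_cutearly

# In the `*` output, the early-vs-mid difference within a stage is the
# period_cutearly main effect plus that stage's interaction term.
sums <- c(pre    = star_early,
          during = star_early + star_during_early,
          post   = star_early + star_post_early)

# Side-by-side comparison; differences are at the level of printed rounding.
print(round(cbind(sums, colon, diff = sums - colon), 4))
```

For these three terms the sums match the `:` estimates up to the rounding of the printout, which would be consistent with the two formulas describing the same fit with differently defined coefficients. Only the period_cutearly terms in the conditional part are checked here.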
I tried a few seeds but have not been able to produce a working minimal reproducible example. It may be that my model is already working with a small dataset and that shrinking it further leads to convergence issues. The table below is just an example of what the values look like:
   bird_id  field_id response treatment period_cut field_size    rain offset
   <fct>    <fct>       <dbl> <fct>     <fct>           <dbl>   <dbl>  <dbl>
 1 koemo3   KM142           0 post_tre… mid           -19241. -1.58     1.95
 2 sophie   AL839           0 post_tre… mid           -18716.  5.61     1.39
 3 nume7    AL539           3 during_t… early          20483. -0.148    1.95
 4 sophie   AL1267          0 post_tre… mid           -18479. -0.0342   1.95
 5 koemo3   KM88           27 during_t… mid           -13578. -1.58     1.95
 6 koemo3   KM585           3 post_tre… mid             2811. -1.58     1.95
 7 koemo3   KM652           0 during_t… mid            16366. -1.58     1.95
 8 koemo3   KM184           0 post_tre… late           -9154.  0.0230   1.61
 9 koemo4   KM492           6 post_tre… mid             5483. -1.58     1.95
10 wiesmet1 AL834           0 post_tre… mid           -20968. -1.58     1.95
Continuous variables have been centered and categorical variables have been converted to factors. The offset is already on the log scale.
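Even without the real data, the fixed-effects part of the two formulas can be compared on a toy data frame. The sketch below uses base R only; the data frame `toy` is hypothetical, and the level names and reference levels (pre_treatment, mid) are assumptions read off the output above. It does not fit any model; it only builds the two design matrices for treatment and period_cut and compares them.

```r
# Toy data only: two rows per treatment x period_cut cell, no real observations.
# Level names and reference levels are assumed from the model output.
toy <- expand.grid(
  treatment  = factor(c("pre_treatment", "during_treatment", "post_treatment"),
                      levels = c("pre_treatment", "during_treatment", "post_treatment")),
  period_cut = factor(c("mid", "early", "late"), levels = c("mid", "early", "late"))
)
toy <- toy[rep(seq_len(nrow(toy)), each = 2), ]

# Fixed-effects design matrices for the two ways of writing the interaction.
X_colon <- model.matrix(~ treatment + period_cut:treatment, data = toy)
X_star  <- model.matrix(~ treatment * period_cut, data = toy)

# Number of fixed-effect columns in each parameterisation.
print(c(colon = ncol(X_colon), star = ncol(X_star)))

# If both matrices span the same column space, binding them side by side
# does not raise the rank above that of either matrix alone.
print(c(colon = qr(X_colon)$rank,
        star  = qr(X_star)$rank,
        both  = qr(cbind(X_colon, X_star))$rank))

# The column names show which comparison each coefficient encodes.
print(colnames(X_colon))
print(colnames(X_star))
```

If the column counts and the three ranks all agree, the two formulas would describe the same set of fitted values for these factors, and only the meaning of the individual coefficients (and hence their individual p-values) would differ. Whether the same holds for the continuous parameters and the full glmmTMB fits would need to be checked separately, for example by comparing the log-likelihoods of the two fitted models.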
Statistics is not my field. I learnt GLMMs about a month ago and my biostatistics professor doesn't work with GLMMs. His guess is that this may be happening because both treatment and period_cut are temporal variables, but that doesn't explain why field_size * treatment behaves the same way. Furthermore, period_cut refers to a period in the year and the treatment happens "within" period_cut, so in a way these temporalities are different. I cannot see a reason for a relationship between the parameters.
Since I cannot offer a reproducible example, could someone offer a theoretical explanation for the observed behaviour? Why does the significance of parameter : treatment disappear with parameter * treatment?