Seeking feedback on the interpretation and reporting of a glmm output

Question

This might be a very beginner level question. I have fitted a model with glmmTMB R package, considering the interactive effect of mean wind speed and average daily rain as well as an interactive effect of mean temperature and mean relative humidity on plant disease severity. Here is my fitted model

mod1 <-
  glmmTMB(disease_severity ~ mean_rh * mean_temp + mean_ws * avg_daily_rain + (1|year),
          family = beta_family(), data = dat_seasonal)
summary(mod1)

Here is the model output.

Formula:          disease_severity ~ mean_rh * mean_temp + mean_ws * avg_daily_rain +  
    (1 | year)
Data: dat_seasonal
 AIC      BIC   logLik deviance df.resid 

-41.9    -27.4     29.9    -59.9       28
Random effects:
Conditional model:
 Groups Name        Variance Std.Dev.
 year   (Intercept) 0.7874   0.8874

Number of obs: 37, groups:  year, 11
Dispersion parameter for beta family (): 7.03
Conditional model:
                        Estimate Std. Error z value Pr(>|z|)

(Intercept)            -336.9044   151.8930  -2.218   0.0266 *
mean_rh                   4.3731     1.9177   2.280   0.0226 *
mean_temp                22.8305     9.9872   2.286   0.0223 *
mean_ws                 -19.8944    11.1741  -1.780   0.0750 .
avg_daily_rain           -3.7751     2.5388  -1.487   0.1370

mean_rh:mean_temp        -0.2854     0.1258  -2.269   0.0232 *
mean_ws:avg_daily_rain    5.0808     2.3155   2.194   0.0282 *

Signif. codes:  0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

For reporting,

I would like to report my model results. Can I say that the predictors mean_rh and mean_temp had a significant positive effect individual effect (effect on their own). The predictors mean_ws and avg_daily_rain had no significant effect on their own? There was a signficant negative interaction effect of mean_rh & mean_temp as well as a significant positive interaction effect of mean_ws and avg_daily_rain.

For plotting,

I just need to plot interactive terms of my model. Plotting individual effects alone would be misleading. For example, if I'm using ggeffects package.

Figure 1 <- plot(ggpredict(mod1, c("mean_rh", "mean_temp"))) +
  theme_pubclean() +
  labs(x = "Mean relative humidity", title = "")
Figure 2 <- plot(ggpredict(mod1, c("mean_ws", "avg_daily_rain"))) +
  theme_pubclean() + 
  labs(x = "Mean wind speed", title = "")

In summary, it's okay to report individual as well as interactive effects (main & interaction effect). But it is NOT okay to plot main effects alone in my case. Is that right?

Edit: Thank you very much for the feedback @EdM. I am editing the question based on the test you suggested, so all can benefit. I ran car::Anova(mod1) on my model and I got the following output.

Analysis of Deviance Table (Type II Wald chisquare tests)
Response: disease_severity
                        Chisq Df Pr(>Chisq)

mean_rh                0.0535  1   0.817057

mean_temp              0.2366  1   0.626650

mean_ws                1.1419  1   0.285252

avg_daily_rain         9.8801  1   0.001671 **
mean_rh:mean_temp      5.1498  1   0.023249 * 
mean_ws:avg_daily_rain 4.8148  1   0.028217 *

Signif. codes:  0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This output suggest significant interaction effect of the interactive terms as shown in the original model above. However, it doesn't give a positive or a negative sign, but I assume there is a significant negative interaction effect of mean_rh:mean_temp and a significant positive effect of mean_ws:avg_daily_rain as shown in the original model above? So if need to report this, I'd report that there was a positive main effect of avg_daily_rain and significant interaction effect of mean_rh:mean_temp & mean_ws:avg_daily_rain. Is that right?

Regarding overfitting, I fitted the same model (but excluded random effect of year) using beta regression as shown below.

mod2 <-
  betareg(disease_severity ~ mean_rh * mean_temp + mean_ws * avg_daily_rain,
          data = dat_seasonal)
summary(mod2)

The output of interactive effects is the same as mod1 (above model with random effect)

Call:
betareg(formula = disease_severity ~ mean_rh * mean_temp + mean_ws * avg_daily_rain, 
    data = dat_seasonal)
Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max 
-1.8117 -0.9641 -0.0679  0.9726  1.8027
Coefficients (mean model with logit link):
                         Estimate Std. Error z value Pr(>|z|)

(Intercept)            -439.66169   87.71840  -5.012 5.38e-07 ***
mean_rh                   5.65518    1.11886   5.054 4.32e-07 ***
mean_temp                29.66389    5.84693   5.073 3.91e-07 ***
mean_ws                 -23.39753    6.14241  -3.809 0.000139 ***
avg_daily_rain           -4.89648    1.40426  -3.487 0.000489 ***
mean_rh:mean_temp        -0.36917    0.07362  -5.015 5.31e-07 ***
mean_ws:avg_daily_rain    6.39018    1.31982   4.842 1.29e-06 ***
Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)

(phi)   3.7717     0.8953   4.213 2.52e-05 ***

Signif. codes:  0 '*' 0.001 '' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Type of estimator: ML (maximum likelihood)
Log-likelihood: 24.96 on 8 Df
Pseudo R-squared: 0.5381
Number of iterations: 358 (BFGS) + 56 (Fisher scoring)

When I ran car::Anova(mod2). I got the following output

Analysis of Deviance Table (Type II tests)
Response: disease_severity
                       Df   Chisq Pr(>Chisq)

mean_rh                 1  0.5807   0.446035

mean_temp               1  3.7047   0.054259 .

mean_ws                 1  9.0713   0.002597 ** 
avg_daily_rain          1 32.5315  1.173e-08 ***
mean_rh:mean_temp       1 25.1495  5.305e-07 ***
mean_ws:avg_daily_rain  1 23.4420  1.287e-06 ***

Signif. codes:  0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I'm happy to use the second model if ignoring year is not an issue, and increases the reliability of results. Interaction effects are the same in both models, but wind speed became significant too after running car::anova on the second model. The study was conducted in 11 years.

score 0 · Accepted Answer · answered Jan 25 '23 at 20:54

0

Can I say that the predictors mean_rh and mean_temp had a significant positive effect individual effect (effect on their own). The predictors mean_ws and avg_daily_rain had no significant effect on their own?

NO.

This is a common misinterpretation of the coefficients reported by linear modeling software when there are interactions. With continuous predictors, an individual coefficient represents the association with outcome when all its interacting predictors are at a value of 0. At least, you need to qualify your claims about "significance" accordingly.

More thoroughly, you need to evaluate each predictor including all of its interactions, plus evaluate the interactions. The default Type II analysis used by the Anova() function of the car package is a good choice for that. This vignette shows that car::Anova() works on this type of model. You should also look at the other methods in that vignette for documenting model adequacy. This answer illustrates some of that for a beta model with glmmTMB.

With interactions it's usually best to display model predictions at useful values of the covariates. I don't use the ggeffects package. Make sure that the ranges of all covariate values used for the displays make sense.

You have an additional issue to deal with: you are at risk of severely over-fitting your data with this model. As a rule of thumb, you need about 15 observations per coefficient (beyond the intercept) to avoid overfitting. It appears that you are trying to estimate 6 such coefficients plus a random-effect variance based on 37 observations. If I understand the software correctly, the coefficients are in a logit (log-odds) scale. Some of the coefficients look very large on that scale, which might be a sign of overfitting (although the magnitudes of coefficients depend on the scales of the predictors and can be misleading with interactions as you have here).

answered Jan 25 '23 at 20:54

EdM

92,183
10
92
267

I'm going through a similar question that you have answered here https://stats.stackexchange.com/questions/593652/repeated-measures-anova-with-significant-interaction-effect-but-non-significant. Here you suggested that " I happen to have this ancient text on hand. Section 16.2.2, page 180: "If the interaction effect is found to be significant, do not test the main effects even if they appear not to be significant. The estimation of the main effects and their significance is coding dependent when interactions are included in the model." – Ahsk Jan 25 '23 at 22:27
My interactive effects are significant, why do I need to additional tests? This answer doesn't apply to non-linear models? Just curious. Thanks again – Ahsk Jan 25 '23 at 22:28
@Ahsk "main effects" in the context of that other answer are the individual coefficients reported in the model summary. The car::Anova() test is something different, a "chunk" test on all coefficients involving the predictor. That' type of combined test is important for any type of model where a single predictor is involved in multiple terms (e.g., non-linear terms, interactions). With only 2-way interactions you will get the same p-values from car::Anova() and the model summary, as you saw, but that's not true if there are higher-order interactions. – EdM Jan 26 '23 at 01:07
Thank you! I'm marking the question as answered, but I have one confusion about car::Anova() output though. In the case of car::Anova(mod2), I can only talk about the significant interaction effect of mean_rh:mean_temp & mean_ws:avg_daily_rain NOT significant main effect of mean_ws & avg_daily_rain even there are askteric(*) opposite to them. The car::Anova() just tells me that if including that term helps to reduce the unexplained variance? The original model tells us the effect on the response variable, the anova tells us on the explained variance. Is that right? – Ahsk Jan 26 '23 at 02:42
@Ahsk there is no single main effect for a predictor involved in an interaction. Don't go looking for one. Its association with outcome depends on the values of its interacting predictors. The p-values for its coefficient reported in your "original model" depends on how the interacting predictors are coded. See this page. The car::Anova() result is the best estimate of whether such a predictor is associated overall with outcome. On that basis, mean_ws & avg_daily_rain are significantly associated with outcome in your last model. – EdM Jan 26 '23 at 13:18
@Ahsk only you and your colleagues can determine whether it's OK to ignore year, based on your understanding of the subject matter. If you ignore it, you are assuming that all 37 observations are independent of each other, with no correlations among them based on things like year or season. Is that a reasonable assumption? If not, the more conservative mod1 shows a substantial association by Anova of avg_daily_rain and of two interactions with outcome, which might be a safer interpretation of your results. Also, I'm still worried about whether you are over-fitting your data. – EdM Jan 26 '23 at 13:35
they are not completely independent because new plants were planted in the same field next year. For only 2-3 years, plants were planted in a new place. But I do understand your concern about over fitting. I'm going to fit the second model with GAMs using betar() family to check main effects. I hope the splines will somewhat account for the year effect. For interaction effect, I will only fit one interaction (e.g., mean_rh*mean_temp) at time using glmmTMB including year as a random effect instead of including two interactions as I have done in model. Thanks very much for your time. – Ahsk Jan 26 '23 at 14:07

Seeking feedback on the interpretation and reporting of a glmm output

1 Answers1

Linked