
I have modeled a count variable for a study that spans 3.5 years. The model is a mixed-effects zero-inflated negative binomial (ZINB) regression with a random intercept for county and an offset for population. It was suggested that I add a dummy variable for year, but when I do, it creates a problem with the quantile deviations.

Here is the plot for the ZINB without year as a predictor variable:

[Figure: ZINB without year as factor variable]

I was a bit concerned about the lowest quantile, but this is the best I've been able to get. When I added year as a factor variable, this happened:

[Figure: ZINB with year as factor variable]

I tried year as both a numeric and a factor variable. The model would not converge with year as a random intercept, presumably because of the relatively short duration of the study (I read that there should be >5 years to use year as a random intercept). Based on these residuals, plotted with DHARMa, I prefer the first model, but I wonder how much weight to give these plots in my decision.

  • Could you add the code of your regression models with the plots? – Florian Hartig May 05 '23 at 12:39
  • @FlorianHartig, thanks so much for your comment. I have moved on to working with an ecologist who can help me conduct the more complex analysis that is required of the data. There are non-linear relationships. Should I take the post down now? I suppose it won't help anyone else. Problems with the residuals revealed with DHARMa plots helped me recognize this. Thank you! – Claire Richards May 05 '23 at 14:09
  • Hi Claire, well, not sure, or just leave it open and provide the answer yourself? – Florian Hartig May 07 '23 at 12:07

1 Answer


I have changed a couple of things in this analysis since posting the question. If we divide the response variable by the denominator (population) instead of using an offset, the response becomes a continuous rate. That makes other distributions available (Tweedie, gamma) and allows analyses that are more difficult with a negative binomial distribution.

The biggest issue, though, is that there is temporal autocorrelation and (potentially) nonlinear covariate effects. In the analysis we're currently finalizing (a GAMM), we account for temporal autocorrelation, so it would not make sense to include a random effect for year. Four years would also have been too few levels for a random factor, so year will be included only as a fixed effect. I'm consulting with Dr. Zuur (highstat.com) on this analysis and want to credit him for that.
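To make the offset-versus-rate point concrete, here is a minimal arithmetic sketch in plain Python. It is not our actual model code, and the populations and linear-predictor values are made up for illustration. It only shows that, with a log link, a count model with a log(population) offset and a model for the rate count/population imply the same mean structure. What changes is the family: the rate is continuous, which is why Tweedie or gamma become candidates.

```python
import math

# Hypothetical values for illustration only (not from the study).
population = [12000, 45000, 8000]
linear_predictor = [-7.2, -6.9, -7.5]  # eta = X * beta for three county-periods

# Count model with an offset: log(E[count]) = eta + log(population)
expected_count = [
    math.exp(eta + math.log(n)) for eta, n in zip(linear_predictor, population)
]

# Rate model: log(E[count / population]) = eta
expected_rate = [math.exp(eta) for eta in linear_predictor]

# Both formulations agree on the expected rate per person.
for mu, rate, n in zip(expected_count, expected_rate, population):
    assert math.isclose(mu / n, rate, rel_tol=1e-12)
    print(f"population={n:>6}  expected count={mu:8.3f}  expected rate={rate:.6f}")
```

One caveat worth keeping in mind: a gamma distribution cannot accommodate exact zeros, whereas a Tweedie distribution with power parameter between 1 and 2 can, which matters for a response that was zero-inflated to begin with.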

All this is to say, the super weird residuals flagged problems in our model. We did not ignore those residuals, and following them up led us to critical factors that needed to be included in the model.
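For anyone who wants a quick first look at whether residuals are correlated in time, here is a rough sketch of a lag-1 autocorrelation calculation in plain Python. It is not the check we ran; the function name `lag1_autocorrelation` and the example series are hypothetical, and it assumes one residual per equally spaced time step within a single county. In R, DHARMa offers `testTemporalAutocorrelation` for a proper test on its scaled residuals.

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of an equally spaced residual series."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mean = sum(residuals) / n
    deviations = [r - mean for r in residuals]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("residuals are constant; autocorrelation is undefined")
    # Sum of products of each deviation with the one from the previous time step.
    numerator = sum(deviations[t] * deviations[t - 1] for t in range(1, n))
    return numerator / denominator


# Made-up series: a steady trend (positive autocorrelation) and a strict
# alternation (negative autocorrelation).
trending = [1.0, 2.0, 3.0, 4.0, 5.0]
alternating = [1.0, -1.0, 1.0, -1.0]

print(lag1_autocorrelation(trending))
print(lag1_autocorrelation(alternating))
```

Values far from zero in either direction suggest that neighbouring time points are not independent, which is the kind of structure a plain random intercept for county does not absorb.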