
I have seen both Tweedie GLMs and Zero-Inflated (ZI) GLMs used in the field of ecology. Tweedie seems to have the benefit of not treating excess zeros separately, as is done using ZI regressions. However, ZI methods seem to be preferred in the literature.

Why would one use Tweedie GLMs instead of ZI GLMs, and vice versa? Are there diagnostic methods post modeling that would cause one to pick one or the other?



Tweedie GLMs are true GLMs and enjoy the usual properties of GLMs. ZI GLMs are more complex models that assume a GLM plus an extra zero-inflation process, so they are obviously more flexible but at the cost of extra parameters. If the simpler Tweedie GLM fits the data adequately then it is the preferable model. If it doesn't, then you probably need the added complexity of the ZI GLM.

Tweedie GLMs assume that the probability of an exact zero is related to the expected value of the process. If the expected value is low, then zeros will be more common. If the expected value is high, then zeros will be less common. This relationship can be tuned somewhat by choosing the index of the Tweedie model but, if the relationship doesn't hold as expected, then a more complex ZI model may be required.
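To make the mean/zero link concrete, here is a small sketch for the compound Poisson-gamma case (power index 1 < p < 2), where a Tweedie variable is a Poisson number of gamma summands and is exactly zero when that count is zero. The function name is hypothetical; `phi` is the dispersion and `p` the Tweedie index. It only evaluates the standard closed form P(Y = 0) = exp(-mu^(2-p) / (phi (2-p))), so it assumes phi and p are already known.

```python
import math


def tweedie_zero_prob(mu: float, phi: float, p: float) -> float:
    """P(Y = 0) for a compound Poisson-gamma Tweedie with 1 < p < 2.

    Y is a Poisson(lam) count of gamma summands, so Y == 0 exactly when the
    count is zero: P(Y = 0) = exp(-lam), lam = mu**(2 - p) / (phi * (2 - p)).
    """
    if not 1.0 < p < 2.0:
        raise ValueError("this formula assumes 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    return math.exp(-lam)


# Zeros become rarer as the mean grows; the index p changes the shape of that decline.
for p in (1.2, 1.5, 1.8):
    probs = [tweedie_zero_prob(mu, phi=1.0, p=p) for mu in (0.1, 1.0, 10.0)]
    print(p, [round(q, 3) for q in probs])
```

Because zeros are tied to the mean in this one fixed way, a systematic excess of observed zeros at large fitted means is the kind of pattern that would point towards a ZI model.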

You can judge the fit of the Tweedie GLM using randomized quantile residuals. See for example Interpreting GLM residual plot or Poisson regression residuals diagnostic. You might for example give special attention to the residuals arising from exact zeros in the dataset.
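A rough, hand-rolled Python sketch of randomized quantile residuals for a Tweedie model with 1 < p < 2 follows. It is not a library implementation; the helper names are hypothetical and it assumes the fitted means `mu`, dispersion `phi` and index `p` are already available. The CDF is evaluated by summing the Poisson mixture of gamma CDFs, truncated where the Poisson tail falls below `tol`. For an exact zero the CDF jumps from 0 to P(Y = 0), so the residual uses a uniform draw on (0, P(Y = 0)); positive observations use the continuous CDF directly. If the model is right, the residuals are standard normal.

```python
import numpy as np
from scipy import stats


def tweedie_params(mu, phi, p):
    """Poisson rate, gamma shape and gamma scale of a Tweedie with 1 < p < 2."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    scale = phi * (p - 1) * mu ** (p - 1)
    return lam, alpha, scale


def tweedie_cdf(y, mu, phi, p, tol=1e-12):
    """P(Y <= y) as a truncated Poisson mixture of gamma CDFs."""
    lam, alpha, scale = tweedie_params(mu, phi, p)
    cdf = np.exp(-lam) * np.ones_like(y, dtype=float)  # point mass at zero
    n_max = int(np.max(stats.poisson.ppf(1 - tol, lam))) + 1
    for n in range(1, n_max + 1):
        # n iid Gamma(alpha, scale) summands add up to Gamma(n * alpha, scale)
        cdf = cdf + stats.poisson.pmf(n, lam) * stats.gamma.cdf(y, n * alpha, scale=scale)
    return cdf


def rq_residuals(y, mu, phi, p, rng):
    """Randomized quantile residuals; zeros get u ~ Uniform(0, P(Y = 0))."""
    y = np.asarray(y, dtype=float)
    F = tweedie_cdf(y, mu, phi, p)
    u = np.where(y == 0, rng.uniform(size=y.shape) * F, F)
    u = np.clip(u, 1e-15, 1 - 1e-15)  # keep the normal quantile finite
    return stats.norm.ppf(u)


def rtweedie(mu, phi, p, rng):
    """Simulate Tweedie data as a Poisson number of gamma summands."""
    lam, alpha, scale = tweedie_params(mu, phi, p)
    n = rng.poisson(lam)
    y = rng.gamma(np.where(n > 0, n * alpha, 1.0), scale)
    return np.where(n > 0, y, 0.0)


# Usage: residuals from data simulated under the same model should look N(0, 1).
rng = np.random.default_rng(1)
mu = rng.uniform(0.2, 5.0, size=2000)
y = rtweedie(mu, phi=1.5, p=1.4, rng=rng)
r = rq_residuals(y, mu, phi=1.5, p=1.4, rng=rng)
print(round(r.mean(), 2), round(r.std(), 2))
```

With real data you would plot `r` against the fitted means (and look separately at `r[y == 0]`) rather than rely on the summary statistics alone.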

Also see Can a model for non-negative data with clumping at zeros (Tweedie GLM, zero-inflated GLM, etc.) predict exact zeros? regarding how to estimate the probability of zeros predicted by a Tweedie GLM.
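One simple way to use those predicted zero probabilities is to compare the observed number of zeros with the number the fitted Tweedie GLM expects. The sketch below assumes fitted means `mu`, dispersion `phi` and index `p` (1 < p < 2) from an existing fit; the function name is hypothetical. It treats each zero indicator as an independent Bernoulli(P(Y_i = 0)) and ignores the uncertainty from estimating `mu`, `phi` and `p`, so the z-score is only a rough guide.

```python
import numpy as np


def zero_count_check(y, mu, phi, p):
    """Observed vs expected number of zeros under a fitted Tweedie (1 < p < 2)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pi0 = np.exp(-mu ** (2 - p) / (phi * (2 - p)))  # P(Y_i = 0) for each observation
    expected = pi0.sum()
    sd = np.sqrt((pi0 * (1 - pi0)).sum())           # sum of independent Bernoulli variances
    observed = int((y == 0).sum())
    return observed, expected, (observed - expected) / sd


# Hypothetical fitted values, for illustration only.
obs, expct, z = zero_count_check([0, 0, 1.3, 0.4, 2.0], [0.5, 0.8, 1.2, 0.6, 2.5], phi=1.0, p=1.5)
print(obs, round(expct, 2), round(z, 2))
```

A large positive z-score would be one sign of more zeros than the Tweedie model can explain.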

See also A model for non-negative data with many zeros: pros and cons of Tweedie GLM for a related question and answer.

References

Gordon Smyth
  • Shouldn’t you choose the model based on the theoretical distribution in question? – geoscience123 Jul 15 '22 at 23:32
    @geoscience123 Randomized quantile residuals compare the theoretical distribution to the data on an observation-by-observation basis, which is the only way it can be done. Examining the marginal distribution of the data is irrelevant, if that is perhaps what you are thinking. – Gordon Smyth Jul 15 '22 at 23:42
  • I was more thinking that the ZI distributions theoretically treat the data as two separate processes, rather than one. Does Tweedie work the same way? – geoscience123 Jul 15 '22 at 23:43
    @geoscience123 Zeros arise in a natural way in Tweedie models as described by my 1996 paper (Smyth 1996). If however you are starting from the assumption that the zeros arise from an entirely separate process independently of the non-zero observations then you are assuming a priori that a ZI model is appropriate and your whole question becomes circular. I would gently suggest that scientific decisions should be data-driven and you should ask whether there is any evidence for a separate ZI process. – Gordon Smyth Jul 16 '22 at 00:04
  • That makes sense. If I’m summing this up properly, assuming two separate processes, ZI fits. Otherwise, check data to see which distribution fits better, starting with Tweedie? – geoscience123 Jul 16 '22 at 00:16
  • @geoscience123 Yes. But making assumptions and fitting data are different things. If your theoretical model has two separate processes, then by definition you are assuming a ZI model. That's what a ZI GLM is. Whether your theoretical model fits the data at hand is a separate question. There are many ZI models and the specific one you are assuming could still be wrong. – Gordon Smyth Jul 16 '22 at 00:41
  • Understood. Thanks for taking the time! – geoscience123 Jul 16 '22 at 00:45
  • @geoscience123 Looking at this later, I thought it might be helpful to add one more point. To judge whether a more complex model is necessary, we fit the simpler model (in this case the Tweedie glm without ZI) and then look for lack of fit. Lack of fit is always assessed from the simpler model (the null hypothesis if you like) rather than from the more complex model. – Gordon Smyth Dec 12 '23 at 22:05