Is it appropriate to employ a zero-inflated Poisson regression model for datasets characterized by a notable presence of zeros, even when these zeros are true zeros?
Why wouldn't it be okay? – Shawn Hemelstrand Jan 28 '24 at 09:10
3 Answers
A zero-inflated Poisson model (or any other zero-inflated model) is a special case of a mixture model, i.e., one where we model observations as coming from a mixture of two or more underlying and unobservable distributions. In the specific case of zero inflation, we use a point mass at zero, mixed with some other distribution - here the Poisson.
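To make the mixture explicit, the ZIP probability mass function can be written as follows. The symbols $\pi$ (weight of the point mass at zero) and $\lambda$ (Poisson mean) are notation I am introducing here for illustration:

$$
P(Y = 0) = \pi + (1 - \pi)\,e^{-\lambda}, \qquad
P(Y = k) = (1 - \pi)\,\frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 1, 2, \dots
$$

Zeros therefore arise from two sources: the point mass (with probability $\pi$) and the Poisson component itself (with probability $(1 - \pi)e^{-\lambda}$).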
This is typically appropriate when we suspect multiple different data generating processes at work, one of which generates observations according to one mixture component, and the other(s) according to the other component(s).
For instance, we might have a certain probability of a single potential customer entering our store or not. If they don't enter at all, we will have zero sales. If they do enter, they might buy something or not, and total sales could be modeled using a Poisson distribution. (Of course, this very much depends on the fact that we have either zero or one customer, but no more.)
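The store example can be sketched as a small simulation. This is only an illustration under assumed values: the entry probability `p_enter = 0.4`, the Poisson mean `lam = 2.0`, and the helper names are all made up for this sketch, not taken from any real data.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_sales(n, p_enter, lam, seed=0):
    """Each period: the customer enters with prob. p_enter, then buys Poisson(lam) items."""
    rng = random.Random(seed)
    return [sample_poisson(lam, rng) if rng.random() < p_enter else 0
            for _ in range(n)]


p_enter, lam = 0.4, 2.0  # hypothetical values
sales = simulate_sales(n=20000, p_enter=p_enter, lam=lam)

observed_zero_share = sales.count(0) / len(sales)
mean_sales = sum(sales) / len(sales)

# Share of zeros a plain Poisson with the same mean would imply
poisson_zero_share = math.exp(-mean_sales)
# Share of zeros implied by the two-process (ZIP) mechanism
zip_zero_share = (1 - p_enter) + p_enter * math.exp(-lam)

print(observed_zero_share, poisson_zero_share, zip_zero_share)
```

The observed share of zeros should sit close to the ZIP value and well above what a plain Poisson with the same mean would predict, which is the "too many zeros" symptom.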
I would not say that "many" zeros automatically motivate a ZIP model. I would rather say that having "too many" zeros, i.e., more than a Poisson with the fitted mean would predict, shows that a simple Poisson model might not be appropriate, and might motivate us to think about the data generating process. We may next want to try a ZIP model. Or a negative binomial one, which also puts more probability on zero than a Poisson with the same mean.
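As a rough check of the last point, one can compare the probability of a zero under a Poisson and under a negative binomial with the same mean. The sketch below assumes the mean/size parameterization of the negative binomial (mean `mu`, size `r`, variance `mu + mu**2 / r`); the function names and the numbers are hypothetical.

```python
import math


def poisson_p0(mu):
    """P(Y = 0) for a Poisson with mean mu."""
    return math.exp(-mu)


def negbin_p0(mu, r):
    """P(Y = 0) for a negative binomial with mean mu and size r."""
    return (r / (r + mu)) ** r


mu = 2.0  # hypothetical mean
pois_zero = poisson_p0(mu)
nb_zero_small_r = negbin_p0(mu, r=0.5)     # strong overdispersion
nb_zero_large_r = negbin_p0(mu, r=1000.0)  # nearly Poisson

print(pois_zero, nb_zero_small_r, nb_zero_large_r)
```

With a small size parameter the negative binomial puts far more mass on zero than the Poisson, and as `r` grows the two agree, so excess zeros alone do not tell you whether a ZIP or a negative binomial is the better description.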
An alternative is a semiparametric model that doesn’t have a distributional assumption. E.g. https://fharrell.com/post/rpo – Frank Harrell Jan 28 '24 at 14:35
Poisson regression sucks, but zero-inflating your models is fine
As previously discussed here, here and here, Poisson regression is almost always a bad count model and is far inferior to the negative binomial count model and other models that use two parameters to fit the mean and variance of the data well. The main drawback of Poisson regression is that its variance is fixed by the mean (the two are equal), so it does not fit the variance of the empirical data properly. This drawback is removed by using a more appropriate count model such as the negative binomial model or the quasi-Poisson model.
The addition of zero-inflation will solve problems arising from an additional point-mass at zero, but it will still not make the Poisson regression a good count model in general. The presence of "true zeroes" does indeed mean that zero-inflation of your model may be appropriate, but you should zero-inflate a good model, not a bad model. A zero-inflated negative binomial count model will have the ability to properly fit the mean and variance of your count data and also capture excess zeroes in the data, allowing for mixtures of negative-binomial counts and "true zeroes". I would recommend you start with this kind of model and develop from there.
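To illustrate how the extra parameter helps, here is a minimal sketch of a zero-inflated negative binomial pmf, assuming the mean/size parameterization (mean `mu`, size `r`) and a zero-inflation weight `pi`. All names and numbers are hypothetical. It checks numerically that the mean is `(1 - pi) * mu` and the variance is `(1 - pi) * mu * (1 + mu / r + pi * mu)`, so `r` can adjust the variance separately from the zero-inflation weight.

```python
import math


def nb_pmf(k, mu, r):
    """Negative binomial pmf with mean mu and size r, computed on the log scale."""
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(k, pi, mu, r):
    """Mixture of a point mass at zero (weight pi) and a negative binomial."""
    base = (1 - pi) * nb_pmf(k, mu, r)
    return pi + base if k == 0 else base


pi, mu, r = 0.3, 3.0, 1.5  # hypothetical parameter values
support = range(400)       # truncation point; the tail mass beyond it is negligible
probs = [zinb_pmf(k, pi, mu, r) for k in support]

total = sum(probs)
mean = sum(k * p for k, p in zip(support, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(support, probs))

mean_formula = (1 - pi) * mu
var_formula = (1 - pi) * mu * (1 + mu / r + pi * mu)

print(total, mean, mean_formula, var, var_formula)
```

Setting `r` very large recovers the ZIP case, where the only source of overdispersion is the zero-inflation weight itself.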
It is certainly ok to use it to capture the excess zeros, even when it doesn't have the interpretation of true vs false zeros. However, a zero-hurdle model may be easier to interpret in such situations.
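The difference in interpretation can be seen from the two pmfs. Below is a small sketch under assumed parameter values, with hypothetical function names: in the hurdle model all zeros come from a single binary process and the positive counts follow a zero-truncated Poisson, whereas in the ZIP model zeros come from both components.

```python
import math


def poisson_pmf(k, lam):
    """Poisson pmf computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: zeros come from the point mass AND the Poisson."""
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k, p_zero, lam):
    """Hurdle Poisson: zeros come only from the binary part; positives are zero-truncated."""
    if k == 0:
        return p_zero
    return (1 - p_zero) * poisson_pmf(k, lam) / (1 - math.exp(-lam))


lam = 2.0  # hypothetical count mean
zip_zero = zip_pmf(0, 0.3, lam)         # always at least exp(-lam)
hurdle_zero = hurdle_pmf(0, 0.05, lam)  # exactly p_zero, may be below exp(-lam)
hurdle_total = sum(hurdle_pmf(k, 0.05, lam) for k in range(200))

print(zip_zero, hurdle_zero, math.exp(-lam), hurdle_total)
```

Because the hurdle zero probability is a free parameter, it reads directly as "probability of a zero" and can even describe fewer zeros than a Poisson would produce, which a ZIP cannot.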