
The data I am dealing with are groups of counts, $n_i, i=1..K$. More than half of these counts are zeros. The null hypothesis is that all the counts come from the same distribution, e.g., Poisson with parameter $\lambda$, $$P(n|\lambda)=\frac{\lambda^n}{n!}e^{-\lambda},$$ in which case I can estimate the parameter as the mean over all the counts, $$\hat{\lambda}=\frac{1}{K}\sum_{i=1}^Kn_i. $$

There might, however, be situations where the zero and non-zero counts are generated by different distributions (possibly more than two). I need a statistical test to identify such cases.

Clarification: the problem is not to test whether the distribution is Poissonian, but whether all counts come from the same distribution or not. Zeros may be due to $\lambda$ being small... or because they are permanently zero. (I realized after the discussion in the comments that the initial formulation of my question was ambiguous.)

Roger V.
  • You appear to change the question at the very end. If the zero and non-zero counts are "generated by different distributions," then exactly what is your model for them? It's obviously not Poisson! – whuber Feb 11 '23 at 16:45
  • @whuber if they are generated by different processes, then my null hypothesis is incorrect. For simplicity, one can consider that non-zero counts are still generated from a Poisson distribution, but zeros are just zeros - nothing happens. – Roger V. Feb 11 '23 at 18:25
  • Couldn't you do a test for overdispersion in a Poisson model (as described here, for example)? Maybe I'm misunderstanding your question. – COOLSerdash Feb 11 '23 at 18:46
  • It sounds like most Poisson tests would apply, but if you could be more specific about your alternate hypothesis one might be able to develop a more powerful test appropriate for it. – whuber Feb 11 '23 at 22:21
  • @whuber the problem is not to test whether the distribution is Poissonian, but whether all counts come from the same distribution or not. Zeros may be due to $\lambda$ being small... or because they are permanently zero. – Roger V. Feb 12 '23 at 09:37
  • @COOLSerdash thank you for pointing this out! Indeed, the approach that I have adopted for now is to test the variance of the process vs. its mean. – Roger V. Feb 12 '23 at 09:39
  • In the abstract setting you describe, I cannot determine what you might mean by "all counts come from the distribution" or by "permanently zero." These phrases might make sense in a particular application, so consider disclosing that in your post. – whuber Feb 12 '23 at 15:05
  • @whuber In my experience, giving too many technical details in this SE guarantees that the question remains unanswered... One way to describe the situation would be as a zero-inflated distribution, but this suggests a specific kind of answer. – Roger V. Feb 12 '23 at 15:18
  • In my experience, almost all questions that are stated abstractly fail to capture the unique or important aspects of the application, risking answers that are useless or misleading. – whuber Feb 12 '23 at 15:22
  • @whuber I am ready to modify the question or provide additional information, if someone is interested in answering it. E.g., I have tried to give you multiple clarifications in the comments above... but I am not even sure whether you are trying to answer or simply going through the motions of a moderator managing the community. Please do not take it as an offense, but we are all busy people, and since posting the question I have learned more by googling than from the community. – Roger V. Feb 12 '23 at 16:17
  • As you suppose, I am only trying to help you articulate an understandable, answerable question. I am not offended when that doesn't happen, but I do appreciate knowing when it would not be worth the time to continue. – whuber Feb 12 '23 at 16:38

1 Answer


You can test the null hypothesis that the data follow a Poisson distribution against the alternative of a zero-inflated Poisson using, for example, the glmmTMB R package. The test relies on the likelihood ratio test statistic being asymptotically chi-square with one degree of freedom.

However, unless you also allow zero-deflation under your alternative hypothesis, the null hypothesis of no zero-inflation is on the boundary of the parameter space. The likelihood ratio statistic is then asymptotically distributed as an equal-weights mixture of chi-squares with zero and one degree of freedom (Stram and Lee 1994), so a better approximate $p$-value would be half of what is computed below.
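To spell out why halving is the appropriate correction, here is a short worked equation. The symbol $T$ for the observed likelihood ratio statistic is introduced here for illustration only and is not part of the original notation. Since a chi-square with zero degrees of freedom is a point mass at zero, it contributes nothing to the tail probability whenever $T>0$: $$p=\tfrac{1}{2}P(\chi^2_0\ge T)+\tfrac{1}{2}P(\chi^2_1\ge T)=\tfrac{1}{2}P(\chi^2_1\ge T),\qquad T>0.$$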

# Simulated data from a zero-inflated Poisson
set.seed(1)
y <- rpois(100, lambda = 3)
y[1:10] <- 0
data <- data.frame(y)

Testing Poisson against zero-inflated Poisson, relying on the asymptotic distribution of the likelihood ratio statistic:

library(glmmTMB)
mod0 <- glmmTMB(y ~ 1, family = poisson, data = data)
mod1 <- update(mod0, ziformula = ~ 1)
anova(mod0, mod1)
#> Data: data
#> Models:
#> mod0: y ~ 1, zi=~0, disp=~1
#> mod1: y ~ 1, zi=~1, disp=~1
#>      Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
#> mod0  1 383.93 386.53 -190.96   381.93
#> mod1  2 380.88 386.09 -188.44   376.88 5.0513      1    0.02461 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
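The boundary correction mentioned earlier can be applied directly to the statistic reported by `anova()`. The following is a minimal base-R sketch, assuming the likelihood ratio statistic is copied by hand from the output above; the variable names `lr_stat`, `p_naive` and `p_boundary` are illustrative and not part of glmmTMB.

```r
# Likelihood ratio statistic, copied from the Chisq column of the anova() output
lr_stat <- 5.0513

# Naive p-value: upper tail of a chi-square with 1 degree of freedom
# (this is the Pr(>Chisq) value that anova() reports)
p_naive <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# Boundary-corrected p-value: equal-weights mixture of chi-square(0) and
# chi-square(1); chi-square(0) is a point mass at zero, so for lr_stat > 0
# only the chi-square(1) half contributes
p_boundary <- 0.5 * p_naive

print(c(naive = p_naive, boundary = p_boundary))
```

In a script one could presumably extract the statistic from the object returned by `anova()` rather than typing it in, but the exact column names should be checked against the installed glmmTMB version.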

Jarle Tufto