I am following the suggestion in https://stackoverflow.com/questions/7157158/fitting-a-zero-inflated-poisson-distribution-in-r to fit a zero-inflated Poisson (ZIP) distribution to the following frequency table:

> stat
    x    N
 1: 0  478
 2: 1  901
 3: 2 1101
 4: 3  873
 5: 4  583
 6: 5  250
 7: 6   97
 8: 7   31
 9: 8   10
10: 9    2

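# vect <- rep(stat$x, stat$N)
count <- c(478, 901, 1101, 873, 583, 250, 97, 31, 10, 2)
vect <- rep(0:9, count)   # expand the frequency table into raw observations
library(fitdistrplus)     # fitdist()
library(gamlss)           # loads gamlss.dist, which provides the ZIP density

fit <- fitdist(vect, "ZIP", start=list(mu=2.4, sigma=0.1))
# Fails with an error ending in:
# mu = 2.64, sigma = -0.14, log = TRUE): sigma must be between 0 and 1 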

The plots are from a regular Poisson fit. As far as I can see, the data contain more zeros than the Poisson predicts, and the goodness-of-fit (gof) p-value is 0.00087, so I hoped a ZIP would help.

[Figure: plots of the regular Poisson fit]
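To make the "more zeros" observation concrete, the observed frequencies can be set against what a regular Poisson would predict. This is only a sketch in base R; the names `lambda_hat`, `expected`, and `comparison` are illustrative and not from the original post, and lumping everything at 9 or above into the last cell is my own assumption.

```r
# Frequency table from the question
x     <- 0:9
count <- c(478, 901, 1101, 873, 583, 250, 97, 31, 10, 2)
n     <- sum(count)

# The Poisson MLE of the rate is the sample mean
lambda_hat <- sum(x * count) / n

# Expected frequencies under Poisson(lambda_hat);
# the last cell absorbs P(X >= 9) so the probabilities sum to 1 (assumption)
p_pois <- dpois(x, lambda_hat)
p_pois[length(p_pois)] <- 1 - ppois(8, lambda_hat)
expected <- n * p_pois

comparison <- data.frame(x = x, observed = count,
                         expected = round(expected, 1))
print(comparison)
```

The first row of `comparison` shows how many zeros the plain Poisson expects next to the 478 observed.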

However, if I use zeroinfl from pscl, the fit succeeds:

library(pscl)
summary(zeroinfl(x ~ 1, dist="poisson", data=data.frame(x=vect)))

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.4945 -0.8607 -0.2269  0.4069  4.2096 

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.88120    0.01134   77.73   <2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -3.7452     0.2597  -14.42   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 10 
Log-likelihood: -7853 on 2 Df

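Back-transforming the two intercepts to the natural scale gives:

mu = exp(0.8812) = 2.41
zero = logit(-3.7452, inverse=T) = 0.02308537   (the zero-inflation probability)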
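The same back-transformation can be reproduced with base R alone, using `plogis()` as the inverse logit. This is a sketch with illustrative variable names (`b_count`, `b_zero`, `mu_hat`, `pi_hat` are not from the original post); the coefficients are copied from the summary above.

```r
# Intercepts copied from the zeroinfl summary
b_count <- 0.88120   # count model, log link
b_zero  <- -3.7452   # zero-inflation model, logit link

mu_hat <- exp(b_count)    # Poisson mean of the count component
pi_hat <- plogis(b_zero)  # inverse logit: probability of a structural zero

# Quantities implied by the fitted ZIP
mean_zip <- (1 - pi_hat) * mu_hat                # overall mean
p0_zip   <- pi_hat + (1 - pi_hat) * exp(-mu_hat) # total probability of a zero

print(c(mu = mu_hat, pi = pi_hat, mean = mean_zip, p0 = p0_zip))
```

As a sanity check, `mean_zip` should land close to the sample mean of the data, since an intercept-only ZIP fitted by maximum likelihood reproduces it.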
colinfang

  • As a guess, it looks like zeros are not inflated. Try a regular Poisson. – Peter Flom Jan 02 '14 at 13:25
  • @PeterFlom I added plots of the regular Poisson fit. – colinfang Jan 02 '14 at 13:42
  • It fits a Poisson almost perfectly. – Peter Flom Jan 02 '14 at 14:10
  • There does not appear to be any considerable zero inflation to me. The regular Poisson seems to fit quite well. Just as a note, a gof test with a sample size as large as yours will almost always reject: with that much power, the test detects even very minute differences between the distributions being compared. – Underminer Jan 02 '14 at 14:15
  • @Underminer Really? I thought it was only normality tests that are not useful for large datasets. – colinfang Jan 02 '14 at 14:30
  • They represent the same type of test. The p-value still represents the same thing. Minor deviations from the theoretical distribution (whether it be Normal or Poisson) will still be detected. To extend this further, increasing the sample size will always increase the probability of rejecting the null hypothesis if the null hypothesis is not true (even if you are very close to the null). – Underminer Jan 02 '14 at 14:49
  • Your plots show that there are more zeros in the data than in the theoretical distribution. It does seem surprising that fitdist/ZIP allows the zero-inflation parameter to go negative, but the results seem sensible. (In contrast, pscl::zeroinfl fits the zero-inflation probability on the logit scale, so it can't go negative.) – Ben Bolker Jan 02 '14 at 15:16

1 Answer

This works:

fit <- fitdist(vect, "ZIP", start=list(mu=2.4, sigma=0.1),
      lower=c(-Inf, 0.001), upper=c(Inf, 1), optim.method="L-BFGS-B")

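which gives a log-likelihood of -7853.122, matching the zeroinfl fit.

So @Ben Bolker is correct: without bounds, the optimizer is free to try a negative sigma, which is what triggered the error.

Setting lower to exactly 0 does not work either, because the optimizer then tries to evaluate the density at sigma = 0, which is not supported for ZIP.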
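An alternative to box constraints, in the spirit of Ben Bolker's comment, is to optimise on a transformed scale so that sigma can never leave (0, 1). The following is a hand-rolled sketch in base R, not what fitdist or zeroinfl do internally; `zip_negloglik` and the other names are hypothetical, and the ZIP density is written out from its standard definition rather than taken from gamlss.

```r
x     <- 0:9
count <- c(478, 901, 1101, 873, 583, 250, 97, 31, 10, 2)

# Negative log-likelihood of a ZIP on the grouped data.
# par[1] = log(mu), par[2] = logit(sigma), so both stay in range automatically.
zip_negloglik <- function(par) {
  mu    <- exp(par[1])
  sigma <- plogis(par[2])
  p <- (1 - sigma) * dpois(x, mu)   # Poisson part for every count
  p[x == 0] <- p[x == 0] + sigma    # extra mass at zero
  -sum(count * log(p))
}

# Same starting values as in the question, moved to the transformed scale
opt <- optim(c(log(2.4), qlogis(0.1)), zip_negloglik, method = "BFGS")

mu_hat    <- exp(opt$par[1])
sigma_hat <- plogis(opt$par[2])
loglik    <- -opt$value
print(c(mu = mu_hat, sigma = sigma_hat, logLik = loglik))
```

If this behaves as intended, the estimates and log-likelihood should agree closely with the bounded fitdist fit and the zeroinfl output above, with no lower or upper arguments needed.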

colinfang