
I need to use the cumulative distribution function (CDF) of either the zero-inflated Poisson or the zero-inflated negative binomial distribution. The VGAM functions ask you to supply:

pstr0:
Probability of a structural zero (i.e., ignoring the Poisson distribution), called phi. 
The default value of phi = 0 corresponds to the response having an ordinary Poisson distribution.
http://search.r-project.org/library/VGAM/html/zipoisUC.html

What sort of heuristic might I use to estimate this or perhaps solve for it?

My thinking is that, since pstr0 is what produces the extra zeros when simulating, pstr0 must simply be the proportion of zeros in my data set.

For example:

# libraries
library(VGAM)
library(pscl)
# generate zero inflated poisson data, rate of 5, probability of structural zeros .2
arr1 <- rzipois(n=100, lambda=5, pstr0=0.2)
# fit a model with intercept
m1 <- zeroinfl(arr1 ~ 1)
# predict zeros
mean(predict(m1, type='zero'))  # about 0.18, i.e. 18%
prop.table(table(arr1))  # proportion of zeros is about 0.19, i.e. 19%
John Stud
  • Do you have data available? If so, you could fit a zero-inflated Poisson or zero-inflated NB to your data (for example, via fitdistrplus), which will give you an estimate of the proportion you're looking for. – Izzy Dec 21 '20 at 19:58
  • No, but I have added some content to the main body to suggest a way it might work – John Stud Dec 21 '20 at 21:11
  • pstr0 shouldn't just be the % of zeros in your dataset, because some of the zeros in your dataset would be coming directly from the pre-inflation Poisson distribution. – fblundun Dec 21 '20 at 21:36
  • I'm a little confused about what you're trying to do. If you are simulating data from a zero-inflated Poisson, then you know what the probability of structural zeros is, since you are specifying it as whatever you want. As @fblundun said, the final percentage of zeros in your data will then come from both the Poisson (for example, a Poisson distribution with lambda=5 and no zero inflation would have a ~0.7% probability of producing a 0) and the zero inflation component. – Izzy Dec 21 '20 at 21:46
  • I am using cumulative distribution functions on my sampled data, here for the zero-inflated Poisson (pzipois), which by default sets pstr0 to 0. Since I think my data contain both sampling zeros AND structural zeros, I am trying to figure out how to estimate the share of structural zeros, so that I get the correct cumulative distribution function. The simulation was just a test: generating some y data from a zero-inflated Poisson seemed consistent with what a real data set might look like. – John Stud Dec 21 '20 at 22:12
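The point made in the comments can be written out as a short worked equation. Writing $\pi$ for pstr0 (a notational choice made here) and using the simulation settings from the question, $\pi = 0.2$ and $\lambda = 5$, the expected proportion of zeros in zero-inflated Poisson data is:

$$P(Y = 0) = \pi + (1 - \pi)e^{-\lambda} = 0.2 + 0.8\,e^{-5} \approx 0.2 + 0.8 \times 0.0067 \approx 0.205$$

So the observed share of zeros exceeds pstr0 by $(1 - \pi)e^{-\lambda}$. With $\lambda = 5$ that excess is only about half a percentage point, which is likely why the two numbers in the simulation look almost identical; for a small $\lambda$ the gap would be much larger.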

1 Answer

Wikipedia gives the maximum likelihood estimators for $\pi$ (your pstr0) and $\lambda$ as:

$$\hat{\pi}_{ml} = 1 - \frac{m}{\hat{\lambda}_{ml}}$$

and

$$m\left(1 - e^{-\hat{\lambda}_{ml}}\right) = \hat{\lambda}_{ml}\left(1 - \frac{n_0}{n}\right)$$

The second equation has the closed-form solution:

$$\hat{\lambda}_{ml} = W_0\left(-s e^{-s}\right) + s$$

where $m$ is the sample mean, $\frac{n_0}{n}$ is the observed proportion of zeros, $s = \frac{m}{1 - \frac{n_0}{n}}$, and $W_0$ is the principal branch of the Lambert W function.
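As a rough illustration, the estimating equation for $\lambda$ can also be solved numerically in base R instead of through the Lambert W function. The sketch below is not taken from VGAM or pscl: `zip_mle()` is a hypothetical helper, the root finder is `uniroot()`, and the data are simulated with the settings from the question (lambda = 5, pstr0 = 0.2) but a larger sample size, which is an assumption made here to keep the estimates stable.

```r
# Sketch: intercept-only zero-inflated Poisson MLE using base R only.
# zip_mle() is a hypothetical helper, not a VGAM or pscl function.
zip_mle <- function(y) {
  m  <- mean(y)          # sample mean
  p0 <- mean(y == 0)     # observed proportion of zeros, n0 / n
  s  <- m / (1 - p0)     # equals the mean of the non-zero observations
  # A positive root needs at least one zero and s > 1.
  stopifnot(p0 > 0, p0 < 1, s > 1)

  # m * (1 - exp(-lambda)) = lambda * (1 - p0), divided through by (1 - p0):
  #   lambda - s * (1 - exp(-lambda)) = 0
  g <- function(lambda) lambda + s * expm1(-lambda)
  lambda_hat <- uniroot(g, lower = 1e-8, upper = s, tol = 1e-10)$root

  # A negative pstr0 here would mean fewer zeros than a plain Poisson predicts.
  c(lambda = lambda_hat, pstr0 = 1 - m / lambda_hat)
}

# Simulated check: structural zeros with probability 0.2, Poisson rate 5.
set.seed(1)
n <- 10000
y <- rbinom(n, 1, 1 - 0.2) * rpois(n, 5)

est <- zip_mle(y)
est            # estimates of lambda and pstr0
mean(y == 0)   # raw share of zeros, slightly above the pstr0 estimate
```

The raw share of zeros and the estimated pstr0 differ by roughly $(1 - \hat{\pi}_{ml})e^{-\hat{\lambda}_{ml}}$, the part of the zeros attributed to the Poisson component.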

The question "Overfitting of Zero-inflated Poisson" has an example of R code that uses the pscl package to estimate these parameters.
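Once estimates of $\lambda$ and $\pi$ are available, they can be plugged into the zero-inflated Poisson CDF, which is a mixture of a point mass at zero and an ordinary Poisson CDF. The helper `pzip_sketch()` below is a hypothetical base-R stand-in written for illustration; in VGAM the corresponding call would be `pzipois(q, lambda, pstr0)`. The parameter values are the simulation settings from the question, not fitted values.

```r
# Sketch: CDF of a zero-inflated Poisson, base R only.
# pzip_sketch() is a hypothetical helper, shown for illustration.
pzip_sketch <- function(q, lambda, pstr0 = 0) {
  # P(Y <= q) = pstr0 + (1 - pstr0) * P(Poisson(lambda) <= q) for q >= 0
  ifelse(q < 0, 0, pstr0 + (1 - pstr0) * ppois(q, lambda))
}

# P(Y = 0): structural zeros plus zeros from the Poisson component.
pzip_sketch(0, lambda = 5, pstr0 = 0.2)

# With pstr0 = 0 the function reduces to the ordinary Poisson CDF.
pzip_sketch(0:3, lambda = 5, pstr0 = 0)
ppois(0:3, 5)
```

Leaving pstr0 at its default of 0 therefore evaluates a plain Poisson CDF, which understates the probability mass at zero whenever structural zeros are present.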

fblundun
  • I am not sure that the reference code is doing anything different. Notice that pi_zpo = exp(coef(zpo)[2])/(1+exp(coef(zpo)[2])) returns exactly the proportion of zeros that are in the data set. So whatever you set for pstr0 here: arr1 <- rzipois(n=100, lambda=5, pstr0=0.00) is just the distribution of zeros vs. everything else. Thus, it should be the case that % structural zeros is just the fraction of zeros vs non zeros. – John Stud Dec 22 '20 at 00:06