
For count data with an excessive number of zeros, two common model choices are the zero-inflated Poisson (ZIP) and the zero-inflated negative binomial (ZINB).

Q1: How does one choose appropriately between the two from a theoretical viewpoint?

As I understand it, ZINB has an extra dispersion parameter in its variance, which allows a more flexible variance structure than ZIP.
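
To make the variance comparison concrete, here is a minimal numerical sketch. It assumes the NB2 parameterization (count-part mean $\mu$, variance $\mu + \alpha\mu^2$) and a zero-inflation probability $\pi$; under those assumptions the ZIP variance is $(1-\pi)\mu(1+\pi\mu)$ and the ZINB variance is $(1-\pi)\mu(1+(\pi+\alpha)\mu)$. The parameter values and function names are hypothetical, chosen only for illustration.

```python
import math

def poisson_logpmf(y, lam):
    """Log pmf of a Poisson distribution with mean lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def nb_logpmf(y, mu, alpha):
    """Log pmf of NB2 with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # "size" parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def zero_inflated_pmf(y, pi, base_logpmf):
    """Mixture: structural zero with probability pi, otherwise the base count model."""
    p = (1.0 - pi) * math.exp(base_logpmf(y))
    return p + pi if y == 0 else p

def mean_var(pmf, upper=500):
    """Mean and variance by direct summation over 0..upper-1 (tail assumed negligible)."""
    probs = [pmf(y) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Hypothetical parameter values, for illustration only.
pi, mu, alpha = 0.3, 4.0, 0.5

zip_mean, zip_var = mean_var(
    lambda y: zero_inflated_pmf(y, pi, lambda k: poisson_logpmf(k, mu)))
zinb_mean, zinb_var = mean_var(
    lambda y: zero_inflated_pmf(y, pi, lambda k: nb_logpmf(k, mu, alpha)))

# Closed-form variances under the same assumptions.
zip_var_formula = (1 - pi) * mu * (1 + pi * mu)
zinb_var_formula = (1 - pi) * mu * (1 + (pi + alpha) * mu)

print("ZIP :", zip_mean, zip_var, zip_var_formula)
print("ZINB:", zinb_mean, zinb_var, zinb_var_formula)
```

Both models have the same mean here, but the ZINB variance is larger whenever $\alpha > 0$, which is the extra flexibility referred to above.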

Q2: How can I tell beforehand, before fitting both models, whether this extra flexibility is needed?

Q3: Suppose one is compelled to choose between ZIP and ZINB. Is a badly fitting model (ZIP) with a significant parameter better than a well-fitting model (ZINB) with an insignificant parameter? The improvement in fit is dramatic. I am leaning towards ZINB because it explains my data quite well, but the parameter becomes insignificant in return.
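
One common way to compare the two fits formally is a likelihood-ratio test, since ZIP is the limiting case of ZINB as the dispersion parameter goes to zero. The sketch below assumes both models use the same covariates and differ only in that one parameter; because the null value lies on the boundary of the parameter space, the usual $\chi^2_1$ p-value is halved. The function name and the log-likelihood values are hypothetical placeholders, not results from the data discussed here.

```python
import math

def lr_test_zip_vs_zinb(ll_zip, ll_zinb):
    """Likelihood-ratio test of ZIP (null) against ZINB (alternative).

    Assumes the two fits differ only by the NB dispersion parameter.
    Returns the LR statistic and a boundary-corrected p-value based on a
    50:50 mixture of a point mass at zero and a chi-square with 1 df.
    """
    lr = max(0.0, 2.0 * (ll_zinb - ll_zip))
    if lr == 0.0:
        return lr, 1.0
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_chi2 = math.erfc(math.sqrt(lr / 2.0))
    return lr, 0.5 * p_chi2

# Hypothetical log-likelihoods, loosely mimicking a threefold change in magnitude.
lr_stat, p_value = lr_test_zip_vs_zinb(ll_zip=-1500.0, ll_zinb=-500.0)
print(lr_stat, p_value)
```

A very small p-value here would indicate that the data are overdispersed relative to ZIP, in which case standard errors from the ZIP fit are likely to be too small.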

user45765
  • If the zero-inflation parameter is not significant, why use a zero-inflated model? – jbowman Jul 12 '22 at 15:54
  • @jbowman Should one use ZIP in this case to declare victory? It sounds like you are saying that even if the model fits the data well, I should throw it away when there is no significance. I think there is a good chance that there is no association between the outcome of interest and the covariates included in ZINB. – user45765 Jul 12 '22 at 16:02
  • "The data fits model well without significance" sounds like overfitting to me. The implication is (usually) that a simpler model will also fit the data well. In this case, the simpler model would have no zero inflation term. – jbowman Jul 12 '22 at 18:46
  • @jbowman When I fit the model, I selected the simplest model. Unfortunately, doing cross-validation would reduce the sample size available for building the inference model. I also checked both regressions without zero inflation, and they do not fit well. Reducing the model further would eliminate the covariate of interest whose association I want to test. Since I am doing inference to test an association, I am not interested in the predictive modelling that you are suggesting here. – user45765 Jul 12 '22 at 19:44
  • What exactly do you mean by "insignificant parameter"? A fit shouldn't go from good to bad by removing an insignificant parameter, that's kind of the point of significance. – jbowman Jul 12 '22 at 19:58
  • @jbowman Maybe this is confusing. Say I have a covariate $X$ of interest whose association with the count outcome I want to test. I fit a Poisson regression with $X$ included and $X$ is significant. Because of overdispersion, I fit a zero-inflated Poisson; it yields a very negative log-likelihood, with $X$ still significant in ZIP. After fitting a zero-inflated negative binomial, $X$ becomes insignificant. However, in ZINB the log-likelihood shrinks threefold in magnitude. Then I select the model with the relevant confounders, including $X$. ZIP/Poisson shows $X$ as significant whereas ZINB does not. – user45765 Jul 12 '22 at 21:01
  • You're saying the LL is 3x for a zero-inflated Poisson vs. the best fit zero-inflated NB? – jbowman Jul 12 '22 at 21:41
  • @jbowman That is the LL I got from the optimization, and I did not see any convergence issue. The ZIP rootogram shows a wave pattern and its randomized quantile residuals deviate from a straight line. Much of the improvement seems to come from the extra dispersion parameter, which is shocking to me. However, the randomized quantile residual and rootogram diagnostics favor ZINB over ZIP. This is part of the reason for Q1 and Q2. – user45765 Jul 12 '22 at 21:49
