Modelling a Poisson distribution with overdispersion

Question

I have a data set that I'd expect to follow a Poisson distribution, but it is overdispersed by about 3-fold. At present, I'm modelling this overdispersion using something like the following code in R.

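## assuming a mean (Poisson lambda) of 1500; at this size the median is essentially the same
med = 1500
rawDist = rpois(1000000, med)
## note: this scales each deviation from med by 4, so the variance becomes about 16 * med
oDdist = rawDist + ((rawDist - med) * 3)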

Visually, this seems to fit my empirical data very well. If I'm happy with the fit, is there any reason that I should be doing something more complex, like using a negative binomial distribution, as described here? (If so, any pointers or links would be much appreciated.)

Oh, and I'm aware that this creates a slightly jagged distribution (due to the multiplication by three), but that shouldn't matter for my application.
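A quick way to check what dispersion a transform like this actually produces is to compare the sample variance with the sample mean. The sketch below (the seed and the names `d` and `oD3` are illustrative assumptions) shows that adding `3 * (rawDist - med)` gives a variance/mean ratio near 16 rather than 3. If "3-fold" is meant as a variance/mean ratio of 3, scaling the deviations by `sqrt(3)` would hit that target, at the cost of non-integer values.

```r
# Dispersion check for the transform in the question (illustrative sketch).
set.seed(1)                                   # fixed seed so the check is reproducible
med <- 1500                                   # Poisson mean (lambda)
rawDist <- rpois(1e6, med)
oDdist <- rawDist + ((rawDist - med) * 3)     # same as 4 * rawDist - 3 * med

# variance-to-mean ratio: about 1 for the raw Poisson draws,
# about 16 for the transformed draws (the scale factor 4, squared)
ratio_raw <- var(rawDist) / mean(rawDist)
ratio_od  <- var(oDdist) / mean(oDdist)
print(c(raw = ratio_raw, transformed = ratio_od))

# hypothetical alternative: scale deviations by sqrt(d) to target a ratio of d
d <- 3
oD3 <- med + (rawDist - med) * sqrt(d)
print(var(oD3) / mean(oD3))                   # close to 3, but values are not integers
```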

Update: For the sake of anyone else who searches and finds this question, here's a simple R function to model an overdispersed Poisson using a negative binomial distribution. Set d to the desired variance/mean ratio:

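rpois.od <- function(n, lambda, d = 1) {
  # d = variance/mean ratio; d = 1 gives an ordinary Poisson
  if (d == 1)
    rpois(n, lambda)
  else
    # negative binomial with mean lambda and variance d * lambda
    rnbinom(n, size = (lambda / (d - 1)), mu = lambda)
}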

(via the R mailing list: https://stat.ethz.ch/pipermail/r-help/2002-June/022425.html)
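A short usage check, restating the function so the snippet runs on its own (the seed, `lambda = 200` and `d = 3` are arbitrary illustrative values): the simulated draws should have mean close to lambda and variance/mean ratio close to d.

```r
# rpois.od from the update above: d is the variance/mean ratio
rpois.od <- function(n, lambda, d = 1) {
  if (d == 1)
    rpois(n, lambda)
  else
    rnbinom(n, size = (lambda / (d - 1)), mu = lambda)
}

set.seed(42)
x <- rpois.od(1e6, lambda = 200, d = 3)
print(mean(x))            # close to 200
print(var(x) / mean(x))   # close to 3
```

Note that d must be at least 1: a value below 1 would make `size` negative, and the negative binomial cannot represent underdispersion.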

score 11 · Accepted Answer · answered Jul 19 '10 at 19:51


For overdispersed Poisson data, use the negative binomial, which lets you parameterize the variance precisely as a function of the mean. In R, see rnbinom() and the related dnbinom(), pnbinom() and qnbinom().
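The R negative binomial functions accept either `prob` or the mean `mu` alongside `size`, with `prob = size / (size + mu)`. This sketch (the values `mu = 200` and `size = 100` are arbitrary) confirms that the two parameterizations give the same probabilities and that the variance equals `mu + mu^2 / size`, computed directly from the pmf.

```r
# Two equivalent ways to specify the same negative binomial in R:
# by (size, mu) or by (size, prob) with prob = size / (size + mu).
mu   <- 200
size <- 100               # smaller size means more overdispersion
x    <- 0:1000            # mass beyond 1000 is negligible for these values

p_mu   <- dnbinom(x, size = size, mu = mu)
p_prob <- dnbinom(x, size = size, prob = size / (size + mu))
print(all.equal(p_mu, p_prob))

# mean and variance implied by the pmf, versus the closed-form variance
m <- sum(x * p_mu)
v <- sum((x - m)^2 * p_mu)
print(c(mean = m, var = v, formula = mu + mu^2 / size))
```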


Cyrus S



Why negative binomial and not a mixed model with an observation-level random effect? This is not a rhetorical question. This is an "I do not understand which one I should prefer." question.
In addition, what if I have a repeated measures situation? When my data is continuous, I will use a generalized linear mixed model. The Gamma distribution often works well with continuous biological data, and the mixed model handles the repeated measures element. But what does one do if one has overdispersed repeated measure count data?
– Bryan Mar 24 '16 at 14:35
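One way to see how the two options differ is through the distribution of the Poisson rate each implies: the negative binomial is a Poisson whose rate is gamma distributed, while an observation-level random effect on the log scale makes the rate lognormal. The sketch below (the values of `mu` and `k` are arbitrary, and the lognormal variance is chosen to match the gamma case) simulates both. Their means and variances agree, but their tails do not, so the two models can fit the same data differently.

```r
# Sketch: two ways to generate overdispersed counts with the same mean and variance.
set.seed(7)
n  <- 1e6
mu <- 50
k  <- 10                     # NB size (illustrative value)

# (1) Poisson-gamma mixture (= negative binomial): rate ~ Gamma(k, rate = k / mu)
lam_gamma <- rgamma(n, shape = k, rate = k / mu)
y_nb <- rpois(n, lam_gamma)

# (2) Poisson-lognormal mixture (observation-level random effect on the log scale):
# log rate = log(mu) - sigma2 / 2 + e, e ~ N(0, sigma2), so the rate has mean mu.
# sigma2 is chosen so Var(rate) = mu^2 / k, matching the gamma case.
sigma2 <- log(1 + 1 / k)
lam_ln <- exp(log(mu) - sigma2 / 2 + rnorm(n, 0, sqrt(sigma2)))
y_pln <- rpois(n, lam_ln)

# both: mean about mu, variance about mu + mu^2 / k = 300
print(c(mean_nb = mean(y_nb), var_nb = var(y_nb),
        mean_pln = mean(y_pln), var_pln = var(y_pln)))
# ...but the upper tails differ
print(c(q999_nb = unname(quantile(y_nb, 0.999)),
        q999_pln = unname(quantile(y_pln, 0.999))))
```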
One reason the reparameterized negative binomial model is popular with overdispersed Poisson data is that it models the variance as a function of the mean (as the Poisson does), with an overdispersion parameter to capture the "extra" variance. See page 487 here for a quick formula: https://www.worldscientific.com/doi/pdf/10.1142/9789813235533_0044, and the Wikipedia page for an explanation of the reparameterization: https://en.wikipedia.org/wiki/Negative_binomial_distribution – Samir Rachid Zaim Aug 29 '19 at 18:50
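In the mean parameterization used by `rnbinom(..., size, mu)` (writing k for `size`), the first two moments are as follows. Substituting the choice made in `rpois.od` shows why its d is a variance/mean ratio:

```latex
\begin{aligned}
Y &\sim \mathrm{NB}(\mu, k), \qquad \mathrm{E}[Y] = \mu, \qquad \mathrm{Var}(Y) = \mu + \frac{\mu^2}{k},\\
\mu = \lambda,\; k = \frac{\lambda}{d-1}
  &\;\Longrightarrow\; \mathrm{Var}(Y) = \lambda + \lambda^2 \cdot \frac{d-1}{\lambda} = d\,\lambda,
  \qquad \frac{\mathrm{Var}(Y)}{\mathrm{E}[Y]} = d.
\end{aligned}
```

As k grows without bound (d approaching 1), the extra term vanishes and the Poisson is recovered.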

score 4 · Answer 2 · answered Jul 19 '10 at 19:32


If your mean value for the Poisson is 1500, then you're very close to a normal distribution; you might try using that as an approximation and then modelling the mean and variance separately.
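A minimal sketch of that suggestion, assuming the variance is modelled as d times the mean (the names `lambda`, `d`, `x_norm`, `x_nb` and the value d = 3 are illustrative): draw from a normal with the two moments set separately, round to integers, and compare with the negative binomial that has the same mean and variance.

```r
# Normal approximation with mean and variance set separately (illustrative sketch).
lambda <- 1500
d      <- 3                                  # assumed variance/mean ratio
set.seed(3)
x_norm <- round(rnorm(1e6, mean = lambda, sd = sqrt(d * lambda)))   # rounded to integers
x_nb   <- rnbinom(1e6, size = lambda / (d - 1), mu = lambda)

# at this mean the two should agree closely, e.g. in their quartiles
print(rbind(normal = quantile(x_norm, c(0.25, 0.5, 0.75)),
            negbin = quantile(x_nb,   c(0.25, 0.5, 0.75))))
```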


Rich


That's just an example - it might have a median that is much smaller, on the order of 200 (it depends on how I partition the data). That would preclude using a normal distribution, right? – chrisamiller Jul 19 '10 at 19:37

The normal approximation to the Poisson distribution is pretty robust; the difference between the CDFs is bounded by something like 0.75/sqrt(lambda), if I recall correctly. I wouldn't be too worried about using lambda = 200, but if you're more risk-averse, definitely go with the negative binomial. – Rich Jul 19 '10 at 20:46
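The recalled bound can be checked numerically for a given lambda. This sketch (base R only; `k`, `z` and `max_diff` are illustrative names) computes the largest gap between the Poisson CDF and the normal CDF without continuity correction at lambda = 200, checking both sides of each jump in the Poisson step function.

```r
# Largest gap between the Poisson CDF and its normal approximation (no continuity correction)
lambda <- 200
k <- 0:(2 * lambda)                     # covers essentially all of the mass
z <- (k - lambda) / sqrt(lambda)

# the Poisson CDF jumps at each integer, so compare the normal CDF with the
# Poisson CDF both at k and just below k (where it still equals ppois(k - 1))
diff_at    <- abs(ppois(k, lambda) - pnorm(z))
diff_below <- abs(ppois(k - 1, lambda) - pnorm(z))
max_diff <- max(diff_at, diff_below)
print(c(max_diff = max_diff, bound = 0.75 / sqrt(lambda)))
```

This checks a single value of lambda; it illustrates the bound rather than proving it.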
