3

I'm trying to write code (in C) to sample from a negative binomial distribution parameterized by $r,p$, where $r$ is not necessarily integer (also called Polya distribution).

I've found a number of explanations, and it looks like I could sample $y$ from a Gamma(r,1) distribution, transform $y$ somehow, and then use it to sample from a Poisson distribution, but that means I need to find a Gamma distribution with non-integer $r$.

All of this is a bit scary, and I was wondering if anyone could point me to some source code to do it. I don't much care what language. I have access to gamma functions, which I'm pretty sure will be needed.

  • This is a general comment (and perhaps not helpful), but did you consider sampling from uniform(0,1) and using an inverse probability integral transform (like here)? – Richard Hardy Apr 11 '15 at 06:19
  • @RichardHardy: I was hoping for something faster. To use the inverse-transformation, you need to construct the CDF for the specific $r,p$ arguments, and then use some sort of simple search function to invert it. Nevermind, I managed to find info for the Gamma-Poisson method and get it working. Perhaps I will answer my own question. The tricky part is sampling the Gamma for non-integer shape. In Wikipedia I found something called Ahrens-Dieter for that. – Mike Dunlavey Apr 11 '15 at 13:12

1 Answers1

4

Finally answering my own question...

I found the R source by googling "R source code". It is here.

The code for sampling a negative binomial is in rnbinom.c. It does it in two steps. First it samples from the gamma distribution in rgamma.c. That number is used as the mean of a poisson, defined in rpois.c.

I had coded this myself earlier (using Ahrens-Dieter for the gamma distribution) and got something reasonable, but if I generated 1e8 samples, the sample mean would consistently come out around 1/10 of one percent low, and the sample variance around one percent low.

When I use the source code from R it is more accurate. If I generate 1e8 samples, the sample mean and variance cluster around their true values. Whether that is due to a more accurate Gamma distribution, or a more accurate Poisson distribution, I don't know.