I wish to estimate the size parameter of the negative binomial distribution in R. Ideally I would like to estimate the size parameter when all I have are n = numbers of samples, m = mean count per sample, and n*m = total counts. I started out with the binomial distribution and estimated the binomial probability several ways shown below, thinking I could use that R code as a template for the negative binomial distribution.
Here is R code for the binomial distribution:
set.seed(1234)
N <- 100
Size <- 1
Prob <- 0.75
my.data <- rbinom(n = N, size = Size, prob = Prob)
sum(my.data) / length(my.data)
#[1] 0.81
use dbinom
getParm1 = function(betas, my.data) {
b1 = betas[1]
ll = dbinom(my.data, 1, b1)
return(-sum(log(ll)))
}
r1 = optim(c(0.50), fn = getParm1, my.data=my.data, method = "Brent", lower = 0, upper = 1)
r1$par
#[1] 0.81
use the explicit likelihood
getParm2 = function(betas, N, K) {
p1 = betas[1]
nCk = gamma(N+1) / (gamma(K+1) * gamma((N-K)+1))
my.likely <- nCk * p1^K * (1-p1)^(N-K)
return(-log(my.likely))
}
r2 = optim(c(0.50), fn = getParm2, N = length(my.data), K = sum(my.data), method = "Brent", lower = 0, upper = 1)
r2$par
#[1] 0.81
remove the binomial coefficient
getParm3 = function(betas, N, K) {
p1 = betas[1]
my.likely <- p1^K * (1-p1)^(N-K)
return(-log(my.likely))
}
r3 = optim(c(0.50), fn = getParm3, N = length(my.data), K = sum(my.data), method = "Brent", lower = 0, upper = 1)
r3$par
#[1] 0.81
use the sum of outcomes with dbinom
getParm4 = function(betas, my.data) {
b1 = betas[1]
ll = dbinom(sum(my.data), length(my.data), b1)
return(-log(ll))
}
r4 = optim(c(0.50), fn = getParm4, my.data=my.data, method = "Brent", lower = 0, upper = 1)
r4$par
#[1] 0.81
However, with the negative binomial distribution I am getting stuck even when trying to use counts from every sample (my.data), let alone trying to use just total counts (total.count) without my.data. Here the estimate of size = 3.073976 is not close to the true value of 25.
set.seed(1234)
N <- 100 # number of samples
theta <- 25 # size or shape or dispersion parameter
m <- 0.75 # mean counts per sample
my.data <- rnbinom(n = N, mu = m, size = theta)
total.count <- sum(my.data)
use dnbinom
getParm1 = function(betas, my.data, m) {
b1 = betas[1]
ll = dnbinom(quantile(my.data), mu = m, size = b1)
return(-sum(log(ll)))
}
r1 = optim(c(50), fn = getParm1, my.data=my.data, m = m, method = "Brent", lower = 0, upper = 100)
r1$par
#[1] 3.073976
Is it possible to estimate the size parameter of the negative binomial distribution using approaches similar to what I used above with the binomial distribution? Is it possible to estimate the size parameter without having the counts per individual sample in my.data?
Here is a similar post, maybe even asking the same question, but without much R code:
P.S.
Originally I asked this question on StackOverflow and Ben Bolker provided the following two comments:
"This belongs on CrossValidated. I think the simple answer is "no". (1) Number of samples is information about the experimental design, not about the underlying distribution. (2) n, m, and n*m represent only two independent pieces of information, not three (if you know any two you can compute the third). (3) Therefore you really only have one piece of information about the distribution (the mean). (4) Unlike the binomial (which only has a single undetermined parameter, p), the NB has two ({prob, size} or {mu, size}). (5) You're out of luck ... "
– Ben Bolker
"If you had another piece of information (like the variance or standard deviation of the counts) you could do it. But not with just n and m."
– Ben Bolker
I need to explore using the variance or standard deviation of the counts.