
I assume that most sources about incidence rates assume there is only one event per person. In a medical context (e.g. cancer) this makes sense.

I am able to compute the incidence rate and its CI when there are fewer events than people (or equally many). See the example code below. But when there are more events than people in the population, errors occur, and I don't know why.

My hypotheses so far:

  1. I did something wrong technically.
  2. My method is statistically wrong: to compute the CI of an incidence rate when multiple events are possible, you need another approach.
  3. Statistically, a CI makes no sense when multiple events per person are possible.

I compute the incidence rate and its confidence interval with this R function:

incidence_rate_with_CI <- function(observed_n, population_n, rate_relation, CI = 95) {
    # Incidence rate
    ir = observed_n / population_n * rate_relation
    # Variance of the raw incidence rate
    ir_variance = ir * (rate_relation - ir) / population_n
    # Normal quantile for the requested confidence level
    nd = qnorm((1 + CI / 100) / 2, mean = 0, sd = 1)

    # Confidence interval
    ci_lower = ir - nd * sqrt(ir_variance)
    ci_upper = ir + nd * sqrt(ir_variance)

    return( list(ir = ir, lower = ci_lower, upper = ci_upper) )
}

An example calculation with half as many events as people in the population:

> population = 1000000
> events = population / 2
> incidence_rate_with_CI(events, population, 1000, 95)
$ir
[1] 500

$lower
[1] 499.02

$upper
[1] 500.98

But making more events than people throws errors:

> population = 1000000
> events = population + 1
> incidence_rate_with_CI(events, population, 1000, 95)
$ir
[1] 1000.001

$lower
[1] NaN

$upper
[1] NaN

Warning messages:
1: In sqrt(ir_variance) : NaNs produced
2: In sqrt(ir_variance) : NaNs produced
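If I trace the arithmetic by hand, the NaN seems to come from the variance formula turning negative once the rate exceeds `rate_relation` (a sketch of my reading of the function above, with the numbers from this example):

```r
# With events > population, ir exceeds rate_relation, so the factor
# (rate_relation - ir) is negative, the "variance" is negative, and
# sqrt() of a negative number is NaN.
ir <- (1000001 / 1000000) * 1000           # 1000.001, larger than rate_relation = 1000
ir_variance <- ir * (1000 - ir) / 1000000  # negative (about -1e-06)
sqrt(ir_variance)                          # NaN, with a warning
```

If that reading is right, the binomial-style variance only makes sense while `ir <= rate_relation`, i.e. while there are at most as many events as people.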

Sidenote: there is no tag for incidence rate, only incidence rate ratio. Is the latter a synonym for the former, or is it something different?

buhtz

1 Answer


Looking at multiple events per subject is very common. E.g. methods based on the negative binomial distribution (or negative binomial regression) are very popular and often better than assuming a (simpler) Poisson distribution (because rates can differ between subjects). With a log-time-at-risk offset, negative binomial regression (and similarly Poisson regression) can take into account that different subjects may be at risk for different periods of time.

In that setting, you typically work with log-rates, because confidence intervals based on normal approximations are better behaved on the log-scale (plus it reflects that rates cannot be negative). E.g. you could use the glm.nb function from the MASS package, but be aware that for small to moderate sample sizes it gives a too-small SE (for a discussion of that and how to fix it, see here). E.g. for a single negative binomial rate, you could (ignoring the issues with SEs) get:

library(MASS)

negbinfit1 = glm.nb(
    data = data.frame(subject = c(1, 2, 3, 4, 5),
                      events = c(100, 50, 150, 10, 75),
                      log_followup = log(c(2, 1, 2, 0.5, 1))),
    formula = events ~ 1 + offset(log_followup))

summary(negbinfit1)

# Point estimate and 95% CI for the rate, back-transformed from the log-scale
exp(summary(negbinfit1)$coef[1, 1])
exp(summary(negbinfit1)$coef[1, 1] - qnorm(0.975) * summary(negbinfit1)$coef[1, 2])
exp(summary(negbinfit1)$coef[1, 1] + qnorm(0.975) * summary(negbinfit1)$coef[1, 2])

Björn
  • I can't rate your statistical explanation because of a lack of statistical knowledge on my side. But your code doesn't help because it doesn't reflect my MWE, it shows no output, and there is no explanation (e.g. in code comments) in it. It would be great if you could improve your answer. And it would help if you didn't use any libraries (which are black boxes) but raw R code. – buhtz May 04 '22 at 14:38
  • You will struggle to get anywhere without some packages for most things, but in a simple Poisson case there is a simple solution: after observing a total of y events over t time units of follow-up, you can get a CI for the log-rate via log(y/t) +- qnorm(1-alpha/2) * sqrt(1/y). It's equivalent to exp(confint(glm(events ~ 1 + offset(log_followup), family=poisson))). Or use conjugate updating with a gamma distribution to get Bayesian credible intervals: qgamma(p=c(0.025, 0.5, 0.975), 1/3+sum(c(100,50,150,10,75)), 0+sum(c(2,1,2,0.5,1))). Not so easy once you allow variation in rates between subjects. – Björn May 04 '22 at 14:55
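A runnable sketch of the closed-form Poisson interval described in that comment, applied to the example data from the answer (this is an illustration of the formula only, not the negative binomial fit, and assumes a single common rate across subjects):

```r
# Closed-form CI for a single Poisson rate: log(y/t) +- qnorm(1-alpha/2) * sqrt(1/y),
# back-transformed to the rate scale. Same example data as in the answer.
events   <- c(100, 50, 150, 10, 75)
followup <- c(2, 1, 2, 0.5, 1)

y <- sum(events)     # total events observed
t <- sum(followup)   # total time at risk
alpha <- 0.05

log_rate   <- log(y / t)
half_width <- qnorm(1 - alpha / 2) * sqrt(1 / y)

# Lower bound, point estimate, upper bound on the rate scale
rate_ci <- exp(c(log_rate - half_width, log_rate, log_rate + half_width))
print(rate_ci)
```

The point estimate is simply y/t (385 events over 6.5 time units, about 59.2 events per time unit), and the interval is symmetric on the log-scale rather than on the rate scale.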