I assume that most of the sources about incidence rates are assuming that there is only one event per person. In a medical context (e.g. cancer) this make sense.
I am able to compute the incidence rate and its CI if there are fewer or the same number of events as people. See the example code below. But when there are more events then people in population there are errors. I don't know why.
My hypothesis so far
- I did technical something wrong.
- Statistically my method is wrong. To compute the CI of an incidence rate if multiple events are possible you need another approach.
- Statistically a CI makes no sense when multiple events per person are possible.
I can compute the incidence rate and it's confidence interval with that R function
incidence_rate_with_CI <- function(observed_n, population_n, rate_relation, CI = 95) {
# incidence rate
ir = observed_n / population_n * rate_relation
# Variance of raw incidence rate
ir_variance = ir * (rate_relation - ir) / population_n
nd = qnorm((1 + CI / 100) / 2, mean=0, sd=1)
# Confidence interval
ci_lower = ir - nd * sqrt(ir_variance)
ci_upper = ir + nd * sqrt(ir_variance)
return( list(ir=ir, lower=ci_lower, upper=ci_upper) )
}
An example calculation with half as many events as people in population.
> population = 1000000
> events = population / 2
> incidence_rate_with_CI(events, population, 1000, 95)
$ir
[1] 500
$lower
[1] 499.02
$upper
[1] 500.98
Mut let's make more events then people throws errors.
> population = 1000000
> events = population + 1
> incidence_rate_with_CI(events, population, 1000, 95)
$ir
[1] 1000.001
$lower
[1] NaN
$upper
[1] NaN
Warning messages:
1: In sqrt(ir_variance) : NaNs produced
2: In sqrt(ir_variance) : NaNs produced
Sidenote: There is not tag for incidence rate but incidence rate ratio. Is the latter a synonym for the first or is it something different?