Choosing a distribution to deal with over-dispersion

Question

I'm working on a life insurance problem: trying to simulate the total dollar amount of claims in a year. To do this, I have a record for each person that contains their amount of insurance and an estimate of the probability that they will make a claim during the next year. I have run thousands of one-year simulations and developed the range of outcomes.

Unfortunately, the real historical experience is far more volatile than my analysis suggests. Almost every year looks like a 1-in-100 year. I used the binomial distribution (thinking simplistically that either a person does or doesn't make a claim), but I have been advised that my data suffers from over-dispersion. That sounds possible, because obviously we do not really know each person's precise probability of making a claim. Our estimate is wrong for any particular individual, but reasonably accurate for each sub-group. The recommended solution was to use a negative binomial distribution so that the variance can be calibrated separately from the mean.

Certainly this property would be helpful, but literature on the negative binomial distribution focuses on applications of counting successes and failures. That doesn't seem relevant to what I'm trying to do. Is this really a good probability distribution to use in this context, and why?

You should really add some more details: what type of claims are these (car insurance, or something else)? Claim amounts can often be modelled as a continuous random variable, so distributions on the positive values, such as the gamma or lognormal, could be tried. — kjetil b halvorsen, Feb 07 '14 at 20:23
Thanks. These are life insurance claims, so the amount of each claim is fixed. The only variable is the number of claims. — , Feb 07 '14 at 20:38

Glen_b · Answer 1 · 2014-02-10T21:28:26.610

If you want to simulate an individual's probability of a claim (and those are your own words so I assume you do), that's a continuous random variable, not a discrete one.

The discrete variable relates to what happens next: whether or not an individual actually makes a claim.

With heterogeneity of probability across individuals, a model with a fixed $p$ for everyone would obviously underestimate the variation in the probability of making a claim, since it assumes that variation to be 0.

The probability of a claim might be modelled by say a beta distribution, and the event "the insured person made a claim" would, conditionally on that probability, then be distributed as Bernoulli; unconditionally, the number of claims across many policies would be beta-binomial.

This has a fair bit of flexibility to model data where the counts are either 1 or 0, and the model for $p$ is readily interpretable (as a model for the heterogeneity in the probability of claims).
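To make the contrast concrete, here is a minimal simulation sketch in Python with NumPy. It compares claim counts under a fixed $p$ with counts where $p$ is drawn from a beta distribution. Everything numeric here is a made-up assumption for illustration: the group size, the mean claim probability, and the `concentration` parameter (the sum of the two beta parameters) are hypothetical, as is the function name. It also assumes one draw of $p$ per simulated year, shared by every policy in the group. That sharing is what produces a beta-binomial count; if each person's $p$ were drawn independently around the same mean, the total would collapse back to the ordinary binomial.

```python
import numpy as np

def simulate_claim_counts(n_policies, mean_p, concentration, n_sims, rng):
    """Simulate yearly claim counts under a fixed p and under a beta-distributed p.

    All arguments are hypothetical illustration values, not taken from the question.
    """
    # Fixed p: plain binomial counts.
    fixed = rng.binomial(n_policies, mean_p, size=n_sims)

    # Beta(a, b) with a + b = concentration has mean a / (a + b) = mean_p.
    a = mean_p * concentration
    b = (1.0 - mean_p) * concentration

    # Assumption: one p per simulated year, shared by the whole group.
    p_year = rng.beta(a, b, size=n_sims)

    # Binomial conditional on p_year, hence beta-binomial unconditionally.
    mixed = rng.binomial(n_policies, p_year)
    return fixed, mixed

rng = np.random.default_rng(0)
fixed, mixed = simulate_claim_counts(
    n_policies=10_000, mean_p=0.01, concentration=200.0, n_sims=20_000, rng=rng
)

print("mean     (fixed p, beta p):", fixed.mean(), mixed.mean())
print("variance (fixed p, beta p):", fixed.var(), mixed.var())
```

Both sets of counts have roughly the same mean, but the beta-mixed counts have a much larger variance. A smaller `concentration` spreads $p$ more widely and so increases the over-dispersion.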

score 1 · Answer 2 · answered Feb 07 '14 at 21:35

After your answer in the comments: There is no reason you cannot use negative binomial models! If you try googling for "overdispersed models for number of insurance claims", there are a lot of hits that seem relevant.

Note that the negative binomial model can be seen as a Poisson model where the Poisson mean parameter itself has a distribution (a gamma distribution). Using such ideas you can also construct other overdispersion models.
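As a rough illustration of that mixture view, the Python/NumPy sketch below draws a Poisson mean from a gamma distribution and then draws the claim count from a Poisson with that mean. The expected number of claims (`mean_claims`) and the gamma shape (`shape_r`) are hypothetical values chosen only for the example. For comparison it also samples directly from NumPy's `negative_binomial` with a matching parameterisation, which should give the same mean and variance.

```python
import numpy as np

rng = np.random.default_rng(1)

mean_claims = 100.0   # hypothetical expected number of claims per year
shape_r = 20.0        # hypothetical gamma shape; smaller means more over-dispersion
n_sims = 50_000

# A gamma with shape r and scale mean/r has mean equal to mean_claims.
lam = rng.gamma(shape=shape_r, scale=mean_claims / shape_r, size=n_sims)

# Poisson counts conditional on the random mean: negative binomial unconditionally.
mixture_counts = rng.poisson(lam)

# Plain Poisson with the same mean, for comparison.
poisson_counts = rng.poisson(mean_claims, size=n_sims)

# Direct negative binomial draw; p = r / (r + mean) gives the same mean.
direct_counts = rng.negative_binomial(shape_r, shape_r / (shape_r + mean_claims), size=n_sims)

# Variance implied by the gamma-Poisson mixture: mean + mean^2 / r.
nb_var_theory = mean_claims + mean_claims**2 / shape_r

print("Poisson variance:         ", poisson_counts.var())
print("gamma-Poisson variance:   ", mixture_counts.var())
print("direct neg. binomial var.:", direct_counts.var())
print("theoretical NB variance:  ", nb_var_theory)
```

Seen this way, the "successes and failures" story is not needed: the gamma shape is simply a knob that sets the variance, $\mu + \mu^2/r$, separately from the mean $\mu$.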
