
I'm modelling the number of schools in a given region as a function of the population of that region. I've got roughly 6,000 regions, with populations ranging from about 100 up to around 100,000 and the number of schools in a region ranging from 0 up to about 25.

What would the best distribution be for modelling something like this? I'm interested in calculating the probability that a region of population size N would have a school, and then comparing how those probabilities vary across geographies.

I've tried thinking about this as a Poisson distribution but found the variance to be much greater than the mean. I've tried the negative binomial but I'm not so sure about the independence of 'trials' (where I'm saying each trial is a person, so a region with population 500 and 0 schools has had 500 trials with 0 successes). Specifically, I'm pretty sure that as the population goes up the probability of a school increases, but as soon as a school is built the probability of an additional school drops again, until the population reaches some further threshold, and so on.

I feel like I'm missing something really obvious here and that this must be a solved problem, but the best I've got at the moment is creating groupings of population (a bucket of 100 - 200) such that I can reasonably calculate the mean number of schools within a bucket, and then running logistic regression style models on the groups.

Does anybody have any ideas how I'd be best thinking about this kind of problem?

Kali_89
  • The probability that a region of population size $N$ would have a school, or how many schools it would have? – jbowman Jan 12 '23 at 22:35

1 Answer


Maybe a count data regression model with random effects (random intercepts for region) and log(population of region) as an offset? The count data regression could be Poisson or negative binomial.
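
Spelled out, one way to read this suggestion is the sketch below. The symbols $Y_i$ (schools in region $i$), $N_i$ (population), $\beta_0$, $u_i$, $\sigma^2$ and $\theta$ are notation introduced here for illustration, not taken from the question, and fixing the offset coefficient at 1 assumes the expected number of schools is proportional to population:

$$Y_i \mid u_i \sim \text{Poisson}(\mu_i), \qquad \log \mu_i = \beta_0 + u_i + \log N_i, \qquad u_i \sim \mathcal{N}(0, \sigma^2)$$

Under that model, the probability that region $i$ has at least one school is

$$P(Y_i \ge 1 \mid u_i) = 1 - e^{-\mu_i}$$

If a negative binomial with mean $\mu_i$ and variance $\mu_i + \mu_i^2/\theta$ is used instead, the corresponding probability is $1 - \left(\frac{\theta}{\theta + \mu_i}\right)^{\theta}$.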

For background on the use of an offset for modelling rates, see "Goodness of fit and which model to choose linear regression or Poisson".
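
To make the offset idea concrete, here is a minimal sketch in plain Python of the simplest possible case: an intercept-only Poisson model with log(population) as the offset and no random effects. In that special case the maximum-likelihood rate has a closed form (total schools divided by total population). The data and the function names `fit_rate`, `prob_at_least_one` and `pearson_dispersion` are made up for illustration; a real analysis would use a GLM or mixed-model library rather than this hand-rolled version.

```python
import math

# Hypothetical toy data: (population, number of schools) per region.
regions = [(150, 0), (800, 0), (2500, 1), (12000, 2), (40000, 9), (95000, 24)]


def fit_rate(data):
    """MLE of the per-person school rate for the model
    log(mu_i) = b0 + log(N_i), i.e. mu_i = rate * N_i with rate = exp(b0).
    Setting the Poisson score to zero gives rate = sum(y) / sum(N)."""
    total_schools = sum(y for _, y in data)
    total_population = sum(n for n, _ in data)
    return total_schools / total_population


def prob_at_least_one(rate, population):
    """P(Y >= 1) under a Poisson with mean rate * population."""
    return 1.0 - math.exp(-rate * population)


def pearson_dispersion(data, rate):
    """Pearson chi-square divided by residual degrees of freedom.
    Values well above 1 suggest overdispersion relative to the Poisson."""
    chi2 = sum((y - rate * n) ** 2 / (rate * n) for n, y in data)
    return chi2 / (len(data) - 1)  # one fitted parameter (the intercept)


rate = fit_rate(regions)
dispersion = pearson_dispersion(regions, rate)

for n in (500, 5000, 50000):
    print(n, round(prob_at_least_one(rate, n), 3))
print("dispersion:", round(dispersion, 3))
```

The dispersion is computed after conditioning on population through the offset. That is the relevant comparison for choosing between Poisson and negative binomial, since the raw variance of school counts across regions of very different sizes will exceed the raw mean even when the Poisson model holds within each region.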

For mixed-effects Poisson regression, see "May 'offsets' be used in mixed-effects poisson regression?" and "How to analyze longitudinal count data: accounting for temporal autocorrelation in GLMM?".