Say we have a number of different geographic areas, $A_1, A_2, ...$, and a quantity $X$ that follows a gamma distribution in each area but, we expect, with different parameters: $(k_1,\theta_1), (k_2,\theta_2),...$. If we take a sample from the full population across all areas (call this the 'top-level' sample), we can fit a gamma distribution to it and get its parameters.

If, from some other source, we are told the mean value of $X$ in each geographic area, what can we say about the most likely values of the distribution's parameters for each area (call these the 'lower-level')?

I have built a simple example in Excel in which I pool five randomly generated samples, each drawn from a different gamma distribution, and then sample from the pooled data. From this, it looks like if the lower-level distributions all have the same shape parameter, then the top-level distribution also has that shape parameter and the mean of the top-level distribution is the mean of the lower-level means. If the lower-level shape parameters are different, though, then things are not so simple.
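The Excel experiment can be reproduced in a few lines of Python. This is only a minimal sketch under assumptions that are not in the original setup: five equally sized areas, a common shape of 2, scales of 1 to 5, and a method-of-moments fit in place of whatever fitting routine Excel was used for. The names `common_shape`, `area_scales`, and `n_per_area` are illustrative.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed setup: five areas sharing one shape but with different scales.
common_shape = 2.0
area_scales = [1.0, 2.0, 3.0, 4.0, 5.0]
n_per_area = 20000

pooled = []
area_means = []
for scale in area_scales:
    # random.gammavariate(alpha, beta): alpha is the shape, beta is the scale.
    draws = [random.gammavariate(common_shape, scale) for _ in range(n_per_area)]
    area_means.append(statistics.fmean(draws))
    pooled.extend(draws)

# Method-of-moments gamma fit to the pooled ('top-level') sample:
# mean = shape * scale and variance = shape * scale**2.
pooled_mean = statistics.fmean(pooled)
pooled_var = sum((x - pooled_mean) ** 2 for x in pooled) / len(pooled)
fitted_shape = pooled_mean ** 2 / pooled_var
fitted_scale = pooled_var / pooled_mean

print("mean of area means:", statistics.fmean(area_means))
print("pooled mean:       ", pooled_mean)
print("fitted shape/scale:", fitted_shape, fitted_scale)
```

With equal area sizes the pooled mean matches the mean of the area means. For a moment-matched fit, though, the pooled shape cannot exceed the common shape and equals it only when the scales are identical, so whether the shape appears to carry over is likely to depend on how far apart the areas are and on the fitting method.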

Is it OK to assume that if my top-level distribution has parameters $(k,\theta)$, then the best estimates of the lower-level parameters give every area the shape parameter $k$, with the scales $\theta_1, \theta_2, ...$ chosen to match the lower-level means (that is, $\theta_i = \mu_i / k$, where $\mu_i$ is the known mean for area $A_i$)?
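To make the proposal concrete: a gamma distribution's mean is its shape times its scale, so holding the shape at the top-level value and matching each area's mean fixes that area's scale. The sketch below shows only this arithmetic. The function name `area_parameters`, the shape value, and the area means are all hypothetical, and nothing here shows that these are actually the best estimates.

```python
def area_parameters(top_shape, area_means):
    """Give every area the top-level shape and the scale that matches its mean.

    Uses mean = shape * scale, so scale = mean / shape.
    """
    if top_shape <= 0:
        raise ValueError("shape must be positive")
    return {area: (top_shape, mean / top_shape) for area, mean in area_means.items()}


# Hypothetical inputs: a fitted top-level shape and externally supplied area means.
top_shape = 1.8
example_means = {"A1": 4.0, "A2": 6.5, "A3": 9.0}

params = area_parameters(top_shape, example_means)
for area, (shape, scale) in params.items():
    print(area, "shape =", shape, "scale =", round(scale, 4))
```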

Is there a general result for what I am looking for?

Edit: A complicating point is that the top-level data I have are likely to be available only after the values of $X$ have been binned into ranges.
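One way a gamma distribution might still be fitted to binned data is to maximize the multinomial likelihood of the bin counts, where each bin's probability is a difference of gamma CDF values at its edges. The following is a rough sketch using only the standard library. The bin edges, counts, and search grid are made up for illustration, the CDF uses a power series that is adequate only for moderate arguments, and a coarse grid search stands in for a proper optimizer.

```python
import math


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    z = x / scale
    # gamma(a, z) = z**a * exp(-z) * sum_{n>=0} z**n / (a * (a+1) * ... * (a+n))
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-16 * total and n < 10000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_p = shape * math.log(z) - z - math.lgamma(shape) + math.log(total)
    return min(1.0, math.exp(log_p))


def binned_loglik(edges, counts, shape, scale):
    """Multinomial log-likelihood of bin counts (up to an additive constant)."""
    ll = 0.0
    for lo, hi, count in zip(edges[:-1], edges[1:], counts):
        p = gamma_cdf(hi, shape, scale) - gamma_cdf(lo, shape, scale)
        ll += count * math.log(max(p, 1e-300))  # guard against log(0)
    return ll


def fit_binned_gamma(edges, counts, step=0.25, max_shape=5.0, max_scale=10.0):
    """Coarse grid search for the (shape, scale) pair with the highest likelihood."""
    best_params, best_ll = None, -math.inf
    for i in range(1, int(max_shape / step) + 1):
        for j in range(1, int(max_scale / step) + 1):
            shape, scale = i * step, j * step
            ll = binned_loglik(edges, counts, shape, scale)
            if ll > best_ll:
                best_params, best_ll = (shape, scale), ll
    return best_params


# Hypothetical binned data: edges of the ranges and the count in each range.
edges = [0.0, 2.0, 4.0, 6.0, 10.0, 15.0, math.inf]
counts = [140, 230, 210, 250, 120, 50]
print("fitted (shape, scale):", fit_binned_gamma(edges, counts))
```

The fitted pair is only as fine as the grid step. Since the top-level data are a mixture of gammas rather than a single gamma, the result should be read as an approximation either way.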

JohnFrum
  • I believe this can be solved, but a solution must depend on (a) knowing the relative sizes of the subpopulations and (b) exactly how you fit the overall Gamma distribution. The latter is important because the fit is inaccurate: you are using a Gamma distribution to describe a mixture of Gamma distributions. Unless all the component distributions have identical parameters, this mixture does not actually have a Gamma distribution. Even when it can be solved, there likely will be considerable uncertainty in the results. – whuber Nov 15 '22 at 16:36
  • Thanks. I could possibly have some information about the sizes of the subpopulations at the small area level, and I would know which subpopulation each datum in the top-level sample came from. I have also realised that I need to update my question because the data I will have at the top level is likely to have the values of $X$ only after they have been binned into ranges. – JohnFrum Nov 15 '22 at 16:45
  • But, you say that you know for each obs, from which subpopulation it came. Why not then estimate separately for each subpopulation? – kjetil b halvorsen Nov 20 '22 at 18:05
  • The sample of selected observations from each subpopulation contains far fewer data points than the subpopulations themselves, so for each subpopulation I might have no observations or only a very small number. The means of those subpopulations are being taken from another source, which has calculated them using data I don't have access to. – JohnFrum Nov 21 '22 at 16:30
  • Did you get anywhere with this? I think you should try Bayes ... since you do seem to have some real prior information – kjetil b halvorsen Mar 18 '24 at 20:00
  • In the end, I went about things in a different way because I am not well-versed enough in statistics to know whether I would be doing things correctly. – JohnFrum Mar 20 '24 at 09:56

0 Answers