
I have the three quartiles and the mean of the incomes of a population, and I would like to derive a "regular" distribution from them. There may be several theoretical solutions, but I just need a plausible output: this is not academic work, so I need to be pragmatic.

I saw the question "Build a (normal?) distribution from n, quartiles and mean?", but its answers focus on normal distributions, and my data are not normal at all.

An example might make it clearer:

Q1 = 17965
Q2 = 27401
Q3 = 44607
avg = 36773
  • At what percentile would an income of 50000 sit?
  • What is the 90th percentile?

I'm working with R, in case you know a function that could help or would be kind enough to point me to how to build one.

2 Answers


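We can fit this data with an equal mixture of the uniform distributions on $[0, 17965]$, $[17965, 27401]$, $[27401, 44607]$, and $[44607, \max]$. The mean of such a mixture is the average of the four interval midpoints, $(2\cdot 17965 + 2\cdot 27401 + 2\cdot 44607 + \max)/8$, so to get the right average of $36773$, the max should be $114238$.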

In that distribution, the 90th percentile is $44607 + (15/25)(114238-44607) = 86385.6$, and $$P(50000)=75\% + \frac{50000-44607}{114238-44607}25\%\simeq 77\%.$$

This is probably the simplest distribution that fits the data, so it's a reasonable starting point for further refinements.
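The piecewise-uniform construction can be sketched in R as follows. This is a minimal sketch, not a definitive implementation: the names `upper`, `breaks`, `probs`, `cdf` and `quantile_fn` are made up for illustration, and the lower bound of 0 is the assumption used in this answer.

```r
# Quartiles and mean from the question
q   <- c(17965, 27401, 44607)
avg <- 36773

# Each of the four uniform pieces carries 25% of the mass, so the mean is
# the average of the four midpoints: avg = (2 * sum(q) + upper) / 8.
# Solve that for the upper bound of the last interval.
upper <- 8 * avg - 2 * sum(q)

# The CDF is piecewise linear through these points (lower bound 0 is assumed)
breaks <- c(0, q, upper)
probs  <- c(0, 0.25, 0.5, 0.75, 1)

# CDF and quantile function by linear interpolation (stats::approx)
cdf         <- function(x) approx(breaks, probs, xout = x, rule = 2)$y
quantile_fn <- function(p) approx(probs, breaks, xout = p)$y

cdf(50000)        # share of the population with income <= 50000
quantile_fn(0.9)  # 90th percentile
```

Because both functions are plain interpolations, refining the fit later only means adding more break points and probabilities.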

Matt F.

With the information available this is going to be a guessing game.

My attempt: I assume a unimodal continuous distribution. I picked the log-normal, which is skewed to the right, as your data appear to be, since the mean is greater than the median.

Once you have chosen a distribution, you can use MLE to estimate its parameters, as long as it has $\leq 2$ parameters. Note that the code below treats the three quartiles as if they were three observations, so this is only a rough fit. Since the result is a known distribution, you can use all the standard equations and formulas to answer your questions.

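> # The three quartiles, treated here as if they were observations
> x <- c(17965, 27401, 44607)
>
> # Negative log-likelihood of a log-normal with meanlog m and sdlog s
> LL <- function(m, s) {
>   -sum(log(dlnorm(x, m, s)))
> }
>
> library(stats4)
> # Keep s bounded away from 0 so the density stays well defined
> mle(LL, start = list(m = 0, s = 1), method = "L-BFGS-B", lower = c(-Inf, 0.1), upper = c(Inf, Inf))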

Call:
mle(minuslogl = LL, start = list(m = 0, s = 1), method = "L-BFGS-B", 
    lower = c(-Inf, 0.1), upper = c(Inf, Inf))

Coefficients:
         m          s 
10.2400541  0.3716075 

This results in a log-normal distribution with quartiles

> qlnorm(c(0.25,0.5,0.75),10.2400541,0.3716075)
[1] 21794.41 28002.64 35979.32

Close enough?
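One way to judge "close enough" is to compare the fitted quartiles and the implied mean with the values from the question. The snippet below is only a sketch of such a check; the variable names are illustrative, and the mean uses the standard log-normal identity $\exp(m + s^2/2)$.

```r
# Targets from the question
target_q   <- c(17965, 27401, 44607)
target_avg <- 36773

# Fitted parameters reported by mle() above
m <- 10.2400541
s <- 0.3716075

# Quartiles and mean implied by the fitted log-normal
fitted_q   <- qlnorm(c(0.25, 0.5, 0.75), m, s)
fitted_avg <- exp(m + s^2 / 2)

# Relative errors in percent (positive means the fit is too high)
rel_err_q   <- 100 * (fitted_q - target_q) / target_q
rel_err_avg <- 100 * (fitted_avg - target_avg) / target_avg

round(rel_err_q, 1)
round(rel_err_avg, 1)
```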

An income of 50000 sits at the quantile

> plnorm(50000,10.2400541,0.3716075)
[1] 0.9406253

The 90th percentile is

> qlnorm(0.9,10.2400541,0.3716075)
[1] 45084.25
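As an alternative to the MLE fit, and still assuming a log-normal is the desired family, the two parameters can be chosen so that the median and the mean from the question are matched exactly. This is a different technique (moment and median matching), shown here only as a sketch. It relies on the standard log-normal identities median $= e^{\mu}$ and mean $= e^{\mu + \sigma^2/2}$; the names `med`, `avg`, `mu` and `sigma` are illustrative.

```r
# Median (Q2) and mean from the question
med <- 27401
avg <- 36773

# Log-normal identities: median = exp(mu), mean = exp(mu + sigma^2 / 2)
mu    <- log(med)
sigma <- sqrt(2 * log(avg / med))

# Quartiles implied by this fit, to compare with 17965, 27401, 44607
qlnorm(c(0.25, 0.5, 0.75), mu, sigma)

# The two questions from the original post
plnorm(50000, mu, sigma)  # share of the population with income <= 50000
qlnorm(0.9, mu, sigma)    # 90th percentile
```

Q1 and Q3 are not used in this fit, so comparing the first line of output with them gives an independent check on how plausible the log-normal shape is.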
user2974951