Given that I know the number of billionaires B and millionaires M in a society, and that wealth follows a Pareto distribution, how do I figure out the number of people with more than 10 million dollars?
Are you doing this as an exercise, or do you want to use this number for some purpose? – Tim Jan 08 '21 at 10:40
@Tim, this is actually personal curiosity given that I have a source that lists B and M for real-world countries. – Joshua Snider Jan 12 '21 at 10:07
1 Answer
Say that $N$ is the total population size, and $n_1$ and $n_2$ are the numbers of millionaires and billionaires, respectively. The $n_i$ counts are cumulative, so "number of millionaires" is the number of people who own \$1,000,000 or more. In that case, writing $X$ for an individual's wealth and $x_i$ for the corresponding wealth thresholds, we can calculate $p_i = \Pr(X \ge x_i) = n_i / N$.
Given the survival function of a single-parameter Pareto distribution, parametrized by shape $\alpha$ with the scale fixed at 1,
$$ S_\alpha(x) = 1 - F_\alpha(x) = x^{-\alpha}, \qquad x \ge 1 $$
where $x$ is the wealth, we can use any curve-fitting algorithm to fit this distribution to the data. For example, you could minimize the sum of squared differences between the predicted and observed probabilities, $\sum_i (S_\alpha(x_i) - p_i)^2$, as illustrated by the Julia code below. In the example, I'm using numbers from Google for the number of millionaires, the number of billionaires, and the population size of the USA in 2020.
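using Distributions
using Optim
# wealth thresholds in dollars: millionaire, billionaire
x = [1_000_000, 1_000_000_000]
# number of people at or above each threshold, and total population (USA, 2020)
n = [18_600_000, 607]
N = 330_052_960
# observed tail probabilities
p = n / N
sumsq(predicted, observed) = sum((predicted .- observed).^2)
# squared error of the unit-scale Pareto survival function for shape α
# (ccdf is broadcast over the thresholds with the dot syntax)
err(α) = sumsq(ccdf.(Pareto(α), x), p)
# one-dimensional minimization over α in [1e-6, 10]
res = optimize(err, 1e-6, 10)
α = Optim.minimizer(res)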
As you can see, the predicted probabilities are "quite close":
julia> (1 .- p)
2-element Array{Float64,1}:
0.9436454076945712
0.9999981609012081
julia> cdf(Pareto(α), x)
2-element Array{Float64,1}:
0.9477412436290724
0.9880535572673685
But if you look at the counts, the fit is off by about 1.4 million for millionaires and about 3.9 million for billionaires, against an observed count of only 607.
julia> floor.(Integer, p * N)
2-element Array{Int64,1}:
18600000
607
julia> floor.(Integer, ccdf(Pareto(α), x) * N)
2-element Array{Int64,1}:
17248157
3942958
julia> floor.(Integer, n .- ccdf(Pareto(α), x) * N)
2-element Array{Int64,1}:
1351842
-3942352
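For the question as asked, the same fitted curve can be evaluated at \$10,000,000. The sketch below is only illustrative and makes two assumptions that are not part of the session above: it recovers α from the printed CDF value at \$1,000,000 instead of rerunning the optimizer, and it uses the closed form $S_\alpha(x) = x^{-\alpha}$ of the unit-scale Pareto survival function rather than calling `ccdf`. The names `cdf_1m`, `alpha_hat`, and `count_above` are made up for the example.

```julia
# Population size used in the example above (USA, 2020).
N = 330_052_960

# CDF value at $1,000,000 printed for the fitted model above.
cdf_1m = 0.9477412436290724

# Assumption: unit-scale Pareto, so S(x) = x^(-α) for x >= 1.
# Invert S(1e6) = 1 - cdf_1m to recover the fitted shape parameter.
alpha_hat = -log(1 - cdf_1m) / log(1e6)

# Predicted number of people with wealth of at least `x` dollars.
count_above(x) = N * x^(-alpha_hat)

println("alpha ≈ ", round(alpha_hat, digits = 4))
println("predicted count above \$10M ≈ ", round(Int, count_above(1e7)))
```

Under this fit the count comes out on the order of ten million. It inherits the same distortion as the billionaire count, so it is better read as a demonstration of the mechanics than as an estimate.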
There are two problems. First, as far as I know, the Pareto distribution is usually fitted to the rank of an individual rather than to their wealth. This is what Klass et al. (2006) do, and this is what Wikipedia mentions:
The Pareto Distribution has often been used to mathematically quantify the distribution of wealth at the right tail (the wealth of very rich). In fact, the tail of wealth distributions, similar to that of income distribution, behaves like a Pareto distribution but with a thicker tail.
As the example above shows, a "thicker tail" can make a huge difference when it comes to extreme values. Wealth is only approximately Pareto-distributed, so the predicted counts will be off.
Second, as a statistician might point out, you have only two data points here. This is an extremely small sample from which to get a reliable estimate of the distribution's parameters, so the validity of the estimate is questionable.
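One way to see why two data points cannot validate the model: if the scale is left free as well, a Pareto tail $S(x) = (x_m/x)^\alpha$ has two unknowns and passes through both points exactly, leaving nothing to check the fit against. The sketch below is an assumption-laden illustration, not the fit used above: it solves for α from the ratio of the two counts and interpolates to \$10,000,000. The names `alpha_two_point` and `tail_count` are hypothetical.

```julia
# Thresholds (dollars) and cumulative counts from the example above.
x1, x2 = 1e6, 1e9
n1, n2 = 18_600_000, 607

# Assumption: counts follow n(x) = n1 * (x / x1)^(-α) between the thresholds.
# Two points determine α exactly: n2 / n1 = (x2 / x1)^(-α).
alpha_two_point = log(n1 / n2) / log(x2 / x1)

# Interpolated count of people holding at least `x` dollars.
tail_count(x) = n1 * (x / x1)^(-alpha_two_point)

println("alpha ≈ ", round(alpha_two_point, digits = 3))
println("count above \$10M ≈ ", round(Int, tail_count(1e7)))
```

The curve reproduces both inputs by construction, which is exactly why it says nothing about how well a Pareto tail describes the wealth levels in between.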
Klass, O. S., Biham, O., Levy, M., Malcai, O., & Solomon, S. (2006). The Forbes 400 and the Pareto wealth distribution. Economics Letters, 90(2), 290–295. doi:10.1016/j.econlet.2005.08.020