Number of ten millionaires in a country

Question

Given that I know the number of billionaires B and millionaires M in a society, and that wealth follows a Pareto distribution, how do I figure out the number of people with more than 10 million dollars?

Are you doing this as an exercise, or do you want to use this number for some purpose? — Tim, Jan 08 '21 at 10:40
@Tim, this is actually personal curiosity given that I have a source that lists B and M for real-world countries. — Joshua Snider, Jan 12 '21 at 10:07

score 1 · Accepted Answer · answered Jan 08 '21 at 17:01

Say that $N$ is the total population size, and $n_1$ and $n_2$ are the numbers of millionaires and billionaires, respectively. The $n_i$ are cumulative counts, so the "number of millionaires" is the number of people who own \$1,000,000 or more. In that case, writing $X$ for an individual's wealth and $x_i$ for the corresponding thresholds, we can calculate the tail probabilities $p_i = \Pr(X \ge x_i) = n_i / N$.

Given the survival function of a single-parameter Pareto distribution, parametrized by the shape $\alpha$ with the scale fixed at 1 (the default when calling Pareto(α) in Julia's Distributions package),

$$ S_\alpha(x) = 1 - F_\alpha(x) = x^{-\alpha}, \quad x \ge 1, $$

where $x$ is the wealth, we can use any curve-fitting algorithm to fit this distribution to the data. For example, you could minimize the sum of squared differences between the predicted and observed probabilities, $\sum_i (S_\alpha(x_i) - p_i)^2$, as illustrated by the Julia code below. In the example, I'm using numbers from Google for the number of millionaires, the number of billionaires, and the population size of the USA in 2020.

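using Distributions
using Optim
x = [1_000_000, 1_000_000_000]  # wealth thresholds in USD
n = [18_600_000, 607]  # people at or above each threshold
N = 330_052_960  # total population
p = n / N  # observed tail probabilities
sumsq(predicted, observed) = sum((predicted .- observed).^2)
err(α) = sumsq(ccdf.(Pareto(α), x), p)  # broadcast ccdf over the thresholds
res = optimize(err, 1e-6, 10)  # univariate minimization over α in [1e-6, 10]
α = Optim.minimizer(res)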

As you can see, the predicted probabilities are "quite close":

julia> (1 .- p)
2-element Array{Float64,1}:
 0.9436454076945712
 0.9999981609012081
julia> cdf.(Pareto(α), x)
2-element Array{Float64,1}:
 0.9477412436290724
 0.9880535572673685

But if you look at the counts, the fit predicts about 1.35 million too few millionaires and about 3.94 million too many billionaires.

julia> floor.(Integer, p * N)
2-element Array{Int64,1}:
 18600000
      607
julia> floor.(Integer, ccdf.(Pareto(α), x) * N)
2-element Array{Int64,1}:
 17248157
  3942958
julia> floor.(Integer, n .- ccdf.(Pareto(α), x) * N)
2-element Array{Int64,1}:
  1351842
 -3942352
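The fitted curve can also be evaluated at the \$10,000,000 threshold that the question asks about. The following is a minimal sketch that repeats the fit from above, written with broadcasting, and reads off the predicted count. The names p10 and n10 are introduced here for illustration only and are not part of the original code. Since the fit already misses the observed counts by millions, the result should be read with the same skepticism.

```julia
using Distributions
using Optim

x = [1_000_000, 1_000_000_000]   # wealth thresholds in USD
n = [18_600_000, 607]            # people at or above each threshold (figures used above)
N = 330_052_960                  # total population
p = n ./ N                       # observed tail probabilities

# Same least-squares objective as above, with ccdf broadcast over the thresholds.
err(α) = sum((ccdf.(Pareto(α), x) .- p) .^ 2)
α = Optim.minimizer(optimize(err, 1e-6, 10.0))

# Evaluate the fitted survival function at the $10M threshold.
p10 = ccdf(Pareto(α), 10_000_000)
n10 = floor(Int, p10 * N)
println("fitted α = ", α)
println("predicted count at or above \$10M = ", n10)
```

Because the survival function is decreasing in wealth, the predicted count necessarily lies between the predicted numbers of billionaires and millionaires, so it inherits the errors seen at both ends.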

There are two problems. First, as far as I know, the Pareto distribution is usually fitted to the rank of an individual rather than to their wealth. This is what Klass et al. (2006) do, and this is what Wikipedia mentions:

The Pareto Distribution has often been used to mathematically quantify the distribution of wealth at the right tail (the wealth of very rich). In fact, the tail of wealth distributions, similar to that of income distribution, behaves like a Pareto distribution but with a thicker tail.

As the example above shows, a "thicker tail" can make a huge difference for the distribution of extreme values. Wealth is only approximately Pareto-distributed, so the predicted counts will be off.
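To get a feel for how sensitive the extreme tail is to the shape, here is a small sketch. It assumes a pure power-law tail anchored at the observed number of millionaires, which is not the fit used above, and the shape values 1.3, 1.5, and 1.7 are hypothetical, chosen only for illustration. The helper implied_billionaires is likewise made up for this example.

```julia
n1 = 18_600_000                      # people at or above $1M (figure used above)
ratio = 1_000_000_000 / 1_000_000    # the $1B threshold relative to the $1M threshold

# Hypothetical helper: count at or above $1B implied by a power-law tail with shape α.
implied_billionaires(α) = n1 * ratio^(-α)

# Hypothetical shape values, for illustration only.
for α in (1.3, 1.5, 1.7)
    println("α = ", α, ": about ", round(Int, implied_billionaires(α)), " billionaires")
end
```

Under these assumptions, moving the shape from 1.3 to 1.7 changes the implied number of billionaires by more than a factor of ten.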

Second, as a statistician might point out, you have only two data points here. That is an extremely small sample for estimating the parameters of a distribution reliably, so the validity of the estimate is questionable.
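To see why two points are so little to go on, consider a variant that is not the fit used above: assume that only the tail above \$1,000,000 follows a power law, $n(x) = n_1 (x / x_1)^{-\alpha}$. Two thresholds then determine $\alpha$ exactly, $\alpha = \ln(n_1 / n_2) / \ln(x_2 / x_1)$, so there is no residual left to judge the fit by. A minimal sketch with the same figures follows; the names α_tail and tail_count are hypothetical.

```julia
# Thresholds in USD and counts at or above them (figures used above).
x1, x2 = 1.0e6, 1.0e9
n1, n2 = 18_600_000, 607

# Assumption: only the tail above x1 is a power law, n(x) = n1 * (x / x1)^(-α).
# Two points fix α exactly, so the curve passes through both by construction.
α_tail = log(n1 / n2) / log(x2 / x1)

# Hypothetical helper: implied count of people holding at least x, for x >= x1.
tail_count(x) = n1 * (x / x1)^(-α_tail)

println("tail α ≈ ", round(α_tail, digits = 3))
println("count at or above \$10M ≈ ", round(Int, tail_count(1.0e7)))
```

With these figures the shape comes out near 1.5 and the count at \$10,000,000 on the order of 600,000, but two data points cannot tell you whether the power-law assumption holds between or beyond them.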

Klass, O. S., Biham, O., Levy, M., Malcai, O., & Solomon, S. (2006). The Forbes 400 and the Pareto wealth distribution. Economics Letters, 90(2), 290–295. doi:10.1016/j.econlet.2005.08.020
