1

I have a large list of $n$ distinct numbers drawn from a discrete uniform distribution of integers in $[0, M[$, with $n \ll M$. $M$ is known, but $n$ isn't. It's too expensive to read all $n$ numbers, but I can read the $k$ smallest numbers in the list for some $k \ll n$. I want to estimate $n$ from this sample.

Let's call those $k$ smallest numbers $x_1, ..., x_k$ with $x_1$ being the smallest and $x_k$ the largest. This implies that exactly $k$ numbers out of $n$ are less than or equal to $x_k$.

My first approach was to estimate $n \approx k \frac{M}{x_k}$. This (kind of) works, but in simulations with $n = 1000000$, the margin of error was larger than I'd like.

I then tried to think of the sample as a binomial distribution, with $p = \frac{x_k}{M}$ and an unknown $n$. My question is, how can I estimate $n$ knowing $k$ and $p$? Or, is there a better way to look at this problem?

jfhr
  • 111

0 Answers0