
I am trying to study a problem in algebraic number theory through a set of computational experiments. I have an enormous (say, of size $X$) family $\mathcal{F}$ of polynomials and I'm trying to understand how many distinct values a certain function $g$ takes over this family.

To experiment, I randomly sample $N$ (say $10^5$ at a time) polynomials from $\mathcal{F}$ and compute $g(f)$ for each polynomial $f$ in my sample. I keep a list of these outputs $g(f)$ and count how many times each occurs. I can repeat this sampling and tallying process several times.
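A minimal sketch of this sampling-and-tallying step, where `family` and `g` are hypothetical stand-ins for $\mathcal{F}$ and $g$:

```python
import random
from collections import Counter

def tally_sample(family, g, N=10**5, rng=random):
    """Draw N polynomials uniformly (with replacement) from `family`,
    apply `g` to each, and count how often each output value occurs."""
    counts = Counter(g(rng.choice(family)) for _ in range(N))
    distinct = len(counts)
    repeats = N - distinct  # draws that hit an already-seen value of g
    return counts, distinct, repeats
```

Here `repeats` is the statistic of interest below: the number of draws whose $g$-value had already appeared earlier in the sample.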

If I assume that the values $g(f)$ are uniformly distributed over the (finite) image, I should be able to estimate the size of that image, $\# g(\mathcal{F})$, from how many repetitions I see.

Is there a simple way to do this?


The inverse question is standard. That is, if I know that $\# g(\mathcal{F}) = Y$ and I take $N$ random draws with replacement from a set of size $Y$, then the expected number of repetitions can be obtained from a binomial distribution, as in this other question.
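For the forward direction: each of the $Y$ values is missed by all $N$ draws with probability $(1 - 1/Y)^N$, so the expected number of distinct values seen is $Y\bigl(1 - (1 - 1/Y)^N\bigr)$, and the expected number of repetitions is $N$ minus that. A sketch:

```python
def expected_distinct(Y: int, N: int) -> float:
    """Expected number of distinct values among N uniform draws (with
    replacement) from a set of size Y: each value is missed by all N
    draws with probability (1 - 1/Y)**N."""
    return Y * (1.0 - (1.0 - 1.0 / Y) ** N)

def expected_repeats(Y: int, N: int) -> float:
    """Expected number of draws that collide with an earlier draw."""
    return N - expected_distinct(Y, N)
```

For example, with $N = 10^5$ draws from a set of size $Y = 10^6$, roughly 4,800 draws are expected to collide with an earlier one.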

One approach would be to iteratively guess various values of $Y$ and see which gives expectations close to what I observe. But I feel like there must be a better way (and I have a lot to learn about fundamental statistical questions).
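The guessing can at least be made systematic: since the expected distinct count $Y(1 - (1 - 1/Y)^N)$ is increasing in $Y$, one can invert it numerically, e.g. by bisection. This gives a method-of-moments estimate (not a maximum-likelihood estimate), sketched here:

```python
def estimate_image_size(distinct: int, N: int) -> float:
    """Method-of-moments estimate of Y: find the Y whose expected
    distinct-value count in N uniform draws matches the observed count.
    E[D] = Y*(1 - (1 - 1/Y)**N) is increasing in Y, so bisect on it."""
    if distinct >= N:  # no repeats observed: Y is effectively unbounded
        return float("inf")
    expected = lambda Y: Y * (1.0 - (1.0 - 1.0 / Y) ** N)
    lo, hi = float(distinct), float(distinct) * N  # brackets the root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected(mid) < distinct:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check: with $Y = 1000$ and $N = 500$, the expected distinct count is about $394$, and feeding $394$ back into the estimator recovers a value near $1000$.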

  • Seems very similar to this question: https://stats.stackexchange.com/questions/587809/how-to-guess-the-size-of-a-set/587966#587966 – jblood94 Feb 10 '24 at 00:49
  • This is basically mark and recapture methodology. https://en.wikipedia.org/wiki/Mark_and_recapture See also the hypergeometric distribution. – Glen_b Feb 10 '24 at 02:03
  • @Glen_b Yes, it turns out this is mark and recapture! Thank you for teaching me a new phrase. – davidlowryduda Feb 10 '24 at 05:26
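Following the mark-and-recapture framing from the comments: treat the distinct $g$-values seen in one sample as "marked" and use their overlap with a second independent sample. A sketch of Chapman's bias-corrected Lincoln–Petersen estimator (the set arguments are hypothetical stand-ins for the distinct values seen in each sample):

```python
def chapman_estimate(sample1: set, sample2: set) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimator of the total
    population size, from the distinct values seen in two independent
    samples: (n1+1)(n2+1)/(m+1) - 1, where m is the overlap."""
    n1, n2 = len(sample1), len(sample2)
    m = len(sample1 & sample2)  # "recaptures"
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1
```

For instance, two samples of 50 distinct values with an overlap of 25 yield an estimate of roughly 99, close to a true size of 100.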

0 Answers