I have a population of entities associated with different categories, say

blue    50000
red     300
green   80
yellow  10
pink    6
orange  3
white   2

The distribution is known up to the exact counts. The population is very homogeneous, i.e. dominated by a single category: if $\{p_1,\dots,p_k\}$ denotes the probabilities for each category, then $\max_i p_i \geq 0.95$.

Now I would like to choose a sample size such that the expected number of categories present in the sample equals a target $n_k$. Here $N_k$ denotes the number of categories whose count in the sample is larger than zero, so the requirement is $E[N_k] = n_k$. Given a count vector as shown above and $n_k$, how do I choose the sample size?
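
To make the quantity concrete, here is a minimal sketch of how $E[N_k]$ could be evaluated for a given sample size and then inverted numerically. It assumes the sample is drawn without replacement from the finite population (the question does not say which scheme applies), so that a category with count $c_i$ out of $M$ entities is absent from a sample of size $n$ with probability $\binom{M-c_i}{n}/\binom{M}{n}$. The function names `expected_categories` and `smallest_sample_size` are hypothetical, chosen only for this illustration.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), for 0 <= k <= n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def expected_categories(counts, n):
    """Expected number of distinct categories in a sample of size n.

    Assumption: sampling without replacement from the finite population
    described by `counts` (hypergeometric absence probabilities).
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("sample size must be between 0 and the population size")
    log_denom = log_comb(total, n)
    expected = 0.0
    for c in counts:
        if total - c >= n:
            # P(category absent) = C(total - c, n) / C(total, n)
            p_absent = exp(log_comb(total - c, n) - log_denom)
        else:
            # More draws than entities outside the category: it must appear.
            p_absent = 0.0
        expected += 1.0 - p_absent
    return expected


def smallest_sample_size(counts, target):
    """Smallest n whose expected category count is at least `target`.

    Uses bisection, relying on the expectation being non-decreasing in n.
    """
    if not 0 <= target <= len(counts):
        raise ValueError("target must lie between 0 and the number of categories")
    lo, hi = 0, sum(counts)
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_categories(counts, mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


counts = [50000, 300, 80, 10, 6, 3, 2]  # counts from the table above
n = smallest_sample_size(counts, 3)
print(n, expected_categories(counts, n))
```

Because the expectation only takes a discrete set of values as $n$ varies, an exact match to $n_k$ will generally not exist; the sketch returns the smallest $n$ that reaches the target. If sampling were with replacement instead, the absence probability would be $(1-p_i)^n$ and the same bisection idea would apply.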

barbaz