My friend and I recently saw our old passions for Kinder Surprise toys reignited with a new animal toy line which resembled the old toys we were missing. To our dismay, however, this series did not include a "check-list" of toys to be collected. Hence arose the natural problem of estimating the number of distinct toys in the series, and when to stop buying. (We have an embarrassing amount of data.)
As neither of us has much experience with this type of problem, we do not know any standard approaches. That being said, here is what we cooked up: Suppose that there are $N$ distinct toys in the series. We can calculate the probability of observing $v_1$ toys exactly once ("singles"), $v_2$ toys exactly twice ("doubles"), etc. Treating our actual counts as the observed data, we assume the most likely thing happened and take as our estimate the value of $N$ that maximizes this probability. We recognize that there are problems with this approach. Could anyone suggest alternatives?
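For concreteness, here is one way the probability might be written down. This is only a sketch under an assumption not stated above: every purchase is independent and each of the $N$ toys is equally likely. Write $d=\sum_k v_k$ for the number of distinct toys seen and $n=\sum_k k\,v_k$ for the total number of purchases (both symbols are introduced here for illustration). Then

$$P(v_1, v_2, \dots \mid N) \;=\; \frac{N!}{(N-d)!\,\prod_k v_k!}\cdot\frac{n!}{\prod_k (k!)^{v_k}}\cdot\frac{1}{N^{n}}.$$

The first factor counts the ways to decide which toys are unseen, singles, doubles, and so on; the second counts the purchase sequences that produce those multiplicities; the third is the probability of any one sequence. Under this assumption, only the factor $\frac{N!}{(N-d)!\,N^{n}}$ depends on $N$, so the maximizer depends on the data only through $d$ and $n$.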
The data is sourced only from purchases in the supermarket. For fun, here is some data, presented as a quintuple $(v_1,\dots,v_5)$ giving the number of singles, doubles, etc.: $(5,2,3,2,0)$ was a week ago; $(5,2,3,1,1)$ and $(5,2,2,2,1)$ are more recent.
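The maximization described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method: it assumes every toy is equally likely on each purchase, in which case the part of the probability that depends on $N$ is $N!/((N-d)!\,N^{n})$, with $d$ the number of distinct toys seen and $n$ the total number bought. The function names `log_likelihood` and `mle_series_size` and the search cap `max_n` are hypothetical choices made for this example.

```python
from math import lgamma, log


def log_likelihood(N, counts):
    """Log-probability of the data, up to a term that does not depend on N.

    counts[k-1] is the number of toys seen exactly k times.
    Assumes all N toys are equally likely on every purchase.
    """
    d = sum(counts)                                         # distinct toys seen
    n = sum(k * v for k, v in enumerate(counts, start=1))   # total purchases
    if N < d:
        return float("-inf")  # cannot see more distinct toys than exist
    # log( N! / ((N - d)! * N**n) )
    return lgamma(N + 1) - lgamma(N - d + 1) - n * log(N)


def mle_series_size(counts, max_n=1000):
    """Integer N in [d, max_n] maximizing the likelihood (max_n is an arbitrary cap)."""
    d = max(sum(counts), 1)
    return max(range(d, max_n + 1), key=lambda N: log_likelihood(N, counts))


if __name__ == "__main__":
    for data in [(5, 2, 3, 2, 0), (5, 2, 3, 1, 1), (5, 2, 2, 2, 1)]:
        print(data, "->", mle_series_size(data))
```

Note that if every toy seen were a single ($d = n$), this likelihood would keep increasing with $N$, so the search would simply return the cap `max_n`; the estimate is only meaningful once some repeats have been observed.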
This question is similar, except here we assume that the value labelled $S$ there is infinite.