6

How should I estimate the number of unique elements in a randomly shuffled list? I'm in a situation when the memory I have is much smaller than memory needed to store all unique elements. Something efficient with confidence interval and/or a reference for such procedure would be great

Yaroslav Bulatov
  • 6,199
  • 2
  • 28
  • 42

1 Answers1

3

Fellow CVer @rrenaud cited this paper as a key reference for number of unique value estimation. He suggested also to check out the Good Turing frequency estimator, which was developed to estimate the proportion of elements that occur n times, including the case where n = 1 (i.e., unique values).

Here is a link to @rrenaud's answer to the similar question.

Have fun!