Here is a whole paper about the problem, with a summary of various approaches. In the literature, the problem is called Distinct Value Estimation.
If I had to do this myself, without having read fancy papers, here's what I'd do. In building language models, one often has to estimate the probability of observing a previously unseen word, given a bunch of text. A pretty good approach to this problem, for language models in particular, is to take the number of words that occurred exactly once and divide it by the total number of tokens. This is called the Good-Turing estimate.
Let u1 be the number of values that occurred exactly once in a sample of m items.
P[new item next] ~= u1 / m.
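As a quick Python sketch of that estimate (my own illustration, not from any particular library; the name `p_new_item` is made up):

```python
from collections import Counter

def p_new_item(sample):
    """Good-Turing estimate of the probability that the next item
    observed is one that never appeared in the sample."""
    counts = Counter(sample)
    u1 = sum(1 for c in counts.values() if c == 1)  # items seen exactly once
    m = len(sample)                                 # total items in the sample
    return u1 / m
```

For example, `p_new_item("the cat sat on the mat".split())` returns 4/6, since four of the six tokens occurred exactly once.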
Let u be the number of unique items in your sample of size m.
If you assume (not entirely correctly) that the 'new item next' rate doesn't decrease as you collect more data, then applying Good-Turing you'll have
us ~= u + (u1 / m) * (s - m)

where:

s: size of the entire set
m: sample size
us: estimated number of unique elements in the entire set
u: number of unique elements in the sample
u1: number of values that occurred exactly once in the sample
This has some nasty behavior as u1 becomes really small (in the extreme, u1 = 0 means the estimate never grows past u, no matter how large s is), but that might not be a problem for you in practice.
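Here's the whole thing as a rough Python sketch. `estimate_distinct` is just a name I made up, and this is the naive linear extrapolation above, not a production-grade estimator:

```python
from collections import Counter

def estimate_distinct(sample, s):
    """Estimate the number of distinct values in a set of size s,
    given a sample drawn from it, via the linear Good-Turing
    extrapolation: us ~= u + (u1 / m) * (s - m)."""
    counts = Counter(sample)
    m = len(sample)                                 # sample size
    u = len(counts)                                 # distinct values in the sample
    u1 = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return u + (u1 / m) * (s - m)
```

So if you sampled 1,000 rows out of 1,000,000 and saw 300 distinct values, 50 of them exactly once, the estimate would be 300 + (50/1000) * 999,000 = 50,250.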