Element distinctness in O(n) time?

Question

We all know that element distinctness in the comparison based model cannot be done in $o(n\log n)$ time. However, on a word-RAM, one can possibly achieve better.

Of course, if one assumes the existence of a perfect hash function that can be computed in linear time, we get a linear time algorithm for element distinctness: just keep hashing the numbers one by one and return 1 if there is a collision.

However, there are two issues: 1) most constructions of perfect hash functions that I could find used randomness and 2) I cannot find a discussion of the pre-processing time anywhere, i.e., the time required to decide what hash function one is going to use based on the input set of numbers.

Fredman et al.'s "Storing a sparse table with $O(1)$ worst case access time" does resolve the first issue by providing a hash function with $O(1)$ access time in the worst case, but says nothing about the second issue.

So to sum up, here's what I want:

Design an algorithm that given a set $S$ of $n$ numbers (each number being $w$ bits long) on a word-RAM with word length $w$, finds a hash function $h:S\rightarrow \{1, \ldots , m\}$ in $O(n)$ time, where $m = O(n)$. The function $h$ should have the property that for any $j \in \{1, \ldots , m\}$, the number of elements of $S$ that map to $j$ is a constant and computing $h(i)$ should take $O(1)$ time in a "reasonable" word-RAM model, i.e., the model should not allow "exotic" functions on words to be evaluated in $O(1)$ time.

I will also like to know if there are algorithms to solve element distinctness on the word-RAM that do not use hash functions at all.

Re: "I will also like to know if there are algorithms to solve element distinctness on the word-RAM that do not use hash functions at all." — as long as you only want $o(n\log n)$ and not linear, there is lots of work on sorting on the word RAM (see http://en.wikipedia.org/wiki/Integer_sorting). Some of these algorithms use hashing but others do not. — David Eppstein, Nov 15 '12 at 07:59
(I think that) Your thinking process is skipping one step:

You postulate that the best complexity in the comparison model is $\Theta(n\log n)$

You ask how this can be improved in the RAM model

You directly ask for a solution in $O(n)$ time in the RAM model.

Rather, you should be studying the solutions in $o(n\log n)$ in the RAM model and see if you can improve them? — J..y B..y, Mar 24 '13 at 17:58

score 8 · Answer 1 · answered Mar 24 '13 at 17:58

Element distinctness can be solved deterministically in the RAM model in time within $O(n\log\log n)\subset o(n\log n)$ time:

Sort in time within $O(n\log\log n)$ your $n$ numbers of $w$ bits using the sorting algorithm described by Han in STOC 2002 ("Deterministic sorting in $O(n\log\log n)$ time and linear space"), then scan in linear time for collisions.

As far as I know, that is the best result known to this day.

score 1 · Answer 2 · answered Mar 25 '13 at 03:05

1

That is exactly what the FKS paper you mention do - and it takes linear time (in expectation). See page 5 here for the analysis... http://www1.icsi.berkeley.edu/~luby/PAPERS/pairwise.ps

answered Mar 25 '13 at 03:05

Sariel Har-Peled

9,626
31
53

Element distinctness in O(n) time?

2 Answers2