21

We all know that element distinctness in the comparison based model cannot be done in $o(n\log n)$ time. However, on a word-RAM, one can possibly achieve better.

Of course, if one assumes the existence of a perfect hash function that can be computed in linear time, we get a linear time algorithm for element distinctness: just keep hashing the numbers one by one and return 1 if there is a collision.

However, there are two issues: 1) most constructions of perfect hash functions that I could find used randomness and 2) I cannot find a discussion of the pre-processing time anywhere, i.e., the time required to decide what hash function one is going to use based on the input set of numbers.

Fredman et al.'s "Storing a sparse table with $O(1)$ worst case access time" does resolve the first issue by providing a hash function with $O(1)$ access time in the worst case, but says nothing about the second issue.

So to sum up, here's what I want:

Design an algorithm that given a set $S$ of $n$ numbers (each number being $w$ bits long) on a word-RAM with word length $w$, finds a hash function $h:S\rightarrow \{1, \ldots , m\}$ in $O(n)$ time, where $m = O(n)$. The function $h$ should have the property that for any $j \in \{1, \ldots , m\}$, the number of elements of $S$ that map to $j$ is a constant and computing $h(i)$ should take $O(1)$ time in a "reasonable" word-RAM model, i.e., the model should not allow "exotic" functions on words to be evaluated in $O(1)$ time.

I will also like to know if there are algorithms to solve element distinctness on the word-RAM that do not use hash functions at all.

J..y B..y
  • 2,776
  • 1
  • 22
  • 35
Vinayak Pathak
  • 1,631
  • 10
  • 22
  • 8
    Re: "I will also like to know if there are algorithms to solve element distinctness on the word-RAM that do not use hash functions at all." — as long as you only want $o(n\log n)$ and not linear, there is lots of work on sorting on the word RAM (see http://en.wikipedia.org/wiki/Integer_sorting). Some of these algorithms use hashing but others do not. – David Eppstein Nov 15 '12 at 07:59
  • Are approximate solutions allowed? – A T Jan 27 '13 at 14:31
  • (I think that) Your thinking process is skipping one step:
    1. You postulate that the best complexity in the comparison model is $\Theta(n\log n)$
    2. You ask how this can be improved in the RAM model
    3. You directly ask for a solution in $O(n)$ time in the RAM model.

    Rather, you should be studying the solutions in $o(n\log n)$ in the RAM model and see if you can improve them?

    – J..y B..y Mar 24 '13 at 17:58
  • Is Radix sort too slow for you? – Thomas Mueller Dec 23 '15 at 14:43

2 Answers2

8

Element distinctness can be solved deterministically in the RAM model in time within $O(n\log\log n)\subset o(n\log n)$ time:

Sort in time within $O(n\log\log n)$ your $n$ numbers of $w$ bits using the sorting algorithm described by Han in STOC 2002 ("Deterministic sorting in $O(n\log\log n)$ time and linear space"), then scan in linear time for collisions.

As far as I know, that is the best result known to this day.

J..y B..y
  • 2,776
  • 1
  • 22
  • 35
1

That is exactly what the FKS paper you mention do - and it takes linear time (in expectation). See page 5 here for the analysis... http://www1.icsi.berkeley.edu/~luby/PAPERS/pairwise.ps

Sariel Har-Peled
  • 9,626
  • 31
  • 53