2

Let $f$ be a collision resistant function i.e. it is computationally impossible to find $x_0, x_1$ such that $f(x_0) = f(x_1)$. If a computationally bounded adversary demonstrates that he knows some $x_0$ and $f(x_0)$, in what sense is he unable to determine $x_1$?

My instinct is that he can do no better than choosing a random guess from the domain of $f$ but is this how collision resistance is defined? Moreover, if $f$ has a trapdoor, is the definition still the same i.e. collision resistance implies that either the adversary knows the trapdoor or he can do no better than pick a random element from the domain of $f$?

I apologize that this is a bit of a yes/no question although if the answer is no, then there is more to say!

EDIT: Alternatively, what is the mathematical statement that captures the idea that there is no efficient algorithm to find a collision?

user1936752
  • 737
  • 1
  • 4
  • 16

2 Answers2

2

Let the range $Y$ of the function $f$ have size $N$ A collision resistant function cannot be better than an ideal function which maps uniformly into $Y$ since collision probability is minimized by the uniform distribution. Thus, given $(x_0,f(x_0))$ the number $k$ of random points $x_i\neq x_0$ that must be tried before the probability that $f(x_i)=f(x_0)$ is significant, say at least $1/2$ is $k\approx \sqrt{\ln 2 N}$ by the birthday paradox.

If necessary see @fgrieu's detailed post on the mathematics of the birthday paradox here.

kodlu
  • 22,423
  • 2
  • 27
  • 57
  • Thank you. Could you also comment on the following: When one says there is no efficient algorithm to find a collision, what is the mathematical definition this corresponds to? – user1936752 Oct 11 '19 at 10:06
  • 1
    Let the output space have bitlength $b,$ hence $b=\log_2 N.$ The algorithm in the answer needs approximately $\sqrt{N}=2^{b/2}$ queries before finding a collision with probability $1/2.$ This quantity is exponential in the input size (bitlength) $b,$ hence inefficient. If it could be found with $b^c$ trials, for constant $c$, it would have polynomial complexity in the bitlength, hence efficient. – kodlu Oct 11 '19 at 12:18
  • Ah I see - so no matter what algorithmthe attacker uses, the complexity is the same as a random guess attack? Thank you! – user1936752 Oct 11 '19 at 12:29
  • Please keep in mind that if the adversary has full control about the definition of f and the choose of $x_0$ and $x_i$ this requirement is very hard to fulfill since the adversary can define any two $x_0$ and $x_1$ such that $h(x_0) = h(x_1)$. There has to be a mechanism which makes it hard for the adversary to define the functions f like that. – Martin Kromm Oct 11 '19 at 13:48
  • @MartinKromm, I can choose $f$ instead of the adversary and then this is fine, no? The point is then that $x_1$ is indistinguishable from a random string (in the domain of $f$) for the adversary. – user1936752 Oct 11 '19 at 14:43
  • The problem is that it is hard to formalize the problem that an adversary doesn't 'know' two $x_0$ and $x_1$ such that $h(x_0) = h(x_1)$ unless there is some secret or randomness involved, and a single hash function is (usually) deterministic. – Martin Kromm Oct 12 '19 at 13:10
1

Defining a hash function formally is not an easy task. There are even entire papers covering that formalization, especially if there is a requirement that the function should be unkeyed (see for the keyed variant below).

The most popular variant to formalize hash functions are the use of choosing a hash function chosen from a family of hash functions HashFamily(KeyGen, Hash):

  • Given security parameter $1^n$, algorithm KeyGen outputs a key k.
  • Given key k, a family of hashes $H := \{h_0, ..., h_{2^n - 1}\}$ and a message m, algorithm Hash outputs $h_k(m)$ with $h_k \in H$.

Now consider the following game Col:

  • The adversary A is given H and a random key k.
  • The adversary A outputs $m_1$ and $m_2$.
  • The adversary A wins if $h_k(m_1) = h_k(m_2)$ and $m_1 \neq m_2$.

Collision resistant means that $\forall PPT A: Pr[Col(A) = 1] \leq negl(n)$.

The above game implies that the adversary A can compute $Hash(k, H, \cdot)$ by himself. However, since the adversary is computationally bounded, he can only query a limited number of messages (if the adversary is unbounded, then the adversary always wins the game unless $h_\cdot$ is injective of course).

PPT A means probabilistic polynomial time algorithm. That means that given some randomness r, algorithm A computes x using any strategy in $O(n^z)$ (with z being constant and independent of the input size).

I would also recommend you reading Introduction to Modern Cryptography by Katz & Lindell. There is the topic explained in more detail.

Martin Kromm
  • 407
  • 2
  • 8