Collision resistant functions - definition

Question

Let $f$ be a collision resistant function i.e. it is computationally impossible to find $x_0, x_1$ such that $f(x_0) = f(x_1)$. If a computationally bounded adversary demonstrates that he knows some $x_0$ and $f(x_0)$, in what sense is he unable to determine $x_1$?

My instinct is that he can do no better than choosing a random guess from the domain of $f$ but is this how collision resistance is defined? Moreover, if $f$ has a trapdoor, is the definition still the same i.e. collision resistance implies that either the adversary knows the trapdoor or he can do no better than pick a random element from the domain of $f$?

I apologize that this is a bit of a yes/no question although if the answer is no, then there is more to say!

EDIT: Alternatively, what is the mathematical statement that captures the idea that there is no efficient algorithm to find a collision?

score 2 · Accepted Answer · answered Oct 10 '19 at 21:21

2

Let the range $Y$ of the function $f$ have size $N$ A collision resistant function cannot be better than an ideal function which maps uniformly into $Y$ since collision probability is minimized by the uniform distribution. Thus, given $(x_0,f(x_0))$ the number $k$ of random points $x_i\neq x_0$ that must be tried before the probability that $f(x_i)=f(x_0)$ is significant, say at least $1/2$ is $k\approx \sqrt{\ln 2 N}$ by the birthday paradox.

If necessary see @fgrieu's detailed post on the mathematics of the birthday paradox here.

answered Oct 10 '19 at 21:21

kodlu

22,423
2
27
57

Thank you. Could you also comment on the following: When one says there is no efficient algorithm to find a collision, what is the mathematical definition this corresponds to? – user1936752 Oct 11 '19 at 10:06
1

Let the output space have bitlength $b,$ hence $b=\log_2 N.$ The algorithm in the answer needs approximately $\sqrt{N}=2^{b/2}$ queries before finding a collision with probability $1/2.$ This quantity is exponential in the input size (bitlength) $b,$ hence inefficient. If it could be found with $b^c$ trials, for constant $c$, it would have polynomial complexity in the bitlength, hence efficient. – kodlu Oct 11 '19 at 12:18
Ah I see - so no matter what algorithmthe attacker uses, the complexity is the same as a random guess attack? Thank you! – user1936752 Oct 11 '19 at 12:29
Please keep in mind that if the adversary has full control about the definition of f and the choose of $x_0$ and $x_i$ this requirement is very hard to fulfill since the adversary can define any two $x_0$ and $x_1$ such that $h(x_0) = h(x_1)$. There has to be a mechanism which makes it hard for the adversary to define the functions f like that. – Martin Kromm Oct 11 '19 at 13:48
@MartinKromm, I can choose $f$ instead of the adversary and then this is fine, no? The point is then that $x_1$ is indistinguishable from a random string (in the domain of $f$) for the adversary. – user1936752 Oct 11 '19 at 14:43
The problem is that it is hard to formalize the problem that an adversary doesn't 'know' two $x_0$ and $x_1$ such that $h(x_0) = h(x_1)$ unless there is some secret or randomness involved, and a single hash function is (usually) deterministic. – Martin Kromm Oct 12 '19 at 13:10

score 1 · Answer 2 · answered Oct 11 '19 at 12:58

Defining a hash function formally is not an easy task. There are even entire papers covering that formalization, especially if there is a requirement that the function should be unkeyed (see for the keyed variant below).

The most popular variant to formalize hash functions are the use of choosing a hash function chosen from a family of hash functions HashFamily(KeyGen, Hash):

Given security parameter $1^n$, algorithm KeyGen outputs a key k.
Given key k, a family of hashes $H := \{h_0, ..., h_{2^n - 1}\}$ and a message m, algorithm Hash outputs $h_k(m)$ with $h_k \in H$.

Now consider the following game Col:

The adversary A is given H and a random key k.
The adversary A outputs $m_1$ and $m_2$.
The adversary A wins if $h_k(m_1) = h_k(m_2)$ and $m_1 \neq m_2$.

Collision resistant means that $\forall PPT A: Pr[Col(A) = 1] \leq negl(n)$.

The above game implies that the adversary A can compute $Hash(k, H, \cdot)$ by himself. However, since the adversary is computationally bounded, he can only query a limited number of messages (if the adversary is unbounded, then the adversary always wins the game unless $h_\cdot$ is injective of course).

PPT A means probabilistic polynomial time algorithm. That means that given some randomness r, algorithm A computes x using any strategy in $O(n^z)$ (with z being constant and independent of the input size).

I would also recommend you reading Introduction to Modern Cryptography by Katz & Lindell. There is the topic explained in more detail.

Collision resistant functions - definition

2 Answers2