31

Wikipedia defines a second preimage attack as:

given a fixed message m1, find a different message m2 such that hash(m2) = hash(m1).

Wikipedia defines a collision attack as:

find two arbitrary different messages m1 and m2 such that hash(m1) = hash(m2).

The only difference that I can see is that in a second preimage attack, m1 already exists and is known to the attacker. However, that doesn't strike me as being significant - the end goal is still to find two messages that produce the same hash.

What are the essential differences in how a second preimage attack and collision attack are carried out? What are the differences in results?

(As an aside, I can't tag this question properly. I'm trying to apply the tags "cryptography security pre-image collision" but I don't have enough reputation. Can someone apply the appropriate tags?)

Jukka Suomela
  • 11,500
  • 2
  • 53
  • 116
Thomas Owens
  • 413
  • 1
  • 4
  • 9
  • 1
    Your impression is correct -- that is the difference. The part where you're incorrect is that this is a HUGE difference in practice. It's one thing to be able to find any two things which have a collision, and quite another to be able to find a collision for a specific plaintext. For example, if you want to spoof a particular message, it is useless to be able to commit existential forgery -- you need another message that hashes to the same thing as the message you've intercepted. – Adrian Petrescu Aug 25 '10 at 20:36
  • @Adrian Petrescu: Could you make that an answer and perhaps elaborate on it some more? Add things such as when each (you provide a situation for preimage attack, but not for collision attack) is best suited? – Thomas Owens Aug 25 '10 at 20:38

3 Answers3

33

I can motivate the difference for you with attack scenarios.

In a first preimage attack, we ask an adversary, given only $H(m)$, to find $m$ or some $m'$ such that $H(m')$ = $H(m)$. Suppose a website stores $\{username, H(password)\}$ in its databases instead of $\{username, password\}$. The website can still verify the authenticity of the user by accepting their password and comparing $H(input) =? H(password)$ (with probability of $1/2^n$ for some large $n$ for false positives). Now suppose this database is leaked or is otherwise comprimised. A first preimage attack is the situation where an adversary only has access to a message digest and is trying to generate a message that hashes to this value.

In a second preimage attack, we allow the adversary more information. Specifically, not only do we give him $H(m)$ but also give him $m$. Consider the hash function $H(m) = m^d \mod{pq}$ where $p$ and $q$ are large primes and $d$ is a public constant. Obviously for a first preimage attack this becomes the RSA problem and is believed to be hard. However, in the case of the second preimage attack finding a collision becomes easy. If one sets $m' = mpq + m$, $H(mpq + m) = (mpq + m)^d \mod{pq} = m^d \mod{pq}$. And so the adversary has found a collision with little to no computation.

We would like one way hash functions to be resistant to second preimage attacks because of digital signature schemes, in which case $H(document)$ is considered public information and is passed along (through a level of indirection) with every copy of the document. Here an attacker has access to both $document$ and $H(document)$. If the attacker can come up with a variation on the original document (or an entirely new message) $d'$ such that $H(d') = H(document)$ he could publish his document as though he were the original signer.

A collision attack allows the adversary even more opportunity. In this scheme, we ask the adversary (can I call him Bob?) to find any two messages $m_1$ and $m_2$ such that $H(m_1) = H(m_2)$. Due to the pigeonhole principle and the birthday paradox, even 'perfect' hash functions are quadratically weaker to collision attacks than preimage attacks. In other words, given an unpredictable and irreversible message digest function $f(\{0,1\}^*) = \{0,1\}^n$ which takes $O(2^n)$ time to brute force, a collision can always be found in expected time $O(sqrt(2^n)) = O(2^{n/2})$.

Bob can use a collision attack to his advantage in many ways. Here is one of the simpliest: Bob finds a collision between two binaries $b$ and $b'$ ($H(b) = H(b')$) such that b is a valid Microsoft Windows security patch and $b'$ is malware. (Bob works for Windows). Bob sends his security patch up the chain of command, where behind a vault they sign the code and ship the binary to Windows users around the world to fix a flaw. Bob can now contact and infect all Windows computers around the world with $b'$ and the signature that Microsoft computed for $b$. Beyond these sorts of attack scenarios, if a hash function is believed to be collision resistant, that hash function is also more likely to be preimage resistant.

Ross Snider
  • 2,035
  • 2
  • 20
  • 33
  • 1
    That's beautifully explained. A lot more math than I was looking for, but I very much appreciate the effort - I followed you right through each one. Thanks. – Thomas Owens Aug 25 '10 at 22:41
  • And wow. A fellow RIT student. – Thomas Owens Aug 25 '10 at 22:42
  • 1
    How's it going Thomas? I think you had Physics with my friend Alan Meekins. Good to see RIT people here! Also, thank you for accepting the answer. – Ross Snider Aug 25 '10 at 22:48
  • Pretty good. If you are going to be around campus in the Fall and are interested in security, perhaps we can catch up in person. I've been doing some applied security work (applying stenography, steganalysis, public key encryption, digital signatures) this summer and would love to hear about the theoretical side (as much as I'm interested in it - I don't have the time or mathematical background to get through a lot of the papers on the subject). – Thomas Owens Aug 25 '10 at 22:56
  • rws1236@cs.rit.edu – Ross Snider Aug 25 '10 at 23:02
  • @RossSnider I don't understand. In a second pre-image attack, if the attacker already has m, as you state, why can't he stop there? What more does he need? He has the message! Isn't that the goal of an attack? – CodyBugstein Oct 26 '15 at 18:22
  • @Imray The reason is for document signing basically. Let me give another example a little more concrete then Ross's. In BitTorrent the payload is broken up into "pieces" each of which is hashed with the hashes stored in the torrent file. Now you get some data from a peer that claims to be one of these pieces. You hash it and compare it with the value you have, if they match you accept the data. All is good so long as the hash function is second preimage secure. If the hash function is not second preimage secure then an attacker could come up with incorrect data corrupting your download file. – CrazyCasta Feb 21 '16 at 20:54
2

Collision attacks may be much easier, but if successful, much less useful.

Jukka Suomela
  • 11,500
  • 2
  • 53
  • 116
1

The problem that Ross mentions as being the discrete log problem is in reality an altogether different problem, the RSA problem, which is much more related to computing roots than to discrete log.

  • 3
    This is certainly true! Oops. Originally I used the discrete log problem and later editted the details of the scheme. Good catch. Not sure that this consitutes a new answer - it was probably more appropriate to leave as a comment under my answer. – Ross Snider Aug 30 '10 at 23:21