42

As the Ethereum platform relies on the Keccak256 hash algorithm, I'd like to get a better understanding of it.

My rough understanding is something like this:

a function accepting a finite set of bits into a giant imaginary rubik's cube which is then shunted about in a specific way. A subset of 256 bits are then returned. The function has the property that a change to a single input bit causes the output to change in an unpredictable way.

Is the above approximately true? You might see where I got the rubik's cube idea from if you look at Figure 1 here (I think this is the right spec).

There's also this, which I've read through, but it has not really soaked in.

How does the Keccak256 hash function work?

Lee
  • 8,548
  • 6
  • 46
  • 80
  • 4
    This might be too broad. Found something that might help: http://www.slideshare.net/RajeevVerma14/keccakpptx (and it does have cube diagrams). – eth Jan 21 '17 at 22:45
  • I think keep it and we can let the community vote and provide feedback (there might also be ways to edit it that might improve the question but I don't know enough so haven't upvoted), or someone might write a really good answer that you're looking for. – eth Jan 21 '17 at 22:49
  • @eth I'll perhaps remove that last two questions... let's see – Lee Jan 21 '17 at 22:54
  • It's very important that the function receives an infinite set of bits and produces a finite length of bits – Daniel Luca CleanUnicorn Feb 10 '18 at 02:02
  • Perhaps try go through the pseudo-code description on their team site https://keccak.team/keccak_specs_summary.html

    there is plenty of resources and material, not very high level though.

    – Herald Feb 12 '18 at 10:38
  • @cleanunicorn should it not be an ordered set of bits of any size, in any permutation... there's probably a more succinct way of putting that. – Lee Mar 08 '18 at 10:20
  • @atomh33ls I don't know how to better express it :( – Daniel Luca CleanUnicorn Mar 09 '18 at 12:52
  • 7
    The sponge construction is the core. You divide the input into n blocks P0...Pn-1 (padding) and then starting with a block of zeros XOR the first formed block P0 and apply a permutation function f, the output of this function is passed to the next step that uses P1 and repeats until you have used all the blocks up to Pn-1. At this step, the data has been absorbed. You "squeeze" it by selecting a portion of the last state of the system and applying the function f to it until you obtain the desired number of bits in the output. This is very oversimplified but is the general idea. – Jaime Mar 29 '18 at 20:17

1 Answers1

25

Keccak is nice that it has arbitrary inputs and an infinite input space. This enables one to "make a hash" of a super large file where each input causes the internal state to scramble up some more. The hash should entirely change if a single bit of data in the source is different - unlike say a CRC32, or a checksum. It means your password could be a million chars long maybe. It's stored on disk as a hash, much smaller in size.

Regarding Keccak, it uses a "Sponge Construction" lord knows what that is read up on it here: https://keccak.team/keccak_specs_summary.html If I understand it's a permutation chosen from a set of seven Keccak permutations, denoted I assume by reference to their bit depths as b∈{25,50,100,200,400,800,1600}.

The state is organized as an array of 5×5 lanes, each of length w∈{1,2,4,8,16,32,64} and 25 cells deep. When implemented on a 64-bit processor, a lane of Keccak can be represented as a tidy 64-bit CPU word.

Finally, to even entertain the thought of similar input causing collisions, you have to imagine this data traversing from base 25, through base 50, up to 1600 and back. Smart money is on this being quite very resistant to collisions (it's design goal?).

Tomachi
  • 461
  • 6
  • 10
  • 5
    Minor comments: Not every hash has an infinite input space, many hash algorithms cannot take inputs longer than 2^64 bits. However I believe Keccak indeed allows inputs of arbitrary lengths. Second, a single input bit flip changing the entire output is also true (and important) for checksums. The difference between checksums and hashes is the reversibility. – jlh Jan 09 '22 at 14:09
  • 1
    @jlh cheers I have updated based on your comment. I think the other ones would break the 2^64 bits chunk up and keep going no? I guess we are talking filesize here interesting point. – Tomachi Mar 16 '22 at 02:38