1

Disclaimer. I have asked a similar question on Electronics StackExchange: they advised me to try here.

The question

I am in the process of developing a simple protocol whose security relies on the assumption that computing the SHA256 hash of a very long string of bytes x takes a long amount of time. Intuitively, even a massively parallel adversary can only compute SHA256(x) by successively feeding each chunk of x into SHA256, which takes a long time.

Now, in implementing my protocol in the real world, I need to put constants to this intuition. Which leads me to the question:

How quickly can we compute the SHA256 hash of a long string x?

More precisely, let us assume that the length of x is, e.g., 1 GB. Given our current / foreseeable level of technological development (this includes both algorithms and hardware), how quickly (in seconds) can an adversary realistically compute SHA256(x)?

To the best of my current understanding, ASICs represent probably the answer to my question: by designing application-specific circuitry, one can speed up even computations that are not easy to parallelise, e.g., on a CPU. However, I am not aware of how quick ASICs can get. This is why I originally asked this question on Electronics StackExchange.

More details

  • I am aware that this question has been previously asked on Crypto StackExchange here. However, that question is three years old, and technology advances constantly. More importantly, that question does not seem to address ASICs! The only answers there concern CPUs for SHA256, or ASICs for SHA3 (which is actually designed, as far as I understand, to be sped up with dedicated hardware).

  • The protocol I am working on is a super-simple design for Proofs of Space: one plots some large portion of its memory, others can quickly verify that the plot is correct (with high probability). My scheme relies on repeated hashing to plot, so what I am looking for is a hash function that satisfies the following:

    • As fast as possible: the faster the hash function, the faster and simpler the plotting.
    • As egalitarian as possible: the speed ratio between the fastest possible implementation of the hash function and standard, CPU-based, market-available implementations should be not much larger than $1$

    This is why I am inquiring about SHA256: it is reasonably fast, and I know that it is already accelerated in standard x86 CPUs, which (I suppose) can reduce the gap between CPU-based and ASIC-based implementations.

Matteo Monti
  • 1,407
  • 2
  • 14
  • 19
  • 1
    You need to use Argon2, the passwords hashing algorithms are designed to have memory-hard and iteration parameter to reduce the ASIC/FPGA/GPU massive parallelization and the iteration makes them slower like 100K times. – kelalaka Feb 09 '21 at 20:46
  • 1
    For reference: Modern x86 CPUs hit 2-7 CPU cycles per byte. Dedicated ASICs will probably be able to push that down a bit more and mostly can massively increase parallelism easily. – SEJPM Feb 09 '21 at 20:55
  • 14nm https://ieeexplore.ieee.org/document/9061457 – kelalaka Feb 09 '21 at 20:56
  • @SEJPM This is already a great pointer! By "massively increase parallelism" you mean that an ASIC can compute many many independent instances of SHA256 in parallel, right? Like a Bitcoin miner does? That I knew, what I am worried is how much an ASIC can speed up a sequential SHA256 computation! – Matteo Monti Feb 09 '21 at 20:57
  • @kelalaka I'm sorry, is the paper you linked about parallel SHA256 computation? I'm interested in how quickly we can compute the SHA256 hash of a single, long string! – Matteo Monti Feb 09 '21 at 21:00
  • Multi-computation of SHA-256 is working in parallel pipelines, indicating that the computation capacity can be 3 times of that with standard SHA-256 implementation. – kelalaka Feb 09 '21 at 21:02
  • 1
    Usually, when people do things like hash long inputs to force serial effort they do so to slow down parallelizable guessing / brute-force and hinder ASICs from eg guessing passwords quickly. If this is the problem you are trying to solve, using something like Argon2 will most likely be a better solution. If however your security relies on hitting a low latency for computations, as in a race, there is a chance that this is optimal, but depending on the use-case Argon2 may still be better. So what problem are you actually trying to solve for which you think long-input hashing is ideal? – SEJPM Feb 09 '21 at 21:34
  • A related question (which might be a dup): https://crypto.stackexchange.com/questions/50618/how-fast-can-a-sha-256-implementation-go – poncho Feb 09 '21 at 22:32
  • Thank you for the feedback! I'll add some more info to my question :) – Matteo Monti Feb 10 '21 at 09:02
  • 2
    Worth noting: computing the SHA-256 of 1 GB data as fast as possible (as asked) requires technology quite different from that of ASICs doing SHA-256 for bitcoin mining, for two reasons: (A) the task can't be parallelized, so we need a single compute unit with ultra-fast clock, rather than many units optimized for performance vs costs (of building and operating). (B) bringing the 1 GB of data to the compute unit is a serious performance bottleneck, and will often be the dominating one on a normal CPU with SHA-256 hardware (the data hashed by bitcoin mining is generated locally, solving that). – fgrieu Feb 10 '21 at 09:50
  • 2
    @fgrieu Even cheaper DDR4-SDRAM manages to hit ~12GB/s per memory channel which is way more than even the most optimistic single-threaded speed of 2.5GB/s on a modern x86 CPU for linear SHA256 hashing (per core), so on regular CPUs (assuming you batch + prefetch appropriately) this is not an issue. Dedicated ASICs can just request a steady stream from DDR4 modules. – SEJPM Feb 10 '21 at 10:14
  • 1
    @SEJPM Very right, with data already in RAM, and I should have done that math properly. On the other hand the question seems to be about a protocol, and while my ISP advertises 0.125 Gbyte/s, I call for a party when I get half that (which does happen when downloading thru P2P something particularly well seeded). – fgrieu Feb 10 '21 at 10:38
  • 1
    @SEJPM Dedicated ASICs might not even use cycles. – forest Feb 14 '21 at 03:30

0 Answers0