In "Distillation with sublogarithmic overhead", Hastings and Haah present a magic state distillation factory with a yield parameter $\gamma \approx 0.7$, meaning that the number of fixed-error-rate input magic states they need in order to reach a target error rate $\epsilon$ grows like $O\left(\left(\log \frac{1}{\epsilon}\right)^{0.7}\right)$.
How can this be possible? If you have fewer than $O(\log \frac{1}{\epsilon})$ input states, then the chance of all of them failing simultaneously will be larger than $\epsilon$. But you can't possibly get a correct result when all of your inputs are bad, so it seems impossible to do better than $\Omega(\log \frac{1}{\epsilon})$.
For example, suppose the fixed initial error rate is $f = 10^{-10}$, the target error rate is $\epsilon = 10^{-10^{10}}$, and the constant factor hidden by the O notation is such that the logarithm is natural. Then we start with $(\ln \frac{1}{\epsilon})^{0.7} \approx 2 \cdot 10^7$ states. The chance of them all failing simultaneously is $f^{2 \cdot 10^7} \approx 10^{-2 \cdot 10^8}$, which is far higher than the target failure rate. It seems like for any choice of constant factor and initial error rate, I can find a target error rate demanding enough that this problem occurs. Why is this kind of thing not fatal to the construction?
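To make the arithmetic concrete, here is a quick sanity check of the numbers above (the constant factor of 1 in front of $(\ln \frac{1}{\epsilon})^{0.7}$ is my illustrative assumption; everything is done in $\log_{10}$ space since the probabilities underflow floats):

```python
import math

f_log10 = -10          # fixed initial error rate f = 10^-10, as log10
eps_log10 = -10**10    # target error rate eps = 10^(-10^10), as log10

# ln(1/eps) = log10(1/eps) * ln(10)
ln_inv_eps = -eps_log10 * math.log(10)

# Number of input states with gamma = 0.7 and (assumed) unit constant factor.
n = ln_inv_eps ** 0.7
print(f"input states: {n:.2e}")  # roughly 1.8e7

# log10 of the probability that all n inputs fail simultaneously: n * log10(f).
all_fail_log10 = n * f_log10
print(f"log10 Pr[all fail]: {all_fail_log10:.2e}")  # roughly -1.8e8

# The all-fail probability (~10^(-2e8)) dwarfs the target (~10^(-10^10)).
print(all_fail_log10 > eps_log10)  # True
```

This confirms the apparent paradox in the question: with these numbers the all-fail probability really is astronomically larger than the target error rate.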