The question arises in a cryptographic context involving a regulatory test of a physical source of random bits, with the null hypothesis that the bits are independent and unbiased. $n$ samples of 4 bits are drawn ($n=128$ or $80$), the number of samples $O_i$ in each of the 16 bins is counted, and the source is assumed defective if $$65.0<\sum\frac{(O_i-n/16)^2}{n/16}$$
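To make the test concrete, here is a minimal Python sketch of the statistic and the decision rule. The function names are mine, not taken from the regulation, and samples are assumed to be given as integers in the range 0 to 15.

```python
from collections import Counter

def chi_square_statistic(samples, bins=16):
    """Return sum((O_i - n/bins)^2 / (n/bins)) over all bins."""
    n = len(samples)
    expected = n / bins
    counts = Counter(samples)
    # Bins that never occur still contribute (0 - expected)^2 / expected.
    return sum((counts.get(i, 0) - expected) ** 2 / expected
               for i in range(bins))

def source_rejected(samples, threshold=65.0):
    """True when the statistic strictly exceeds the threshold."""
    return chi_square_statistic(samples) > threshold
```

For instance, `chi_square_statistic(list(range(16)) * 8)` is zero for a perfectly flat histogram with $n=128$, while 128 identical samples give the maximum possible value.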
The regulation-endorsed [KS2011] A proposal for: Functionality classes for random number generators, version 2.0, item 408, gives a false-error rate of $3.8\cdot 10^{-7}$ for $n=128$. The secondarily-endorsed [AIS31V1] A proposal for: Functionality classes and evaluation methodology for true (physical) random number generators, version 3.1, example E.6, gives the same false-error rate for $n=80$. Both my attempted exact computation and my Monte Carlo simulation suggest that the stated false-error rate is correct in [AIS31V1] only (that is, for $n=80$), and that the justification given (approximation by the $\chi^2$ distribution, which would give a false-error rate of $3.4\cdot 10^{-8}$) cannot be used to derive the correct value.
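For reference, the $\chi^2$ approximation figure can be checked with the standard library alone. The sketch below assumes 15 degrees of freedom (16 bins minus one) and uses the classical closed form of the upper tail for an odd number of degrees of freedom; the function name is mine.

```python
import math

def chi2_sf_odd_df(x, df):
    """Upper tail P(X > x) of a chi-square variable, df a positive odd integer."""
    if df < 1 or df % 2 != 1:
        raise ValueError("df must be a positive odd integer")
    root = math.sqrt(x)
    # Standard normal density evaluated at sqrt(x).
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    # Series: sum over r = 1 .. (df-1)/2 of root^(2r-1) / (1*3*...*(2r-1)).
    series = 0.0
    term = root
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)
    # erfc(root / sqrt(2)) is twice the normal upper tail at root.
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * series

print(chi2_sf_odd_df(65.0, 15))
```

Under these assumptions the printed value is about $3.4\cdot 10^{-8}$, an order of magnitude below the $3.8\cdot 10^{-7}$ quoted in the two documents.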
I'm thus asking how to directly derive the false-error rate for this test, preferably with an authoritative reference; and then, in the hope of explaining a much higher error rate observed in practice, what effect a slight bias in the source bits is expected to have on the false-error rate (e.g. if the bits are assumed independent with mean $0.5+\epsilon$).
Update: I understand why the approximation by a $\chi^2$ distribution does not work, how to run Monte Carlo simulations, and how in principle to calculate exactly the odds that the test fails (for $\epsilon=0$, my C code counting the exact odds of each possible value of the test result is usable for $n$ a multiple of $16$ up to $160$, and gives results not contradicted by simulations). The remaining problems are that I'd like references, and that this exact approach hits a computational wall for $\epsilon\ne0$.
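For illustration, a Monte Carlo estimate that also covers $\epsilon\ne0$ could look like the sketch below. This is not my C code; the function names and defaults are made up for the example, and it assumes each 4-bit sample consists of four independent bits with mean $0.5+\epsilon$, so that a bin's probability depends only on the number of ones in its value.

```python
import random

def bin_probabilities(eps=0.0):
    """Probability of each 4-bit value for independent bits with mean 0.5 + eps."""
    p1 = 0.5 + eps
    probs = []
    for value in range(16):
        ones = bin(value).count("1")
        probs.append(p1 ** ones * (1.0 - p1) ** (4 - ones))
    return probs

def estimate_rejection_rate(n, threshold, eps=0.0, trials=10_000, seed=None):
    """Fraction of simulated runs whose statistic strictly exceeds the threshold."""
    rng = random.Random(seed)
    probs = bin_probabilities(eps)
    expected = n / 16
    rejections = 0
    for _ in range(trials):
        counts = [0] * 16
        # Draw n samples of 4 bits according to the (possibly biased) bin law.
        for value in rng.choices(range(16), weights=probs, k=n):
            counts[value] += 1
        statistic = sum((c - expected) ** 2 for c in counts) / expected
        if statistic > threshold:
            rejections += 1
    return rejections / trials
```

A naive simulation like this is only practical for much lower thresholds than $65.0$: resolving a rate near $10^{-7}$ would take on the order of $10^9$ trials or more, so it serves mainly as a sanity check on the shape of the curve and on the direction of the effect of $\epsilon$.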
The graph shows my tentative results for the false-error rate (for $\epsilon=0$) as a function of the threshold, for different $n$, together with the $\chi^2$ distribution approximation.
