I've used MT19937 in a test harness, based on the original authors' mt19937.c implementation, to generate uniformly distributed unsigned 32-bit values in $[0, 2^{32} - 1]$, giving an (essentially inexhaustible) supply of statistically random octets. This is in lieu of a CSPRNG, which isn't necessary for these particular tests. However, I've recently been considering the WELL PRNGs - not because of their statistical properties as such (both seem more than adequate for my needs) - but because they seem to allow a more efficient implementation.
I lack the mathematical background for the academic papers, though I could at least follow the 'twist' matrix and tempering transform for the former. However, much of the code provided by the authors of the WELL-n functions seems to focus on floating-point generation, with some magic floating-point constants (e.g., 2.32830643653869628906e-10). Can steps be omitted from the WELL code to provide a uniform 32-bit distribution? Or is the algorithm designed / biased specifically for floating-point distributions?
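To make the question concrete: as far as I can tell, the constant above is just $2^{-32}$, which would mean the floating-point output is a 32-bit word scaled into $[0, 1)$. The sketch below checks that reading. Note that `demo_next_u32` is a hypothetical stand-in (a plain xorshift32 step, not WELL) used only to have some 32-bit words to scale, and `FACT`, `u32_to_unit_double` and `roundtrip_ok` are names I've chosen for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* The "magic" constant quoted above; my assumption is that it is 2^-32. */
#define FACT 2.32830643653869628906e-10

/* Hypothetical stand-in for a generator's raw 32-bit output word.
   This is a plain xorshift32 step, NOT WELL; illustration only. */
static uint32_t demo_state = 2463534242u;

static uint32_t demo_next_u32(void)
{
    demo_state ^= demo_state << 13;
    demo_state ^= demo_state >> 17;
    demo_state ^= demo_state << 5;
    return demo_state;
}

/* Floating-point style output: scale a 32-bit word into [0, 1). */
static double u32_to_unit_double(uint32_t x)
{
    return (double)x * FACT;
}

/* Returns 1 if scaling by FACT loses no information, i.e. the original
   32-bit word can be recovered exactly from the double. */
static int roundtrip_ok(uint32_t x)
{
    double d = u32_to_unit_double(x);
    assert(d >= 0.0 && d < 1.0);
    return (uint32_t)ldexp(d, 32) == x;
}
```

If that reading is right, calling `roundtrip_ok(demo_next_u32())` in a loop should always return 1, and the integer word is simply whatever is computed before the final multiplication.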
Or am I incorrect in thinking that WELL will yield a performance gain for 'bulk' uint32 vector generation, while satisfying my requirements?
PCGRNGs. They are much faster, can supply very long periods (as well as multiple streams) with statistically excellent properties, and appear to recover very quickly from poor IVs. – Brett Hale Jul 14 '17 at 07:56