0

I am new to understanding pseudorandom number generators and sometimes feel daunted by them. For some time I have used the following approach in my work:

In one file, I'd have

import numpy as np
np.random.seed(1)
my_random_process()

In another file, I'd have

import numpy as np
np.random.seed(2)
my_random_process()

and so on. In the nth and final file, I'd have

import numpy as np
np.random.seed(n)
my_random_process()

In my head, each of these processes were "independent" in the sense that if I averaged my outcomes over all the file's outputs, I would have an estimate for the mean of my outputs with an error decreasing as $c/\sqrt{n}$ for a constant $c$. My aim was merely to find the average of the outputs of my_random_process, which depends on np.random functions.


However, I've been reading the documentation and I fear that the method above actually has correlations. In particular, numpy's parallel random sampling documentation says that np.random uses MT19337 for which "two adjacent 32-bit integer seeds (i.e. 12345 and 12346) would produce very similar streams". This makes me very sad, because correlations would inflate the value of the constant $c$.

On the other hand, numpy has other methods of generating random numbers, one of which I'm not too familiar with, np.random.default_rng. On another page of the random sampling documentation, there's the sentence "Seeds can be passed to any of the BitGenerators. The provided value is mixed via SeedSequence to spread a possible sequence of seeds across a wider range of initialization states for the BitGenerator."

Does this mean that if I do

import numpy as np
rng = np.random.default_rng(1)
my_new_random_process(rng)

and

import numpy as np
rng = np.random.default_rng(2)
my_new_random_process(rng)

and so on, where I've replaced np.random with rng.random (and so on) in my_random_process, will the random streams be largely uncorrelated? Or does one still need to be a little more thoughtful to avoid correlations? This is not a large problem for me, since I just want to calculate means, but if a small change in my workflow could help reduce the number of iterations I have to do, I'd be happy.

user196574
  • 133
  • 4
  • 2
    The standard techniques for partitioning the output of a single PRNG into uncorrelated streams of pseudo random operands are leap-frogging and blocking (called block splitting by some). This generally involves a way of advancing PRNG state by N steps (N > 1) in one go. This is computationally expensive for some PRNGs and simpler for others (trivial for counter-based PRNGs like Philox). Achieving uncorrelated streams by PRNG seeding is usually not possible, but depending on PRNG and use case may achieve results that are "good enough". – njuffa Apr 29 '22 at 01:01
  • 2
    Probably of interest: Makoto Matsumoto, et al., "Common Defects in Initialization of Pseudorandom Number Generators," ACM Transactions on Modeling and Computer Simulation, Vol. 17, No. 4, Article 15, Sept. 2007, 20 pp. – njuffa Apr 29 '22 at 07:53
  • @njuffa Thank you for the comments. The paper you sent notes that even in 2002 there was a decent initializer for the Mersenne Twister that "seems to solve these problems", so I strongly suspect that my former less-than-best practice of sequential seeds is not terrible. I'm also growing more hopeful that sequential seeds are indeed mapped to far apart initial conditions via seedsequence behind the scenes when np.random.default_rng(seed) and np.random.default_rng(seed+1) are called, but I'm still not 100% sure of that. – user196574 Apr 29 '22 at 16:36
  • 1
    One approach I have used in the past is to run seed values through a simple but effective hash first. In combination with a long-period PRNG the chance of "collisions" is small, and multiple streams are "likely minimally correlated", which may be enough for some use cases. However, with the advent of counter-based PRNGs parallelization with proper blocking has become trivial, so that seems like the best way forward. Does your software environment offer a counter-based PRNG? – njuffa Apr 29 '22 at 22:12
  • @njuffa, would you like to expand your comments into an answer? – nicoguaro May 01 '22 at 15:23

0 Answers0