Random sample in numba

Question

For performance reasons I often use numba, and in my code I need to take a random sample without replacement. I found that I can use `np.random.choice` for that, but it is extremely slow compared to `random.sample`. Am I doing something wrong? How can I improve the performance of the numba function? I boiled my code down to this minimal example:

import random
from timeit import timeit

import numpy as np
import numba as nb

def func2():
    List = range(100000)
    for x in range(20000):
        random.sample(List, 10)

@nb.njit()
def func3():
    Array = np.arange(100000)
    for x in range(20000):
        np.random.choice(Array, 10, False)

print(timeit(lambda: func2(), number=1))
print(timeit(lambda: func3(), number=1))
>>>0.1196
>>>20.1245

Edit: I'm now using my own sample function, which is much faster than np.random.choice.

@nb.njit()
def func4():
    for x in range(20000):
        rangeList = list(range(100000))
        result = []
        for x in range(10):
            randint = random.randint(0, len(rangeList) - 1)
            result.append(rangeList.pop(randint))
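        # Note: this return sits inside the outer loop, so the function
        # exits after the first sample instead of drawing 20000 of them.
        return result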
print(timeit(lambda: func4(), number=count))
>>>0.1767

The performance is reduced by `replace=False`; try setting it to `True`. — Alexander Riedel, Nov 19 '20 at 21:46
https://stackoverflow.com/questions/40914862/why-is-random-sample-faster-than-numpys-random-choice — Alexander Riedel, Nov 19 '20 at 22:05
Well, it explains why `random.sample` is sometimes faster than NumPy's `random.choice`, and also how to mitigate this somewhat by using `np.random.default_rng().choice`. You are asking whether you're doing something wrong, and the answer is: it depends. — Alexander Riedel, Nov 19 '20 at 22:11
Numba is meant for speeding up non-vectorized code. `np.random.choice` is a vectorized function that will not really be sped up and may actually be slower under JIT. You can easily check this by comparing the jitted version to a non-jitted one (which is about three times slower if you include compilation time; excluding compilation time it's twice as fast). — Alexander Riedel, Nov 19 '20 at 22:24
I did all that. I'm surprised that my speed difference is so much larger, and I wonder what I could do about it. `random.default_rng()` is still much slower and has no numba support. Since this is part of a larger numba function, I'm trying to optimize it for numba. — HighwayJohn, Nov 20 '20 at 08:40
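
One likely reason for the gap, as far as I understand the implementations: with `replace=False`, `np.random.choice` permutes the whole population and keeps only the first few entries, so every draw costs work proportional to 100000 even though just 10 items are needed. A partial Fisher-Yates shuffle touches only the first k slots. The sketch below is plain Python for illustration; `partial_shuffle_sample` is a hypothetical helper name (not part of numpy, numba, or `random`), and it has been neither benchmarked nor compiled with numba here.

```python
import random


def partial_shuffle_sample(pool, k):
    """Draw k distinct items by shuffling only the first k slots of pool in place."""
    n = len(pool)
    if k > n:
        raise ValueError("sample larger than population")
    for i in range(k):
        # Pick a position from the not-yet-fixed tail [i, n) and swap it into slot i.
        j = random.randint(i, n - 1)
        pool[i], pool[j] = pool[j], pool[i]
    # For a list, slicing returns a copy, so later draws do not alter this result.
    return pool[:k]


pool = list(range(100000))  # built once and reused for every draw
for _ in range(3):
    print(partial_shuffle_sample(pool, 10))
```

The pool is left partially shuffled after each call, which is still a valid starting state for the next draw, so it only has to be built once instead of calling `list(range(100000))` inside the loop as `func4` does.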

score 0 · Answer 1 · answered Nov 19 '20 at 22:29

I took some time measurements and want to show you the results (following up on my comments on your question):

import numpy as np
from timeit import timeit
import numba as nb
import random

def func2():
    List = range(100000)
    for x in range(1000):
        random.sample(List, 10)

@nb.njit()
def func3():
    Array = np.arange(100000)
    for x in range(1000):
        np.random.choice(Array, 10, replace=False)

def func4():
    Array = np.arange(100000)
    for x in range(1000):
        np.random.choice(Array, 10, replace=False)

def func5():
    Array = np.arange(100000)
    for x in range(1000):
        np.random.default_rng().choice(Array, 10, replace=False)

print(f"random.sample {timeit(lambda: func2(), number=1)}")
print(f"np.random.choice JIT incl. compiling {timeit(lambda: func3(), number=1)}")
print(f"np.random.choice JIT excl. compiling {timeit(lambda: func3(), number=1)}")
print(f"np.random.choice {timeit(lambda: func4(), number=1)}")
print(f"np.random.default_rng.choice {timeit(lambda: func5(), number=1)}")

Giving you:

random.sample 0.0090606
np.random.choice JIT incl. compiling 1.9129443
np.random.choice JIT excl. compiling 0.8365084999999999
np.random.choice 1.8339632999999997
np.random.default_rng.choice 0.049018499999999854

I noticed that as well, but if I'm not doing anything wrong, is there a way to improve the performance of my numba function? — HighwayJohn, Nov 20 '20 at 08:43
I think how to optimize this really depends on your real use case. — Alexander Riedel, Nov 20 '20 at 08:46
If I don't use numba, the remaining part of my code would be slow... — HighwayJohn, Nov 20 '20 at 08:47
Yes, so you're using `np.random.choice` for that exact problem as above? Or is there something more? — Alexander Riedel, Nov 20 '20 at 08:48
Currently my code works differently, but in principle being able to use np.random.choice / random.sample efficiently should speed up my code significantly. I guess I will now take a look at how random.sample is implemented and then check whether I can adapt that function to work with numba. — HighwayJohn, Nov 20 '20 at 08:51
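
Following up on that last comment: for a small sample from a large population, CPython's `random.sample` picks random indices and retries whenever an index was already selected, so the population is never copied or shuffled. Below is a minimal sketch of that idea written so that it should be compatible with numba. `sample_indices` is a hypothetical name, the duplicate check is a plain linear scan rather than the set that `random.sample` uses (reasonable for k around 10), and no timings were measured for it here.

```python
import random

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (uncompiled) where numba is not installed.
    def njit(func):
        return func


@njit
def sample_indices(n, k):
    """Return k distinct integers from range(n), chosen by rejection."""
    if k > n:
        raise ValueError("sample larger than population")
    result = []
    while len(result) < k:
        candidate = random.randint(0, n - 1)  # inclusive bounds, as in func4
        # Linear scan for duplicates; cheap while k is small.
        duplicate = False
        for chosen in result:
            if chosen == candidate:
                duplicate = True
                break
        if not duplicate:
            result.append(candidate)
    return result


print(sample_indices(100000, 10))
```

The returned values are indices, so sampling from an array `Array` would be a matter of looking up `Array[i]` for each returned `i`. When k gets close to n, the retries become frequent and a shuffle-based approach is the better fit.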
