
For performance reasons I often use numba, and my code needs to draw a random sample without replacement. I found that I can use np.random.choice for that, but inside a numba function it is extremely slow compared to random.sample. Am I doing something wrong? How can I improve the performance of the numba function? I boiled my code down to this minimal example:

import random
from timeit import timeit

import numpy as np
import numba as nb

def func2():
    List = range(100000)
    for x in range(20000):
        random.sample(List, 10)

@nb.njit()
def func3():
    Array = np.arange(100000)
    for x in range(20000):
        np.random.choice(Array, 10, False)

print(timeit(lambda: func2(), number=1))
print(timeit(lambda: func3(), number=1))
>>>0.1196
>>>20.1245

Edit: I'm now using my own sample function, which is much faster than np.random.choice.

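@nb.njit()
def func4():
    for x in range(20000):
        rangeList = list(range(100000))
        result = []
        for _ in range(10):
            # Pop a random index so the same element cannot be drawn twice.
            randint = random.randint(0, len(rangeList) - 1)
            result.append(rangeList.pop(randint))
        # Note: this return sits inside the outer loop, so the function
        # exits after the first sample of 10.
        return result
print(timeit(lambda: func4(), number=count))
>>>0.1767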
HighwayJohn
  • The performance is decreased by `replace=False`; try setting it to True – Alexander Riedel Nov 19 '20 at 21:46
  • Yeah, but I want to have it without replacement! – HighwayJohn Nov 19 '20 at 21:51
  • https://stackoverflow.com/questions/40914862/why-is-random-sample-faster-than-numpys-random-choice – Alexander Riedel Nov 19 '20 at 22:05
  • That's not really an answer to my question. – HighwayJohn Nov 19 '20 at 22:08
  • Well, it explains why random.sample is sometimes faster than numpy's random choice, and also how to work around this a bit by using `np.random.default_rng().choice`. You are asking whether you're doing something wrong, and the answer to that is: it depends – Alexander Riedel Nov 19 '20 at 22:11
  • Numba is meant to optimize non-vectorized code. `np.random.choice` is already a vectorized function, so jitting it will not really speed it up and can actually make it slower. You can easily check this by comparing the jitted version to a non-jitted one: including compilation time the jitted version is about three times slower, excluding compilation time it's twice as fast. – Alexander Riedel Nov 19 '20 at 22:24
  • I did all that. I'm surprised that my speed difference is so much larger, and I wonder what I could do about it. random.default_rng() is still much slower and has no numba support. Since this is part of a larger numba function, I'm trying to optimize it for numba. – HighwayJohn Nov 20 '20 at 08:40

1 Answer


I ran some timing measurements and want to show you the results (regarding my comments on your question):

import numpy as np
from timeit import timeit
import numba as nb
import random

def func2():
    List = range(100000)
    for x in range(1000):
        random.sample(List, 10)

@nb.njit()
def func3():
    Array = np.arange(100000)
    for x in range(1000):
        np.random.choice(Array, 10, replace=False)

def func4():
    Array = np.arange(100000)
    for x in range(1000):
        np.random.choice(Array, 10, replace=False)

def func5():
    Array = np.arange(100000)
    for x in range(1000):
        np.random.default_rng().choice(Array, 10, replace=False)

print(f"random.sample {timeit(lambda: func2(), number=1)}")
print(f"np.random.choice JIT incl. compiling {timeit(lambda: func3(), number=1)}")
print(f"np.random.choice JIT excl. compiling {timeit(lambda: func3(), number=1)}")
print(f"np.random.choice {timeit(lambda: func4(), number=1)}")
print(f"np.random.default_rng.choice {timeit(lambda: func5(), number=1)}")

Giving you:

random.sample 0.0090606
np.random.choice JIT incl. compiling 1.9129443
np.random.choice JIT excl. compiling 0.8365084999999999
np.random.choice 1.8339632999999997
np.random.default_rng.choice 0.049018499999999854
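
One detail of `func5` above is that it builds a fresh generator with `np.random.default_rng()` on every iteration. A minimal sketch of the same draw with a single generator created once and reused is shown here; the names `rng`, `population` and `draw_samples` are hypothetical, and no timing is claimed for it since it was not part of the measurements above.

```python
import numpy as np

# Create the generator once and reuse it for every draw.
rng = np.random.default_rng()
population = np.arange(100000)


def draw_samples(n_draws=1000, k=10):
    """Return n_draws samples of k distinct elements each."""
    return [rng.choice(population, k, replace=False) for _ in range(n_draws)]


samples = draw_samples(n_draws=5)
```

Note that this runs in plain Python only; as mentioned in the comments, `default_rng` has no numba support.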
Alexander Riedel
  • I noticed that as well, but if I'm not doing anything wrong, is there a way to improve the performance of my numba function? – HighwayJohn Nov 20 '20 at 08:43
  • I think how to optimize this really depends on your real use case. – Alexander Riedel Nov 20 '20 at 08:46
  • If I don't use numba, the remaining part of my code would be slow... – HighwayJohn Nov 20 '20 at 08:47
  • Yes, so you're using `np.random.choice` for that exact problem as above? Or is there something more? – Alexander Riedel Nov 20 '20 at 08:48
  • Currently my code works differently, but in principle being able to use np.random.choice / random.sample efficiently should speed up my code significantly. I guess I will now take a look at how random.sample is coded and then check whether I can change that function to work with numba. – HighwayJohn Nov 20 '20 at 08:51
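
As a rough illustration of the direction described in the last comment, here is a minimal sketch of sampling without replacement by rejection: draw random indices and redraw any duplicate. This is not how `random.sample` is actually implemented and not the asker's final code; the function name `sample_without_replacement` is hypothetical. It is written with plain loops and a preallocated array so that it should also compile under `nb.njit`, and it only makes sense when the sample size `k` is much smaller than the population size `n`.

```python
import numpy as np


def sample_without_replacement(n, k):
    """Draw k distinct integers from range(n) by redrawing duplicates."""
    if k > n:
        raise ValueError("sample larger than population")
    out = np.empty(k, dtype=np.int64)
    filled = 0
    while filled < k:
        candidate = np.random.randint(0, n)  # upper bound is exclusive
        # Linear scan over the values drawn so far; cheap because k is small.
        duplicate = False
        for i in range(filled):
            if out[i] == candidate:
                duplicate = True
                break
        if not duplicate:
            out[filled] = candidate
            filled += 1
    return out


# Optional: compile with numba when it is installed (assumption: the
# function body only uses features numba supports in nopython mode).
try:
    import numba as nb
    sample_jit = nb.njit(sample_without_replacement)
except ImportError:
    sample_jit = sample_without_replacement

indices = sample_without_replacement(100000, 10)
```

The returned indices can be used to index into any array, e.g. `Array[indices]`. Unlike the `pop`-based approach, this avoids rebuilding a 100000-element list for every sample, but it degrades as `k` approaches `n` because more and more draws are rejected.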