
I want to ask the same question as "Python 3: does Pool keep the original order of data passed to map?", but for joblib. E.g.:

Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in x)

The syntax kind of implies it, but I am always worried about the ordering of output from parallel processing, and I don't want to base my code on undocumented behavior.

user3226167

2 Answers


TL;DR: it preserves order with both backends.

Extending @Chris Farr's answer, I implemented a simple test. The function waits for a random amount of time before returning its input (you can check that the wait times are not identical), so tasks finish in a different order than they were submitted. The results come back in the original order every time, with both the loky and the multiprocessing backends.

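from joblib import Parallel, delayed
import numpy as np
import time

def f(wait):
    # Sleep for `wait` seconds, then return the input so each result identifies its task
    time.sleep(wait)
    return wait

if __name__ == "__main__":  # needed by the multiprocessing backend on spawn-based platforms such as Windows
    n = 50
    waits = np.random.uniform(low=0, high=1, size=n)
    res = Parallel(n_jobs=8, backend='multiprocessing')(delayed(f)(wait) for wait in waits)
    print(np.all(np.array(res) == waits))  # True if the input order was preserved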
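The test above exercises only the multiprocessing backend. A minimal sketch that repeats the same check for several backends, assuming a joblib version in which "loky", "multiprocessing" and "threading" are all valid backend names. Instead of random waits it uses strictly decreasing waits (a choice made for this sketch, not part of the original test), so the first task submitted always finishes last; the helper name `order_preserved` is hypothetical.

```python
import time
from joblib import Parallel, delayed

def f(wait):
    # Sleep, then return the input so each result identifies its task
    time.sleep(wait)
    return wait

def order_preserved(backend, n_jobs=4):
    # Hypothetical helper. Decreasing waits mean the first task finishes last,
    # so a list built in completion order would come back reversed.
    waits = [0.02 * k for k in range(8, 0, -1)]
    res = Parallel(n_jobs=n_jobs, backend=backend)(delayed(f)(w) for w in waits)
    return res == waits

if __name__ == "__main__":
    for backend in ("loky", "multiprocessing", "threading"):
        print(backend, order_preserved(backend))
```

Decreasing waits make the check deterministic: if results were collected in completion order, the comparison would fail on every run rather than only occasionally.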
Yair Daon

Per the joblib documentation, you can set the backend to multiprocessing, which is based on multiprocessing.Pool. The answer to that Pool question then applies, and the results are in fact ordered.

Parallel(n_jobs=2, backend="multiprocessing")(delayed(sqrt)(i ** 2) for i in x)
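The guarantee this relies on is that Pool.map returns results in input order even when later tasks finish first. A minimal sketch of that behavior, using `multiprocessing.pool.ThreadPool` (a thread-backed subclass of Pool with the same `map`/`imap_unordered` interface) only so the sketch does not spawn processes; `slow_square` and `run` are hypothetical names, and the sleep times are chosen so later inputs finish earlier.

```python
import time
from multiprocessing.pool import ThreadPool

def slow_square(i):
    # Hypothetical task: later inputs sleep less, so they tend to finish first
    time.sleep(0.05 * (5 - i))
    return i * i

def run():
    with ThreadPool(3) as pool:
        # map returns results in input order regardless of completion order
        ordered = pool.map(slow_square, range(5))
        # imap_unordered yields results as they complete
        unordered = list(pool.imap_unordered(slow_square, range(5)))
    return ordered, unordered

if __name__ == "__main__":
    ordered, unordered = run()
    print(ordered)            # [0, 1, 4, 9, 16]
    print(sorted(unordered))  # same values; the arrival order may differ
```

Only the sorted contents of the `imap_unordered` result are compared, since its order depends on timing.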

By default, however, joblib uses the loky backend. It isn't immediately clear whether loky preserves order, but that could be checked with a test.
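One way an executor can return ordered results even though tasks complete out of order is to keep the futures in submission order and read them back in that order. The sketch below illustrates that general technique with the standard library's `concurrent.futures`; it is not joblib's or loky's actual code, and `task` and `collect` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def task(i, n):
    # Hypothetical task: earlier indices sleep longer, so they tend to finish last
    time.sleep(0.05 * (n - i))
    return i

def collect(n=5):
    with ThreadPoolExecutor(max_workers=n) as ex:
        futures = [ex.submit(task, i, n) for i in range(n)]
        # Completion order: futures are yielded as they finish
        by_completion = [fut.result() for fut in as_completed(futures)]
        # Submission order: walk the futures list in the order tasks were submitted
        by_submission = [fut.result() for fut in futures]
    return by_completion, by_submission

if __name__ == "__main__":
    by_completion, by_submission = collect()
    print(by_submission)  # [0, 1, 2, 3, 4]
    print(by_completion)  # usually reversed, but completion order is not guaranteed
```

Reading futures in submission order costs nothing extra: a caller simply blocks on the earliest unfinished task while later results wait in memory.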

Chris Farr