
I am trying to learn parallel computing in Python, using multiprocessing for numerical calculations. In order to share a numpy.array between processes, I found the following example at https://research.wmz.ninja/articles/2018/03/on-sharing-large-arrays-when-using-pythons-multiprocessing.html:

'''

import numpy as np
import time
from multiprocessing import Pool, RawArray

# A global dictionary storing the variables passed from the initializer.
var_dict = {}

def init_worker(X, X_shape):
    # Using a dictionary is not strictly necessary. You can also
    # use global variables.
    var_dict['X'] = X
    var_dict['X_shape'] = X_shape

def worker_func(i):
    # Simply computes the sum of the i-th row of the input matrix X
    X_np = np.frombuffer(var_dict['X']).reshape(var_dict['X_shape'])
    time.sleep(1) # Some heavy computations
    return np.sum(X_np[i,:]).item() # .item() replaces np.asscalar, which was removed from recent NumPy

# We need this check for Windows to prevent infinitely spawning new child
# processes.
if __name__ == '__main__':
    X_shape = (16, 1000000)
    # Randomly generate some data
    data = np.random.randn(*X_shape)
    X = RawArray('d', X_shape[0] * X_shape[1])
    # Wrap X as a numpy array so we can easily manipulate its data.
    X_np = np.frombuffer(X).reshape(X_shape)
    # Copy data to our shared array.
    np.copyto(X_np, data)
    # Start the process pool and do the computation.
    # Here we pass X and X_shape to the initializer of each worker.
    # (Because X_shape is not a shared variable, it will be copied to each
    # child process.)
    with Pool(processes=4, initializer=init_worker, initargs=(X, X_shape)) as pool:
        result = pool.map(worker_func, range(X_shape[0]))
        print('Results (pool):\n', np.array(result))
    # Should print the same results.
    print('Results (numpy):\n', np.sum(X_np, 1))

'''
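
If I understand the article correctly, the key point is that np.frombuffer wraps the RawArray without copying, so the parent and all workers operate on the same block of memory. Here is a small sanity check I wrote to convince myself of that (my own snippet, not from the article; note that frombuffer's default dtype, float64, matches the 'd' typecode):

'''

from multiprocessing import RawArray
import numpy as np

# Create a small shared buffer of 4 C doubles ('d') and wrap it
# as a numpy array without copying.
X = RawArray('d', 4)
X_np = np.frombuffer(X)  # default dtype float64 matches typecode 'd'
X_np[0] = 3.14
print(X[0])  # prints 3.14 -- the wrapper and the RawArray share memory

'''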

The code does not run on Windows 10 / JupyterLab. The kernel in JupyterLab stays busy, but nothing is ever printed. Then I found this link: Jupyter notebook never finishes processing using multiprocessing (Python 3). It tells me that I need to move the function (def worker_func(i):) into its own file, z_worker_func.py. I did that with:

'''

import numpy as np
import time
from multiprocessing import Pool, RawArray
import z_worker_func

var_dict = {}

def init_worker(X, X_shape):
    var_dict['X'] = X
    var_dict['X_shape'] = X_shape

# def worker_func(i):
#     # Simply computes the sum of the i-th row of the input matrix X
#     X_np = np.frombuffer(var_dict['X']).reshape(var_dict['X_shape'])
#     time.sleep(1) # Some heavy computations
#     return np.sum(X_np[i,:]).item()

def main():
    X_shape = (16, 10000)
    data = np.random.randn(*X_shape)
    X = RawArray('d', X_shape[0] * X_shape[1])
    X_np = np.frombuffer(X).reshape(X_shape)
    np.copyto(X_np, data)
    with Pool(processes=2, initializer=init_worker, initargs=(X, X_shape)) as pool:
        result = pool.map(z_worker_func.worker_func, range(X_shape[0]))
        print('Results (pool):\n', np.array(result))
    print('Results (numpy):\n', np.sum(X_np, 1))

if __name__ == '__main__':
    main()

'''
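
and with a z_worker_func.py next to it. A minimal sketch of what that file would contain, assuming it simply holds the commented-out worker function from above plus its own imports:

'''

import numpy as np
import time

# Note: this var_dict belongs to this module. It is a separate
# dictionary from the var_dict in the main script, so init_worker
# in the main script does not populate it.
var_dict = {}

def worker_func(i):
    # Simply computes the sum of the i-th row of the input matrix X
    X_np = np.frombuffer(var_dict['X']).reshape(var_dict['X_shape'])
    time.sleep(1) # Some heavy computations
    return np.sum(X_np[i,:]).item()

'''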

But this still does not work. I can see the CPU is busy in the Windows Task Manager, but there is no output. Can someone help me? Basically, I am looking for how to use numpy/scipy with Python parallel computing on numerical multi-dimensional arrays for digital signal processing. Many thanks in advance!

Wu-O