Use Multiprocessing when developing a python package

Question

I'm developing a Python package that uses the multiprocessing package. However, when I run it on Windows machines, I hit a problem. The demo code is shown below.

structure of the demo

|pkg
|  __init__.py
|  pkg_script.py
|test.py

pkg is the package folder; test.py is a Python script that uses the package.

__init__.py

from .pkg_script import *

pkg_script.py

import multiprocessing as mp

def task(task_id):
    print(f'task {task_id}')

def func():
    p = mp.Pool(4)
    p.map(task,range(4))

test.py

import pkg

pkg.func()

The error message I got:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The reason is that, on Windows, multiprocessing uses the spawn start method: each child process starts a fresh interpreter and re-imports the main script from the beginning. Therefore, pkg.func() in test.py is also executed in each child process. To avoid this, we can put pkg.func() in test.py under if __name__ == '__main__'. However, as I'm developing a package, I cannot force users to do so; they may call pkg.func() anywhere in their code. Is there a solution that does not rely on if __name__ == '__main__'?

EDIT1: Please do not suggest putting pkg.func() in test.py under if __name__ == '__main__'. I know that works. I'm looking for a solution that does not use if __name__ == '__main__'.

EDIT2: I'm the developer of the pkg package, not a user of it. So I would like to resolve the issue by revising pkg itself, so that its users face no restrictions on where they call it.
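
For illustration only, here is a minimal sketch of one package-side guard that is sometimes used for the re-import behaviour described above. It is not a confirmed solution. It assumes that the process started directly by the user keeps multiprocessing's default name 'MainProcess' and that spawned workers are given a different name before the main script is re-imported. The helper is_worker_process and the n_workers parameter are hypothetical names, not part of the original demo, and task returns its string instead of printing so that the result can be inspected.

```python
# pkg_script.py (hypothetical variant of the demo module)
import multiprocessing as mp


def task(task_id):
    # Same worker as in the demo, but returning the string instead of printing it.
    return f'task {task_id}'


def is_worker_process():
    # Heuristic (an assumption, not an official API contract): the process
    # started by the user keeps the default name 'MainProcess', while
    # processes created by multiprocessing get other names.
    return mp.current_process().name != 'MainProcess'


def func(n_workers=4):
    # When a spawned child re-imports the user's script and reaches this
    # call again, skip pool creation instead of starting new processes.
    if is_worker_process():
        return None
    # The context manager cleans up the pool once map() has returned.
    with mp.Pool(n_workers) as pool:
        return pool.map(task, range(4))
```

With a guard like this, an unguarded pkg.func() call in test.py should be skipped inside the children rather than trying to create a nested pool. The trade-offs are real, though: the rest of the user's top-level code still runs again in every child, func() silently does nothing if it is called from any process that multiprocessing created, and the check breaks if someone renames the main process. The if __name__ == '__main__' idiom remains the documented approach.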

Does this answer your question? [How to use multiprocessing.Pool in an imported module?](https://stackoverflow.com/questions/42602584/how-to-use-multiprocessing-pool-in-an-imported-module) — python_user, May 22 '21 at 02:28
Does this answer your question? [RuntimeError on windows trying python multiprocessing](https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing). The problem is not with your imported modules but rather your test program. You need *test.py* to be `if __name__ == '__main__': pkg.func()` — Booboo, May 22 '21 at 11:14
@Booboo As the last three sentences of my question say, I know it works with the `if __name__ == '__main__'` guard. The issue is that I'm developing a Python package, and I cannot anticipate where users will put `pkg.func()`. They may call it in Jupyter, where a main function is seldom used. So the problem is how to resolve the issue without using `if __name__ == '__main__'` — Jiawei Lu, May 22 '21 at 17:17
@JiaweiLu You only have control over your own test program, namely *test.py*. Anyone running on a platform that uses `spawn` to create new processes needs to know that they must put any process-creating code within an `if __name__ == '__main__':` block or suffer the consequences. — Booboo, May 22 '21 at 17:36
@Booboo As I'm developing the `pkg` package, I'm wondering whether there is a solution that resolves the issue by revising `pkg` itself, instead of asking users to put commands under `if __name__ == '__main__'`. That would make `pkg` more user-friendly. Also, I think there are still many Python users who do not know what happens behind the scenes in the `multiprocessing` module. They will be confused if they get an error when using `pkg`. — Jiawei Lu, May 22 '21 at 17:50
I have deleted my "quasi-answer" because your point was well-taken. — Booboo, May 22 '21 at 20:13

0 Answers