141

I am relatively new to Python and am trying to use the multiprocessing module to parallelize a for loop.

I have a list of image URLs stored in img_urls, which I need to download and run through Google Vision.

if __name__ == '__main__':
    start_time = time.time()
    img_urls = [ALL_MY_Image_URLS]
    runAll(img_urls)
    print("--- %s seconds ---" % (time.time() - start_time))

This is my runAll() method:

def runAll(img_urls):
    num_cores = multiprocessing.cpu_count()

    print("Image URLS  {}".format(len(img_urls)))
    if len(img_urls) > 2:
        numberOfImages = 0
    else:
        numberOfImages = 1

    start_timeProcess = time.time()

    pool = multiprocessing.Pool()
    pool.map(annotate,img_urls)
    end_timeProcess = time.time()
    print('\n Time to complete ', end_timeProcess-start_timeProcess)

    print(full_matching_pages)


def annotate(img_path):
    """Returns web annotations given the path to an image."""
    file = requests.get(img_path).content
    print("file is", file)
    print('Process Working under ',os.getpid())
    image = types.Image(content=file)
    web_detection = vision_client.web_detection(image=image).web_detection
    report(web_detection)

When I run it, I get the following warning and Python crashes:

objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67567]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67567]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67568]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67568]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67569]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67569]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67571]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67571]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67572]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67572]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
hata
SriTeja Chilakamarri
  • Are you on OSX? Then perhaps [this bug report](https://github.com/ansible/ansible/issues/32499) gives you some hints. – IonicSolutions May 04 '18 at 06:41
  • Oh Yeah I am on OSX, thank you for pointing me to the link. – SriTeja Chilakamarri May 04 '18 at 06:44
  • Still no luck. I tried setting `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` as mentioned, but I still get the same error. @IonicSolutions – SriTeja Chilakamarri May 04 '18 at 06:51
  • Unfortunately, I have no specific knowledge on this topic. All I can do is use Google to find related issues, e.g. [this possible workaround](https://bugs.python.org/issue30385#msg293958). – IonicSolutions May 04 '18 at 07:33
  • This is due to [Apple changing macOS `fork()` behavior since High Sierra](https://blog.phusion.nl/2017/10/13/why-ruby-app-servers-break-on-macos-high-sierra-and-what-can-be-done-about-it/). The `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=yes` variable turns off the immediate crash behavior that their newer Objective-C framework usually enforces now by default. This can affect any language that is doing multithreading / multiprocessing using `fork()` on macOS `>= 10.13`, especially when "native extensions" / C code extensions are used. – TrinitronX Jul 31 '20 at 01:35
  • There are also some [Python specific issues](https://codewithoutrules.com/2017/08/16/concurrency-python/) w.r.t. [multithreading & multiprocessing](https://pythonspeed.com/articles/python-multiprocessing/) that you might want to be aware of. It's common to run into deadlock and performance issues with Python threads due to the way Python is designed, specifically the ["GIL" / Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock) – TrinitronX Jul 31 '20 at 01:47
  • [Another good discussion](http://www.sealiesoftware.com/blog/archive/2017/6/5/Objective-C_and_fork_in_macOS_1013.html) of the issue, [thanks to Reddit user "`snatchery`"](https://www.reddit.com/r/ruby/comments/72qxjv/kernelfork_broken_in_macos_1013_high_sierra_puma/dnks820/) – TrinitronX Jul 31 '20 at 02:16
  • Also [a summary thanks to Reddit user "`Nwallins`"](https://www.reddit.com/r/ruby/comments/72qxjv/kernelfork_broken_in_macos_1013_high_sierra_puma/dnl8unj/) – TrinitronX Jul 31 '20 at 02:24

4 Answers

322

This error occurs because of the fork() safety checks that Apple added in macOS High Sierra (10.13) and kept in later versions of macOS. I know this answer is a bit late, but I solved the problem using the following method:

Set an environment variable in .bash_profile (or .zshrc on recent macOS) to allow applications or scripts that fork worker processes to run under the new macOS High Sierra rules.

Open a terminal:

$ nano ~/.bash_profile

Add the following line to the end of the file:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

Save, exit, close the terminal, and re-open it. Check that the environment variable is now set:

$ env

You will see output similar to:

TERM_PROGRAM=Apple_Terminal
SHELL=/bin/bash
TERM=xterm-256color
TMPDIR=/var/folders/pn/vasdlj3ojO#OOas4dasdffJq/T/
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.E7qLFJDSo/Render
TERM_PROGRAM_VERSION=404
TERM_SESSION_ID=NONE
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

You should now be able to run your Python script with multiprocessing.
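
If the script still crashes, it may help to confirm that the variable actually reached the Python process (for example, an IDE or launcher may not read .bash_profile). The following is a minimal sketch of such a check; the helper name fork_safety_flag is made up for illustration and is not part of any library.

```python
import os

FLAG = "OBJC_DISABLE_INITIALIZE_FORK_SAFETY"


def fork_safety_flag(environ=None):
    """Return the flag's value as seen by this process, or None if it is unset.

    fork_safety_flag is a hypothetical helper name used only for this sketch.
    """
    if environ is None:
        environ = os.environ
    return environ.get(FLAG)


if __name__ == "__main__":
    value = fork_safety_flag()
    if value is None:
        print(FLAG, "is not set in this process")
    else:
        print(FLAG, "=", value)
```

Run it from the same terminal, or the same IDE run configuration, that you use to start the failing script.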

raw-bin hood
11

The other answers tell you to set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES, but don't do this! You're just putting sticky tape on the warning light. You may need this on a case-by-case basis for some legacy software, but certainly do not set it in your .bash_profile!

This is fixed in https://bugs.python.org/issue33725 (Python 3.8+, where spawn became the default start method on macOS), but it's best practice to request the spawn start method explicitly:

with multiprocessing.get_context("spawn").Pool() as pool:
    pool.map(annotate,img_urls)
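
For anyone who wants to try the spawn approach in isolation, here is a minimal self-contained sketch. The names square and map_with_spawn are invented for this example; square merely stands in for annotate, and no Vision API calls are made.

```python
import multiprocessing


def square(n):
    # Hypothetical stand-in for annotate(). It must be a top-level function
    # so that spawned workers can import it by name.
    return n * n


def map_with_spawn(func, items, processes=2):
    """Run func over items in a pool whose workers are spawned, not forked."""
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, items)
```

Call it from under an `if __name__ == "__main__":` guard in a script file, for example `map_with_spawn(square, [1, 2, 3])`. The guard matters because spawned workers re-import the main module, and any pool created at import time would be created again in every worker.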
Thomas Grainger
8

I'm running macOS with zsh, and I had to add the following to my .zshrc file:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

and then in the command line:

source ~/.zshrc

Then it worked.

Brainmaniac
1

The solution that works for me without the OBJC_DISABLE_INITIALIZE_FORK_SAFETY flag in the environment involves initializing the multiprocessing.Pool class right after the main() program starts.

This is most likely not the fastest solution possible, and I am not sure whether it works in all situations. However, pre-heating the worker processes early enough, before the rest of my program starts running, avoids the "... may have been in progress in another thread when fork() was called" errors, and I get a significant performance boost compared with non-parallelized code.

I have created a convenience class, Parallelizer, which I start very early and then use throughout the lifecycle of my program. The full version can be found here.

# entry point to my program
def main():
    parallelizer = Parallelizer()
    ...

Then whenever you want to have parallelization:

# this function is parallelized. it is run by each child process.
def processing_function(input):
    ...
    return output

...
inputs = [...]
results = parallelizer.map(
    inputs,
    processing_function
)

And the parallelizer class:

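import multiprocessing


class Parallelizer:
    def __init__(self):
        self.input_queue = multiprocessing.Queue()
        self.output_queue = multiprocessing.Queue()
        # _run is passed as the pool initializer: each worker enters its
        # loop right away and keeps serving the queues for its whole life.
        self.pool = multiprocessing.Pool(multiprocessing.cpu_count(),
                                         Parallelizer._run,
                                         (self.input_queue, self.output_queue,))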

    def map(self, contents, processing_func):
        size = 0
        for content in contents:
            self.input_queue.put((content, processing_func))
            size += 1
        results = []
        while size > 0:
            result = self.output_queue.get(block=True)
            results.append(result)
            size -= 1
        return results

    @staticmethod
    def _run(input_queue, output_queue):
        while True:
            content, processing_func = input_queue.get(block=True)
            result = processing_func(content)
            output_queue.put(result)

One caveat: the parallelized code might be difficult to debug, so I have also prepared a non-parallelizing version of my class, which I enable when something goes wrong in the child processes:

class NullParallelizer:
    @staticmethod
    def map(contents, processing_func):
        results = []
        for content in contents:
            results.append(processing_func(content))
        return results
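
A second caveat: Parallelizer.map() collects results from the output queue as workers finish, so they are not guaranteed to come back in input order. If order matters, one option is to tag each item with its index and sort afterwards. The sketch below assumes only the map(contents, processing_func) interface shown above; IndexedCall, map_in_order and ReversingParallelizer are names invented for this illustration.

```python
class IndexedCall:
    """Wrapper that carries each item's position through the pool.

    It is a top-level class so that it can be pickled, provided the wrapped
    function is itself picklable.
    """

    def __init__(self, func):
        self.func = func

    def __call__(self, indexed_item):
        index, item = indexed_item
        return index, self.func(item)


def map_in_order(parallelizer, contents, processing_func):
    """Call parallelizer.map() and put the results back in input order."""
    indexed = list(enumerate(contents))
    tagged = parallelizer.map(indexed, IndexedCall(processing_func))
    return [result for _, result in sorted(tagged, key=lambda pair: pair[0])]


class ReversingParallelizer:
    """Hypothetical stand-in that returns results out of order, for demonstration."""

    @staticmethod
    def map(contents, processing_func):
        return [processing_func(content) for content in reversed(contents)]
```

With a real Parallelizer instance, the call would look like `map_in_order(parallelizer, inputs, processing_function)`.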
Stanislav Pankevich