74

I have the following function:

def copy_file(source_file, target_dir):
    pass

Now I would like to use multiprocessing to execute this function at once:

p = Pool(12)
p.map(lambda x: copy_file(x,target_dir), file_list)

The problem is, lambda's can't be pickled, so this fails. What is the most neat (pythonic) way to fix this?

Nicolás Ozimica
  • 9,062
  • 5
  • 36
  • 49
Peter Smit
  • 26,186
  • 31
  • 108
  • 168

4 Answers4

73

Use a function object:

class Copier(object):
    def __init__(self, tgtdir):
        self.target_dir = tgtdir
    def __call__(self, src):
        copy_file(src, self.target_dir)

To run your Pool.map:

p.map(Copier(target_dir), file_list)
Fred Foo
  • 342,876
  • 71
  • 713
  • 819
65

For Python2.7+ or Python3, you could use functools.partial:

import functools
copier = functools.partial(copy_file, target_dir=target_dir)
p.map(copier, file_list)
unutbu
  • 777,569
  • 165
  • 1,697
  • 1,613
  • 1
    This one even looks cleaner... I will decide later which one to make my answer – Peter Smit Jan 28 '11 at 13:30
  • @Peter Smit: Oops -- you saw my post before I deleted it... I'm undeleting this just to announce it doesn't work due to a bug in Python2. – unutbu Jan 28 '11 at 13:38
  • Ah, ok. I was looking in the docs and didn't see a comment about the pickability, so I assumed they were... I'll accept the other answer then, as that seems to be the way to go. – Peter Smit Jan 28 '11 at 13:52
  • 1
    Still, a +1 for this answer since it's shorter (in Python 3, that is ;) – Fred Foo Jan 28 '11 at 14:47
  • 9
    Landing here much later, as an update `functools.partial` is also picklable in python 2.7. – pythonic metaphor Jul 18 '13 at 22:42
  • This solution has the problem that only the second and higher arguments can be "curried". The function object solution is much more flexible. – Igor Rivin Jul 21 '18 at 00:18
  • 1
    Used this to fix a parallel search for non-isomorphic graphs. It runs 15x faster than the Fred Foo's solution – Alice Schwarze Oct 23 '19 at 02:27
11

Question is a bit old but if you are still use Python 2 my answer can be useful.

Trick is to use part of pathos project: multiprocess fork of multiprocessing. It get rid of annoying limitation of original multiprocess.

Installation: pip install multiprocess

Usage:

>>> from multiprocess import Pool
>>> p = Pool(4)
>>> print p.map(lambda x: (lambda y:y**2)(x) + x, xrange(10))
[0, 2, 6, 12, 20, 30, 42, 56, 72, 90]
petRUShka
  • 9,400
  • 12
  • 57
  • 91
1

From this answer, pathos let's you run your lambda p.map(lambda x: copy_file(x,target_dir), file_list) directly, saving all the workarounds / hacks

Community
  • 1
  • 1
Rufus
  • 4,375
  • 3
  • 24
  • 38