Python multiprocessing pool.map for multiple arguments?


In Python 2, using a Pool with "with ... as ..." fails with AttributeError: __exit__ (Pool only became a context manager in Python 3.3), hence the poolcontext wrapper below:
import multiprocessing
from functools import partial
from contextlib import contextmanager

@contextmanager
def poolcontext(*args, **kwargs):
    pool = multiprocessing.Pool(*args, **kwargs)
    yield pool
    pool.terminate()

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with poolcontext(processes=3) as pool:
        results = pool.map(partial(merge_names, b='Sons'), names)
    print(results)

# Output: ['Brown & Sons', 'Wilson & Sons', 'Bartlett & Sons', ...
import multiprocessing
from itertools import product
from contextlib import contextmanager

def merge_names(a, b):
    return '{} & {}'.format(a, b)

def merge_names_unpack(args):
    return merge_names(*args)

@contextmanager
def poolcontext(*args, **kwargs):
    pool = multiprocessing.Pool(*args, **kwargs)
    yield pool
    pool.terminate()

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with poolcontext(processes=3) as pool:
        results = pool.map(merge_names_unpack, product(names, repeat=2))
    print(results)

# Output: ['Brown & Brown', 'Brown & Wilson', 'Brown & Bartlett', ...
Worth noting that in Python 2 you can't write with .. as ..; you have to create pool = Pool(), call it, and then call pool.close() yourself.

@muon, good catch. It appears Pool objects don't become context managers until Python 3.3. I've added a simple wrapper function that returns a Pool context manager.

For earlier versions of Python, you'll need to write a helper function to unpack the arguments explicitly. If you want to use with, you'll also need to write a wrapper to turn Pool into a context manager. (Thanks to muon for pointing this out.)

I'm confused: what happened to the text variable in your example? Why is RAW_DATASET seemingly passed twice? I think you might have a typo.

In simpler cases, with a fixed second argument, you can also use partial, but only in Python 2.7+.

It seems to me that RAW_DATASET in this case should be a global variable? But I want partial_harvester to change the value of case in every call of harvester(). How can I achieve that?

The answer to this is version- and situation-dependent. The most general answer for recent versions of Python (since 3.3) was first described below by J.F. Sebastian. It uses the Pool.starmap method, which accepts a sequence of argument tuples. It then automatically unpacks the arguments from each tuple and passes them to the given function.
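A minimal sketch of that starmap pattern, reusing merge_names from the code above (assuming Python 3.3+, where Pool is also a context manager):

import multiprocessing

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with multiprocessing.Pool(processes=3) as pool:
        # starmap unpacks each (a, b) tuple into merge_names(a, b)
        results = pool.starmap(merge_names, [(name, 'Sons') for name in names])
    print(results)

# Output: ['Brown & Sons', 'Wilson & Sons', 'Bartlett & Sons', ...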

The most important thing here is assigning the =RAW_DATASET default value to case. Otherwise pool.map will get confused by the multiple arguments.
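For context, the harvester/RAW_DATASET pattern being discussed comes from the original question, which is not reproduced here; the sketch below is a hypothetical reconstruction contrasting the default-value trick with the partial approach:

from functools import partial
from multiprocessing import Pool

def harvester(text, case):  # hypothetical stand-in for the question's function
    return case[text]       # placeholder body, for illustration only

if __name__ == '__main__':
    RAW_DATASET = {'a': 1, 'b': 2}  # placeholder data
    pool = Pool(processes=2)
    # partial fixes case=RAW_DATASET, so pool.map only has to supply text;
    # no "case=RAW_DATASET" default value on harvester is needed.
    results = pool.map(partial(harvester, case=RAW_DATASET), ['a', 'b'])
    pool.close()
    pool.join()
    print(results)  # [1, 2]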

def multi_run_wrapper(args):
    return add(*args)

def add(x, y):
    return x + y

if __name__ == "__main__":
    from multiprocessing import Pool
    pool = Pool(4)
    results = pool.map(multi_run_wrapper, [(1, 2), (2, 3), (3, 4)])
    print(results)

# Output: [3, 5, 7]

Easiest solution. There is a small optimization: remove the wrapper function and unpack args directly inside add; it then works for any number of arguments: def add(args): (x, y) = args

Nice solution. Really helpful with list args.

you could also use a lambda function instead of defining multi_run_wrapper(..)

hm... in fact, using a lambda does not work, because pool.map(..) tries to pickle the given function, and lambdas can't be pickled
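A quick way to see why, without even starting a pool: the standard pickle module, which pool.map relies on to ship the callable to worker processes, refuses lambdas (the exact error text varies by Python version):

import pickle

try:
    # This is essentially what the pool has to do behind the scenes with your function.
    pickle.dumps(lambda x: x * 2)
except Exception as exc:
    print(type(exc).__name__, exc)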

There's a fork of multiprocessing called pathos (note: use the version on GitHub) that doesn't need starmap -- its map functions mirror the API of Python's map, so map can take multiple arguments. With pathos, you can also generally do multiprocessing in the interpreter, instead of being stuck in the __main__ block. pathos is due for a release, after some mild updating -- mostly conversion to Python 3.x.

  Python 2.7.5 (default, Sep 30 2013, 20:15:49)
  [GCC 4.2.1 (Apple Inc. build 5566)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> def func(a,b):
  ...     print a,b
  ...
  >>>
  >>> from pathos.multiprocessing import ProcessingPool
  >>> pool = ProcessingPool(nodes=4)
  >>> pool.map(func, [1,2,3], [1,1,1])
  1 1
  2 1
  3 1
  [None, None, None]
  >>>
  >>> # also can pickle stuff like lambdas
  >>> result = pool.map(lambda x: x**2, range(10))
  >>> result
  [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
  >>>
  >>> # also does asynchronous map
  >>> result = pool.amap(pow, [1,2,3], [4,5,6])
  >>> result.get()
  [1, 32, 729]
  >>>
  >>> # or can return a map iterator
  >>> result = pool.imap(pow, [1,2,3], [4,5,6])
  >>> result
  <processing.pool.IMapIterator object at 0x110c2ffd0>
  >>> list(result)
  [1, 32, 729]
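If you'd rather stay in the standard library, concurrent.futures (Python 3.2+) offers a similar multi-iterable map; a minimal sketch:

from concurrent.futures import ProcessPoolExecutor

def func(a, b):
    return a + b

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as executor:
        # Like the built-in map, executor.map accepts several iterables
        # and passes one element from each to func per call.
        results = list(executor.map(func, [1, 2, 3], [1, 1, 1]))
    print(results)  # [2, 3, 4]

Unlike pathos, it still relies on pickle under the hood, so lambdas won't work there either.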

Another way is to pass a list of lists to a one-argument routine:

import os
from multiprocessing import Pool

def task(args):
    print("PID =", os.getpid(), ", arg1 =", args[0], ", arg2 =", args[1])

if __name__ == "__main__":
    pool = Pool()
    pool.map(task, [
        [1, 2],
        [3, 4],
        [5, 6],
        [7, 8],
    ])
    pool.close()
    pool.join()

I will say this sticks to the Python zen. There should be one and only one obvious way to do it. If by chance you are the author of the calling function, then you should use this method; for other cases we can use imotai's method.

My choice is to use a tuple, and then immediately unpack it as the first thing in the first line.

One can then construct a list of lists of arguments with one's favorite method.
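For instance (a minimal sketch; zip here is just one of many ways to build the argument lists):

from itertools import repeat

xs = [1, 3, 5, 7]
ys = [2, 4, 6, 8]

# Pair two sequences element by element...
arg_lists = [list(pair) for pair in zip(xs, ys)]                # [[1, 2], [3, 4], [5, 6], [7, 8]]

# ...or pair a varying value with a constant one.
arg_lists_const = [list(pair) for pair in zip(xs, repeat(10))]  # [[1, 10], [3, 10], [5, 10], [7, 10]]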

This is an easy way, but you need to change your original function. What's more, sometimes you are calling other people's functions, which can't be modified.

#!/usr/bin/env python2
import itertools
from multiprocessing import Pool, freeze_support

def func(a, b):
    print a, b

def func_star(a_b):
    """Convert `f([1,2])` to `f(1,2)` call."""
    return func(*a_b)

def main():
    pool = Pool()
    a_args = [1,2,3]
    second_arg = 1
    pool.map(func_star, itertools.izip(a_args, itertools.repeat(second_arg)))

if __name__=="__main__":
    freeze_support()
    main()
#!/usr/bin/env python3
from functools import partial
from itertools import repeat
from multiprocessing import Pool, freeze_support

def func(a, b):
    return a + b

def main():
    a_args = [1,2,3]
    second_arg = 1
    with Pool() as pool:
        L = pool.starmap(func, [(1, 1), (2, 1), (3, 1)])
        M = pool.starmap(func, zip(a_args, repeat(second_arg)))
        N = pool.map(partial(func, b=second_arg), a_args)
        assert L == M == N

if __name__=="__main__":
    freeze_support()
    main()

Output of the Python 2 script:

1 1
2 1
3 1

Since Python 3.3 you can use pool.starmap() directly, as in the Python 3 script above.

@Space_C0wb0y: f((a,b)) syntax is deprecated and removed in py3k. And it is unnecessary here.

@zthomas.nc this question is about how to support multiple arguments for multiprocessing pool.map. If you want to know how to call a method instead of a function in a different Python process via multiprocessing, then ask a separate question (if all else fails, you could always create a global function that wraps the method call, similar to func_star() above).

Due to the bug mentioned by @unutbu you can't use functools.partial() or similar capabilities on Python 2.6, so the simple wrapper function func_star() should be defined explicitly. See also the workaround suggested by uptimebox.

F.: I did not know that it was deprecated, thanks!

F.: You can unpack the argument tuple in the signature of func_star like this: def func_star((a, b)). Of course, this only works for a fixed number of arguments, but if that is the only case he has, it is more readable.

Notice how itertools.izip() and itertools.repeat() are used here.
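For reference, here is what that pairing produces (shown with Python 3's zip, since itertools.izip no longer exists in Python 3):

from itertools import repeat

a_args = [1, 2, 3]
second_arg = 1

# zip stops at the shortest iterable, so the infinite repeat() is safe here.
print(list(zip(a_args, repeat(second_arg))))  # [(1, 1), (2, 1), (3, 1)]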

Is there a variant of pool.map which supports multiple arguments?

In Python 3.3+ you can use pool.starmap():

from multiprocessing.dummy import Pool as ThreadPool

def write(i, x):
    print(i, "---", x)

a = ["1", "2", "3"]
b = ["4", "5", "6"]

pool = ThreadPool(2)
pool.starmap(write, zip(a, b))
pool.close()
pool.join()

Output:

1 --- 4
2 --- 5
3 --- 6

In case you want to have a constant value passed as an argument, you have to use import itertools and then zip(itertools.repeat(constant), a), for example.
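A minimal sketch of that, reusing write from the code above (with this zip order the constant becomes the first argument):

import itertools
from multiprocessing.dummy import Pool as ThreadPool

def write(i, x):
    print(i, "---", x)

a = ["1", "2", "3"]

pool = ThreadPool(2)
# itertools.repeat supplies the same constant for every element of a;
# zip stops once a is exhausted.
pool.starmap(write, zip(itertools.repeat("k"), a))
pool.close()
pool.join()
# Prints: k --- 1, k --- 2, k --- 3 (order may vary, since these are threads)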

No. First of all, it removes lots of unnecessary stuff, clearly states it's for Python 3.3+, and is intended for beginners who are looking for a simple and clean answer. As a beginner myself it took some time to figure it out that way (yes, with J.F. Sebastian's posts), and this is why I wrote my post: to help other beginners, because his post simply said "there is starmap" but did not explain it -- that is what my post intends to do. So there is absolutely no reason to bash me with two downvotes.

This is a near-exact duplicate of the answer from @J.F.Sebastian in 2011 (with 60+ votes).

You can also zip() more arguments if you like: zip(a,b,c,d,e)
