Multiprocessing with multiple arguments to function in Python 2.7?


Python 2.7.5 (default, Sep 30 2013, 20:15:49) 
  [GCC 4.2.1 (Apple Inc. build 5566)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> from pathos.multiprocessing import ProcessingPool    
  >>> pool = ProcessingPool(nodes=4)
  >>>
  >>> def func(g,h,i):
  ...   return g+h+i
  ... 
  >>> pool.map(func, [1,2,3],[4,5,6],[7,8,9])
  [12, 15, 18]
  >>>
  >>> # also can pickle stuff like lambdas 
  >>> result = pool.map(lambda x: x**2, range(10))
  >>> result
  [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
  >>>
  >>> # also does asynchronous map
  >>> result = pool.amap(pow, [1,2,3], [4,5,6])
  >>> result.get()
  [1, 32, 729]
  >>>
  >>> # or can return a map iterator
  >>> result = pool.imap(pow, [1,2,3], [4,5,6])
  >>> result
  <processing.pool.IMapIterator object at 0x110c2ffd0>
  >>> list(result)
  [1, 32, 729]

There's a fork of multiprocessing called pathos (note: use the version on GitHub) that doesn't need starmap or helper functions or any of that other stuff -- the map functions mirror the API of Python's builtin map, so map can take multiple arguments directly. With pathos, you can also generally do multiprocessing from the interpreter, instead of being stuck in the __main__ block. pathos is due for a release, after some mild updating -- mostly conversion to Python 3.x.
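As a point of reference for the API that pathos mirrors: the builtin map already accepts one iterable per function argument and applies the function across them in lockstep (in Python 3 it returns an iterator, so wrap it in list()). A minimal sketch:

```python
def func(g, h, i):
    return g + h + i

# Builtin map takes multiple iterables, one per argument of func;
# this is exactly the calling convention pathos's pool.map reproduces.
result = list(map(func, [1, 2, 3], [4, 5, 6], [7, 8, 9]))
print(result)  # [12, 15, 18]
```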


Based on the answer from @ErikAllik I'm thinking that this might be a Windows-specific problem.

The problem is solved by adding a main() function as:

import itertools
from multiprocessing import Pool

def func(g, h, i):
    return g + h + i

def helper(args):
    # unpack ((2, 3), i) into func(2, 3, i)
    args2 = args[0] + (args[1],)
    return func(*args2)

def main():
    pool = Pool(processes=4)
    result = pool.map(helper, itertools.izip(itertools.repeat((2, 3)), range(10)))
    print result

if __name__ == '__main__':
    main()

Ah, right: now I remember; I've run into this myself -- this problem only happens on Windows, also with "normal" Python :) But even on OS X/Linux, it's still semantically correct not to put top-level code into modules that are about to be imported, which is implicitly the case when using multiprocessing.
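To see what the helper/izip pattern is doing, the same argument stream can be traced serially (Python 3 spelling, with zip in place of itertools.izip; the pool simply applies helper to each zipped tuple in parallel):

```python
import itertools

def func(g, h, i):
    return g + h + i

def helper(args):
    # args is ((2, 3), i); splice it into the flat tuple (2, 3, i) for func
    return func(*(args[0] + (args[1],)))

# zip pairs the constant (2, 3) with each value of range(10), yielding
# ((2, 3), 0), ((2, 3), 1), ... -- exactly what pool.map feeds to helper.
stream = zip(itertools.repeat((2, 3)), range(10))
print([helper(a) for a in stream])  # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```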


[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]

Explanation: The problem happens (only on Windows for some reason, but could just as well be happening on OS X and Linux) because your module contains top-level code. What multiprocessing does is import your code in the subprocess and execute it. However, if your module contains top-level code, that code is evaluated/executed immediately as the module gets imported. By wrapping it in main() and only calling main() conditionally (i.e. inside an if __name__ == '__main__' block), you prevent this from happening. This is also more correct on OS X and Linux, and is generally preferred over putting code directly at module level.
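The guard pattern described above can be sketched as a minimal standalone script (function names here are illustrative, not from the original question):

```python
from multiprocessing import Pool

def square(x):
    return x * x

def main():
    # The Pool is created inside main(), so when a worker process re-imports
    # this module (as happens with spawn on Windows), the import alone does
    # not recursively create more pools.
    pool = Pool(processes=2)
    result = pool.map(square, range(5))
    pool.close()
    pool.join()
    return result

if __name__ == '__main__':
    print(main())  # [0, 1, 4, 9, 16]
```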

I can see your Python paths contain EPD_python27, so maybe try using a vanilla Python distribution rather than the Enthought Python Distribution.

Ok, thanks. Since you got it to work, I tried something different and added a main(), which seems to solve the problem. I initially ran the program through IPython, but with a main() it works both through IDLE and at the CLI (still using Enthought).

UPDATE: Please see @fileunderwater's answer for a solution; I've run into this once myself, but had totally forgotten about it :)
