Random thoughts — Multiprocessing in python

By kirk86, Friday 25 August 2017.

In this blog post I'll try to cover the differences between the functions that Python's multiprocessing package offers. The multiprocessing package lets you run work in multiple processes, and therefore on multiple cores. It also provides a thread-based pool with the same interface (multiprocessing.pool.ThreadPool) if you prefer threads. First I'll cover the multiple-core scenario and explain the difference between map, map_async, imap, imap_unordered, apply and apply_async. In a later blog post I'll try to cover the multithreaded scenario. I'll start with map/map_async and imap/imap_unordered.

map operates on a function that accepts only one argument. It is concurrent, it blocks your code, meaning nothing else in the calling process runs until it's finished, and it returns your results in order. map consumes your iterable by converting it to a list (unless it already has a length, like a list or tuple), breaking it into chunks, and sending those chunks to the worker processes in the Pool. Breaking the iterable into chunks performs better than passing each item between processes one at a time, particularly if the iterable is large. However, turning the iterable into a list in order to chunk it can have a very high memory cost, since the entire list must be kept in memory.
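
A minimal sketch of Pool.map, assuming a toy worker called square (hypothetical, for illustration only). The worker is defined at module level so it can be pickled and sent to the worker processes, and the pool is only used under the if __name__ == "__main__": guard, which is required on platforms that start workers with the spawn method (Windows, and macOS by default).

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical single-argument worker; module level so it can be pickled.
    return x * x


def run_map():
    with Pool(processes=4) as pool:
        # map() materialises range(10) as a sequence, splits it into chunks of 3
        # and blocks until every chunk is done. Results keep the input order.
        return pool.map(square, range(10), chunksize=3)


if __name__ == "__main__":
    print(run_map())  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Leaving chunksize out lets map pick a chunk size itself based on the length of the input and the number of workers.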

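map_async, on the other hand, takes the same single-argument function and iterable as map and runs them just as concurrently, but it doesn't block: it immediately returns an AsyncResult, so your code keeps executing while the workers are running. Its results are still returned in order once you call get() on that AsyncResult. (For functions that take several arguments, Pool also provides starmap and starmap_async.)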
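
A minimal sketch of map_async, again with a hypothetical square worker. The sum(range(1000)) line merely stands in for whatever other work the main process might do while the pool is busy; get() then returns the same ordered list that map would.

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical single-argument worker; module level so it can be pickled.
    return x * x


def run_map_async():
    with Pool(processes=4) as pool:
        # map_async() returns an AsyncResult immediately instead of blocking.
        async_result = pool.map_async(square, range(10))
        # Placeholder for other work done while the workers are running.
        busy_work = sum(range(1000))
        # get() blocks until all items are processed and returns them in order.
        return async_result.get(timeout=30), busy_work


if __name__ == "__main__":
    results, busy_work = run_map_async()
    print(results)    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    print(busy_work)  # 499500
```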

imap doesn't turn the iterable you give it into a list, nor does it break it into chunks by default. It iterates over the iterable one element at a time and sends each one to a worker process. This means you don't take the memory hit of converting the whole iterable to a list, but it also means performance is slower for large iterables because of the lack of chunking. This can be mitigated by passing a chunksize argument larger than the default of 1.
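
A minimal sketch of imap fed by a generator, so no list of the inputs is ever built, with chunksize=100 to recover some of the batching benefit. The worker square and the generator numbers are hypothetical, and 100 is just an illustrative value, not a recommended setting.

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical single-argument worker; module level so it can be pickled.
    return x * x


def numbers(n):
    # Hypothetical input generator: imap iterates it item by item
    # instead of converting it to a list first.
    for i in range(n):
        yield i


def run_imap():
    total = 0
    with Pool(processes=4) as pool:
        # chunksize=100 sends 100 items per task instead of the default of 1.
        for value in pool.imap(square, numbers(10_000), chunksize=100):
            total += value
    return total


if __name__ == "__main__":
    print(run_imap())
```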

The other major difference between imap/imap_unordered and map/map_async is that with imap/imap_unordered you can start receiving results from workers as soon as they're ready, rather than having to wait for all of them to finish. With map_async, an AsyncResult is returned right away, but you can't actually retrieve results from that object until all of them have been processed, at which point it returns the same list that map does. In fact, map is essentially implemented internally as map_async(...).get(). There's no way to get partial results: you either have the entire result, or nothing.
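
The sketch below, using a hypothetical square worker, checks that map and map_async(...).get() produce the same complete list, and then uses itertools.islice to take only the first three results from imap without waiting for the rest. Leaving the with block afterwards terminates the pool along with any tasks that are still outstanding.

```python
from itertools import islice
from multiprocessing import Pool


def square(x):
    # Hypothetical single-argument worker; module level so it can be pickled.
    return x * x


def compare():
    with Pool(processes=4) as pool:
        # Both give one complete, ordered list or nothing at all.
        blocking = pool.map(square, range(20))
        deferred = pool.map_async(square, range(20)).get(timeout=30)
        # imap() lets us consume early results before the whole job is done.
        first_three = list(islice(pool.imap(square, range(20)), 3))
    return blocking, deferred, first_three


if __name__ == "__main__":
    blocking, deferred, first_three = compare()
    print(blocking == deferred)  # True
    print(first_three)           # [0, 1, 4]
```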

imap and imap_unordered both return iterators right away. With imap, each result is yielded as soon as it and all the results before it are ready, so the ordering of the input iterable is preserved. With imap_unordered, results are yielded as soon as they're ready, regardless of the order of the input iterable.
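
A small sketch contrasting the two, assuming a hypothetical slow_square worker in which smaller inputs sleep longer. imap still yields results in input order, while imap_unordered typically yields them in completion order (often largest first here). That order depends on scheduling and is not guaranteed, so only the set of values is the same.

```python
import time
from multiprocessing import Pool


def slow_square(x):
    # Hypothetical worker: smaller inputs sleep longer, so they tend to finish last.
    time.sleep(0.05 * (5 - x))
    return x * x


def run_both():
    with Pool(processes=5) as pool:
        ordered = list(pool.imap(slow_square, range(5)))
        unordered = list(pool.imap_unordered(slow_square, range(5)))
    return ordered, unordered


if __name__ == "__main__":
    ordered, unordered = run_both()
    print(ordered)    # [0, 1, 4, 9, 16]
    print(unordered)  # completion order, varies from run to run
```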

To summarize, the key differences between imap/imap_unordered and map/map_async are:

1. The way they consume the iterable you pass to them.
2. The way they return the result back to you.

apply sends a single task off to a worker process and then blocks until it's complete. apply_async sends a single task off to a worker process and immediately returns an AsyncResult object, which can be used to wait for the task to finish and retrieve its result. apply is implemented by simply calling apply_async(...).get().

With apply_async, if an exception occurs inside the function you submitted, you won't know about it unless you explicitly call .get() on the failing AsyncResult object (or pass an error_callback), which in practice means keeping every AsyncResult you created and iterating over all of them. With map_async, the exception is raised when you call async_result.get(), no iteration required. map_async also has built-in chunking, which makes it perform noticeably better than many individual apply_async calls when the argument list is very large.
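
A sketch of the error-handling difference, assuming two hypothetical workers: divide, which takes two arguments passed as a tuple, and inverse, which takes one. With apply_async each AsyncResult must be checked with get() to surface its exception, whereas a single get() on the map_async result re-raises the first failure.

```python
from multiprocessing import Pool


def divide(a, b):
    # Hypothetical two-argument worker; raises ZeroDivisionError when b == 0.
    return a / b


def inverse(x):
    # Hypothetical single-argument worker for map_async.
    return 1 / x


def run_apply():
    with Pool(processes=4) as pool:
        # apply(): one task, arguments as a tuple, blocks until it finishes.
        single = pool.apply(divide, (10, 2))

        # apply_async(): one AsyncResult per task, returned immediately.
        pending = [pool.apply_async(divide, (10, b)) for b in (1, 2, 0, 5)]
        results, errors = [], []
        for async_result in pending:
            try:
                # A worker's exception is only re-raised here, by get().
                results.append(async_result.get(timeout=30))
            except ZeroDivisionError as exc:
                errors.append(exc)

        # map_async(): one get() call surfaces the failure, no per-task loop.
        try:
            pool.map_async(inverse, [1, 0, 2]).get(timeout=30)
            map_failed = False
        except ZeroDivisionError:
            map_failed = True
    return single, results, len(errors), map_failed


if __name__ == "__main__":
    print(run_apply())  # (5.0, [10.0, 5.0, 2.0], 1, True)
```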

Here is a final table summarizing all of those properties and differences:

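	Multi-args	Concurrent	Blocking	Ordered-results
map	no	yes	yes	yes
map_async	no	yes	no	yes
apply	yes	no	yes	n/a
apply_async	yes	yes	no	no
imap	no	yes	no	yes
imap_unordered	no	yes	no	no

Blocking refers to the call itself: imap and imap_unordered return an iterator immediately, but iterating over it still waits for each next result.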