In this blog post I'll try to cover the differences between the
functions that the multiprocessing package offers in Python. The
multiprocessing package offers the ability to work with multiple
processes, and therefore multiple cores, and if you wish with multiple
threads. First I'll try to cover the multiple-core scenario and explain
the difference between the Pool methods map, map_async, imap,
imap_unordered, apply, and apply_async. In a later blog post I'll try to
cover the multithreaded scenario. I'll start with map/map_async and
imap/imap_unordered.
map operates on a function that accepts only one argument. It is
concurrent, it blocks your code (nothing else can be executed until it
has finished), and it returns your results in the same order as the
input. map consumes your iterable by converting it to a list (assuming
it isn't one already), breaking it into chunks, and sending those
chunks to the worker processes in the Pool. Breaking the iterable into
chunks performs better than passing each item between processes one at
a time, particularly if the iterable is large. However, turning the
iterable into a list in order to chunk it can have a very high memory
cost, since the entire list will need to be kept in memory.
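
Here is a minimal sketch of the blocking, ordered behaviour of map. The square and run_map functions are hypothetical helpers I made up for illustration, and the pool size of 2 is an arbitrary choice.

```python
from multiprocessing import Pool


def square(x):
    """Hypothetical single-argument task, used only for illustration."""
    return x * x


def run_map(numbers):
    # Pool.map blocks until every item has been processed and returns a
    # list whose order matches the order of the input.
    with Pool(processes=2) as pool:
        return pool.map(square, numbers)


if __name__ == "__main__":
    print(run_map(range(5)))  # [0, 1, 4, 9, 16]
```

The task function lives at module level so that it can be pickled and sent to the workers, and the __main__ guard keeps worker processes from re-running the script on platforms that spawn rather than fork.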
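map_async, on the other hand, is the non-blocking counterpart of map.
It also operates on a function that accepts a single argument and it
is also concurrent, but it returns immediately, so the rest of your
code keeps executing while the workers are still running. Once you
collect the results, they come back in the same order as the input.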
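
A small sketch of how map_async can be used, assuming a hypothetical square task and an arbitrary 30 second timeout: the call returns an AsyncResult at once, the parent does something else, and get() later hands back the complete list in input order.

```python
from multiprocessing import Pool


def square(x):
    """Hypothetical single-argument task, used only for illustration."""
    return x * x


def run_map_async(numbers):
    with Pool(processes=2) as pool:
        # Returns an AsyncResult immediately instead of waiting.
        async_result = pool.map_async(square, numbers)
        # The parent is free to do unrelated work while the workers run.
        other_work = sum(range(10))
        # get() blocks until the whole batch is done, then returns the
        # full list in input order.
        return other_work, async_result.get(timeout=30)


if __name__ == "__main__":
    print(run_map_async(range(5)))  # (45, [0, 1, 4, 9, 16])
```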
imap doesn't turn the iterable you give it into a list, nor does it
break it into chunks by default. It will iterate over the iterable one
element at a time, and send each element to a worker process. This
means you don't take the memory hit of converting the whole iterable
to a list, but it also means the performance is slower for large
iterables, because of the lack of chunking. This can be mitigated by
passing a chunksize argument larger than the default of 1, however.
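
As a rough sketch, this is what feeding a generator to imap with a larger chunksize might look like. The numbers generator, the square task, and the chunksize of 100 are all assumptions of mine, so tune the value for your own workload.

```python
from multiprocessing import Pool


def square(x):
    """Hypothetical single-argument task, used only for illustration."""
    return x * x


def numbers(limit):
    # A generator: items are produced one at a time rather than being
    # stored in a list up front.
    for n in range(limit):
        yield n


def sum_of_squares(limit, chunksize=100):
    with Pool(processes=2) as pool:
        # imap returns an iterator whose results arrive in input order;
        # chunksize groups items to reduce inter-process overhead.
        return sum(pool.imap(square, numbers(limit), chunksize=chunksize))


if __name__ == "__main__":
    print(sum_of_squares(1000))
```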
The other major difference between imap/imap_unordered and
map/map_async is that with imap/imap_unordered you can start
receiving results from workers as soon as they're ready, rather than
having to wait for all of them to be finished. With map_async, an
AsyncResult is returned right away, but you can't actually retrieve
results from that object until all of them have been processed, at
which point it returns the same list that map does. In fact, map is
implemented internally as map_async(...).get(). There's no way to get
partial results: you either have the entire result or nothing.
imap and imap_unordered both return iterables right away. With imap,
the results will be yielded from the iterable as soon as they're
ready, while still preserving the ordering of the input iterable. With
imap_unordered, results will be yielded as soon as they're ready,
regardless of the order of the input iterable.
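
To make the ordering difference visible, here is a sketch that gives each task a different sleep time. The slow_echo helper and the delays are hypothetical, and the unordered output depends on scheduling, so the order shown in the comment is only what I would typically expect.

```python
import time
from multiprocessing import Pool


def slow_echo(delay):
    """Hypothetical task: sleep for `delay` seconds, then return it."""
    time.sleep(delay)
    return delay


def compare(delays):
    # One worker per task so that all tasks can run at the same time.
    with Pool(processes=len(delays)) as pool:
        # imap yields results in input order, even if later ones finish first.
        ordered = list(pool.imap(slow_echo, delays))
        # imap_unordered yields each result as soon as it is ready.
        unordered = list(pool.imap_unordered(slow_echo, delays))
    return ordered, unordered


if __name__ == "__main__":
    ordered, unordered = compare([0.6, 0.1, 0.3])
    print(ordered)    # [0.6, 0.1, 0.3]
    print(unordered)  # typically [0.1, 0.3, 0.6], but not guaranteed
```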
To summarize, the key differences between imap/imap_unordered and
map/map_async are:
- 1. The way they consume the iterable you pass to them.
- 2. The way they return the result back to you.
apply sends a single task off to a worker process, and then blocks
until it's complete. apply_async sends a single task off to a worker
process, and then immediately returns an AsyncResult object, which can
be used to wait for the task to finish and retrieve the result. apply
is implemented by simply calling apply_async(...).get(). With
apply_async, if an exception occurs inside the function you submitted,
you won't know about it unless you explicitly call get() on the failing
AsyncResult object, which means keeping every AsyncResult you were
handed and iterating over all of them. With map_async there is a single
AsyncResult for the whole batch, so the exception will be raised when
you call async_result.get(), no iteration required. map_async also has
built-in chunking functionality, which will make your code perform
noticeably better if the argument list is very large.
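
The sketch below shows one way to submit several multi-argument tasks with apply_async and surface a worker exception by calling get() on each AsyncResult. The divide and run_tasks helpers and the 30 second timeout are my own assumptions for the example.

```python
from multiprocessing import Pool


def divide(a, b):
    """Hypothetical two-argument task, used only for illustration."""
    return a / b


def run_tasks(pairs):
    outcomes = []
    with Pool(processes=2) as pool:
        # One AsyncResult per submitted task; args is a tuple of
        # positional arguments for the task function.
        pending = [pool.apply_async(divide, args=pair) for pair in pairs]
        for result in pending:
            try:
                outcomes.append(result.get(timeout=30))
            except ZeroDivisionError as exc:
                # The worker's exception is re-raised here, in the parent,
                # only because get() was called on this AsyncResult.
                outcomes.append(type(exc).__name__)
    return outcomes


if __name__ == "__main__":
    print(run_tasks([(6, 3), (1, 0), (9, 3)]))
    # [2.0, 'ZeroDivisionError', 3.0]
```

If you never call get() on the second AsyncResult, the division error stays hidden, which is exactly the trap described above.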
Here is a final table summarizing all of those properties and differences:
Function | Multi-args | Concurrence | Blocking | Ordered-results
---|---|---|---|---
map | no | yes | yes | yes
map_async | no | yes | no | yes
apply | yes | no | yes | no
apply_async | yes | yes | no | no
imap | no | yes | no | yes
imap_unordered | no | yes | no | no