Saturday, August 2, 2014

Python and Parallel Processing

(Image taken from http://www.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.admin.partition.doc/doc/c0004569.html without explicit permission, but with acknowledgement of the source of the work).


I've written about Parallel Processing here before. e.g. See "What's the fuss about Parallel Processing?". I've been worried by my reading that seems to predict that imperative languages such as Python will never be able to safely cope with massively multi-processor architectures in the future. The "Downfall of Imperative Programming Languages" basically says that functional programming languages (such as Haskell and Erlang) are going to be the way of the future. I've been trying to get my head wrapped around Haskell, but so far without much success. So I was happy to hear a clear call from the Python camp that "I'm not dead yet!".


The article that gives me new hope is "Parallelism in One Line" the gist of which is that "Sure you can use the threading library to manage pools of threads (or of processes), but that takes a lot of lines of code and is very error prone, but there's another way to do it." The other way to do it is to use a parallel version of the map function. I, for one, didn't know that there's more than one version of "map" available in the Python library.


If you do a Google search for:

Python map

you will readily find documentation for Python's standard (iterative) map routine. e.g. "Python 2.7.8 builtin functions documentation - map".


If you readily want to find documentation of the parallel version of map that the Parallelism in One Line article talks about, you need to modify your Google search to:

Python map multithreaded

which will take you to a subset of the first search. I was amused to find among the search results a 2010 ActiveState Recipe for building your own "concurrent map" function, which drew a comment from one reader asking why not just use "multiprocessing Pool.map". The recipe's author admitted not knowing about that one.


From that 2nd search I found "Multiprocessing - process based threading documentation". It does worry me that the documentation seems to have more complexity and gotchas than the Parallelism in One line article owed up to. Arguably I shouldn't be blogging about this at all until I've actually given it a try myself (but I still don't have a multi-core processor here at home).

It's a trick?


If you've been reading this carefully, you might rightfully object that it's all a bit of a trick. The map function is an element lifted from the world of functional programming and then provided in Python. The parallel version is only safe in Python if the function that you are mapping is "pure". If the function code has side-effects, then you will have race conditions and potentially suffer horribly at the hands of multi-cores. The language isn't going to inherently protect you so you have to be careful out there.


If you want to read more about the challenges of Python vs. multi-core architectures, see "Python's Hardest Problem, Revisited", by Jeff Knupp. The piece parts to roll your own multi-processes or multi-threaded Python program remain available, but be sure you have plenty of iodine and bandages on hand if you cavalierly venture into the world of multi-cores in a language that makes no promises of (much) concurrent processing safety. Design your code carefully and watch out!