Wednesday, May 29, 2013

Cheap, but not free. Modular code has a price...

Introduction and Spoiler Alert

Recently, Nico Ekkart asked on the Google+ Python Community for suggestions on how to speed up his Python implementation of Project Euler Project 14. I remembered my initial implementation of Project 14 had taken way more than the allowed minute of processing on my laptop and that I'd souped it up by adding a cache to get the run time down. I dusted off the files and compared the performance to Nico's. Sadly, his outperformed mine.

To see the requirements for Project 14, see:

Spoiler alert: If you would rather solve Problem 14 on your own, don't follow the links below. In fact, you should read no further in this article, lest your thoughts be polluted with prior art. To see Nico's question and the associated discussion thread, see:

The source code for Nico's implementation is here:

As I explained in Creeping up on OOP in Python - Part 1, all my Project Euler programs have a wrapper main program to time and report the timing of the run and to print the answer. The source code for that wrapper script is here:

The source code for my initial uncached implementation is here:

The source code for my cached implementation is here:

I ran each of these programs using Cygwin Python 2.7.3. The log of the runs is here: and includes one additional run. The difference was that I tweeked the cache size up from 100,000 to 1,000,000 entries to see if it would improve my run time. It did, but my run time still was behind that of Nico's code.

So what's the difference between my code and Nico's code? Well, a crucial non-difference is that both my cached version and Nico's code use a cache ("memoization") to save having to recalculate results that are already known. Far as I can see, a crucial difference is that I used a Python generator function to run through the specified sequence, and I used a simple class to do the caching. Nico's code is more in-line, mostly a simple while loop to compute the sequences, so there are many fewer calls going on. Nico's code uses one statement to do the caching where I used a simple object to implement the stashing of values in to the cache with a "method" (stash) and another method (value) to check whether a value is already in the cache On the one hand, I like Python's structuring provisions for generating sequences, but on the other hand, given the millions of times this program is invoking the next step in the sequence, the overhead does mound up to noticeable amounts of CPU time. I was also worried that perhaps I didn't have enough real memory to be able to cache all million values, so my cache had a bound on how much it'd cache. Nico's code just trusted in the virtual memory to allow a very large dictionary.

Summary of run times:

Nico's code     6.645 CPU sec
Drew's nocache 177.638 CPU sec
Drew's cache   28.501 CPU sec
Drew's big cache   16.863 CPU sec


The modular control structures in Python are remarkably efficient, but they are not entirely free of overhead. I stopped tuning after I got the run time below the requirement of "no more than a minute", but clearly I was off the pace by a factor of 4 or so. Was the structure of my functions and objects worth that cost? Well, for having met the requirements, I think so. I have to concede that Nico's code was so short that I do find it hard to argue that my function def's and Cache class added any significant amount of clarity to the code. As for Nico's question of how to make his code run faster, the stackoverflow pointers in the original comment thread may help get his run time down even further. Memoization is the big performance win, and thanks to Python dictionary data type, that is darn easy to implement.

Revised 11/26/2013 - My employer blocks, so to facilitate sharing with my co-workers, I've copied the pastebin snippets to and updated the links here accordingly.