Saturday, September 27, 2014

Actual code - C vs. Python for a small problem

If you aren't much interested in writing software, this post is probably not for you. If you read the post without following the links to the example implementations, then I'm not sure what you are doing here, though you are certainly welcome to ask questions.


Recently on Quora.com someone asked to see C code to find the first 20 prime numbers. Seemed like an easy enough question, but the first guy who answered it, gave a C++-based pseudocode and some tips on how to convert that to C. That answer didn't make me happy. Heck, I figure if you're going to write in some other language, why not Python as a sort of testable pseudocode, but the question really needs that answer to then be re-stated in C.


So I sat down and banged out a Python program to find and print the first 20 prime numbers. That code is here: find20primes.py


I used a list to hold the list of primes. When the list grows to 20 items, I'm done. I start the list by telling it 2 is a prime number. I then only consider odd numbers as additional candidates. If the candidate is divisible by any of the prime numbers we've found so far, it is not a prime number. If the candidate isn't divisible by any of the prime numbers we've found so far, then the candidate is a prime number so we add it to our list of primes and, in ether case, add 2 to the candidate number to get the next candidate number to be tested. In C, we'll need a little more book keeping for the list, but since the max size of the list is bounded (20 items), things should translate from Python to C pretty easily. One trick that isn't entirely obvious is the use I made of Python's for...else control structure to catch the case of the for loop not exiting via break. We can paper that over with a goto in C, or you can add some flags


I was thinking that C has labeled break statements to let me do sort of the same thing as that for...else. But much to my chagrin, that's a feature of Java, not C. Oops. So, goto it is in my C translation of the program.


So I sat down with that Python listing open in one window and typed up find20primes.c That code is here: find20primes.c


I believe it is a straight-forward translation of the Python program into C. Of course, the C program is bigger in the sense of it has more lines of code. There are lines added to delimit the blocks and loops in the program, and lines added for variable declarations and lines added for the annoying book-keeping that C needs me to do where Python just handles those chores. I did run into some trouble where I didn't get the book keeping entirely correct the first time through. The program outputted 20 primes, but it started with 2, followed by 73, followed by 3, and it left out 71 at the end of the list. Huh? Turned out I was mistakenly incrementing primecnt before I'd used it to fill in the next slot in the primes array, so I skipped the primes[1] slot and just got 73 there by bad luck. Python would have told me about an uninitialized variable if there'd been a spot in the Python program for me to have committed that same kind of error.


Having finally gotten both programs working, conventional wisdom is that the C code should be much faster than the Python code. So I used the "time" command to see how the timings compare.


Using cygwin Python 2.7.8 on Windows 7 on a PC with an i5 processor and 8GB of RAM,

$ time python find20primes.py

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

real    0m0.136s
user    0m0.000s
sys     0m0.062s

Using cygwin cc 4.8.3 on that same Windows 7 PC:

$ time cc find20primes.c

real    0m0.238s
user    0m0.045s
sys     0m0.185s

$ time ./a.exe
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71

real    0m0.064s
user    0m0.015s
sys     0m0.015s

The execution time for the compiled C program was way less than the total time for the Python run, but if you add in the compile time for the C program, Python comes out ahead. Your mileage may vary.


By the way, if I throw in a -O option ("optimize the generated code") on the cc command, it further slows the compile down, while the run time is about the same.

$ time cc -O find20primes.c

real    0m0.336s
user    0m0.031s
sys     0m0.185s

$ time ./a.exe
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71

real    0m0.086s
user    0m0.000s
sys     0m0.015s


(the "real" elapsed time varies a lot from run to run. These run times are so short the variability makes it hard to make fair comparisons in races between the programs). I suspect that the real time is dominated by the console I/O to print the results, so optimizing the details of the loops isn't going to do much improving of things.


Now to make things more interesting, suppose I wanted to pass in as a command line argument the number of primes to be found. So long as the primes are still in the range of ints that only needs a malloc of the array in C but if the program is ever asked to get up into really big primes, the Python code will keep on chugging while the C code will need substantial rework to pull in an arbitrary precision integer library module. This blog article is already long enough, so I'll leave that generalization to "findNprimes" up to you. How big does N have to be to break the C code's use of ints? I suppose that with relatively minor surgery, you could convert over to use of long ints (or if your compiler supports them, to long long ints) in the C program. Looking into it lightly, it appears that the cc included with Cygwin supports long long ints. The format strings will need adjusting. Use %lld instead of %d.


If you've feeling anxious to try further experiments, see how pypy does in handling the find20primes.py program. That'll probably be more interesting with findNprimes.py with a really big value for N.


The moral to the story is that C may not be the best choice of programming language for this kind of problem.


Did you enjoy this article or was it too technical? Please add a comment down below so I know what you think about this sort of writing.

Tuesday, August 19, 2014

John Cotton Dana


I'm writing this on August 19, 2014. John Cotton Dana was born on August 19, 1856 in Woodstock, Vermont. I looked him up after I came across this snappy graphic on the Web:



I was surprised to learn the extent that he had influenced my life.

http://en.wikipedia.org/wiki/John_Cotton_Dana

Wikipedia says he died in New Jersey on July 21, 1929. I did find another web page that says he died in Manhattan, not New Jersey, so once again we see that Wikipedia is not a definitive source, even if it is darn informative.


So how did this long dead person influence me? Well, for starters, in 1909, he founded the Newark Museum. I grew up in Union, NJ, about 9 miles by car from Newark Museum. My Mom would take me there for an easy day trip when we had nothing specific to do. I particularly enjoyed the science and technology exhibit of simple machines. I haven't been there in many years now and surely there has been turn-over of the exhibits, if nothing else, to make up for mechanical wear and tear.


Besides founding that museum, John Cotton Dana also was a librarian. I think my favorite line from the Wikipedia article was this:

He would have found a library school curriculum intolerable, and doubtless a library school would have found him intolerable.


Before John Cotton Dana, libraries tended to have closed stacks. A librarian would go fetch the book that you wanted to look at. Dana pioneered the radical concept of open stacks. The main library of the Rutger's-Newark campus is named after him. I'll leave it to you to forage around the Internet or your favorite local library to learn more about this man. Hey! At least read through the Wikipedia article, please.


Not all of his point of view would be accepted today. For example, Wikipedia says that he organized the first ever children's library room, but he believed its proper role was to help provide material to teachers. He was opposed to "story-time" at the library.


If you have kids, remember John Cotton Dana today by taking them to your local children's library, where I wager, you will find open stacks for easy browsing of the book collection. Even if you don't have kids, today would be a good day to make sure you know where your library card is. (You do have a library card for your local public library, don't you?) Make sure your card is up to date. If not, renew it, and use it.


Speaking of children's libraries, here in Westbury, NY, the founding of the local children's library pre-dates the establishment of the local public library for adults. The 2 only joined together since 1965. History of Westbury Children's Library


Revised 8/19/2014 to correct a wretched typo: Dana was born in 1856, not 1956!

Wednesday, August 13, 2014

Getting more familiar with Python Classes and Objects.


If you've been following this blog, you're aware that I feel unprepared to make good use of the Object Oriented Programming facilities of the Python programming language. Python is kind enough to allow programming in non-OOP styles, but Python's OOP features keep glaring at me, reminding me of what I don't comfortably know.


I posted a question on Quora.com asking for real world examples of multiple inheritance being used in Python programs. I was disappointed about how few answers came back. That dearth of response left me with the suspicion that I'm not the only person around who isn't completely comfortable with multiple inheritance. http://www.quora.com/What-is-a-real-world-example-of-the-use-of-multiple-inheritance-in-Python. Looking around at other multiple inheritance questions on Quora (http://www.quora.com/Why-is-there-multiple-inheritance-in-C++-but-not-in-Java), I see some reason to suspect that super serious application of OOP is just going to need more time to sink into the heads of the majority of developers. So, I'll continue to watch and learn, but will try to remember to adhere to the KISS principal to the greatest extent possible.


Additional Lessons


  1. This 40 minute video from Ray Hettinger (The Art of Subclassing) explains how Python classes differ from classes in other languages. Ray tries to reshape people's thinking here, so if you aren't already deeply steeped in OOP lore, you may feel he's dwelling on the obvious. He may give you some ideas of appropriate uses of inheritance in Python objects.


  2. Ray mentions this 2nd talk in his video. This 2nd talk was the next talk after his at Pycon 2012. "Stop Writing Classes", Jack Diederich, 28 minutes. Basically, that video asserts that my own example so far of writing a class for a Python program is not very good. The clue: My example class had only __init__ and one other method. I could have written it as a simple function and used the partial function from the library functools module to handle the initialization.


Further Reading


I have 3 previous blog articles on OOP in Python.


In Creeping up on OOP in Python - Part 1 I described a use of an object-oriented library module, pyparsing, to solve a Project Euler problem.


In Creeping up on OOP in Python - Part 2 I reworked my solution to add a simple class of my own. I was happy that introducing that class made the code cleaner to read. But if you watched the "Stop Writing Classes" video given up above in this blog article, you'll probably notice that my class is an example of exactly what they say you shouldn't do.What can I say? I'm still learning this stuff.


The 3rd in my Creeping up on OOP in Python" series was a bit different from the first 2. It explored an academic question about multiple inheritance. It is exactly the kind of A, B, C example that Ray mentions avoiding in his talk. Creeping up on OOP in Python - Part 3. I haven't forgotten my promise of a Part 4 as soon as I have a practical example of multiple inheritance to substitute for A, B, C and D in that academic question. But so far, there is no Part 4.


Ray mentions "the gang of 4". If you aren't familiar with them, here's a reference for you to pursue: http://en.wikipedia.org/wiki/Design_Patterns. And he mentions "Uncle Bob". I also mentioned Uncle Bob, with some links here: SOLID software design.


Know more about this OOP stuff then I do? Well, don't keep the info to yourself. Please post a comment with a link or example to help me learn more. Have you found a particularly relevant MOOC that you'd suggest?

Friday, August 8, 2014

TED Talk for Mature Audiences - Dismal Science, Disappointing Careers


For many years now, Economics has been known as the "dismal science". So, who better than a (Canadian) economist to deliver a 15 minute TED talk on "Why You Will Fail to Have a Great Career". TED talks, in general, are upbeat talks and this one, despite it having a somewhat cynical tone to it, really does have an upbeat message in there. I do worry in pointing out the talk that it requires a certain amount of maturity to hear the real message. The message an immature audience just might receive from this talk could be extremely negative. You've been warned! So, if you think you have the maturity to find the constructive up-lifting message in his talk, give a listen to Larry Smith of U. of Waterloo on why you will fail to have a great career.


The talk mentions the 2005 Stanford Commencement Address by the late Steve Jobs. If you haven't given that one a listen, I urge you to spend the 22 minutes that it'll take to play that one for yourself too.


you don't have to agree with the speakers in these video talks. If you have an opinion on what you've heard here, you're invited to leave a comment down below. I'm looking forward to hearing from you. And, of course, if you know someone who might benefit from giving a listen to either or both of these videos, please pass along the link to this blog article.

Saturday, August 2, 2014

Python and Parallel Processing

(Image taken from http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.5.0/com.ibm.db2.luw.admin.partition.doc/doc/c0004569.html without explicit permission, but with acknowledgement of the source of the work).


I've written about Parallel Processing here before. e.g. See "What's the fuss about Parallel Processing?". I've been worried by my reading that seems to predict that imperative languages such as Python will never be able to safely cope with massively multi-processor architectures in the future. The "Downfall of Imperative Programming Languages" basically says that functional programming languages (such as Haskell and Erlang) are going to be the way of the future. I've been trying to get my head wrapped around Haskell, but so far without much success. So I was happy to hear a clear call from the Python camp that "I'm not dead yet!".


The article that gives me new hope is "Parallelism in One Line" the gist of which is that "Sure you can use the threading library to manage pools of threads (or of processes), but that takes a lot of lines of code and is very error prone, but there's another way to do it." The other way to do it is to use a parallel version of the map function. I, for one, didn't know that there's more than one version of "map" available in the Python library.


If you do a Google search for:

Python map

you will readily find documentation for Python's standard (iterative) map routine. e.g. "Python 2.7.8 builtin functions documentation - map".


If you readily want to find documentation of the parallel version of map that the Parallelism in One Line article talks about, you need to modify your Google search to:

Python map multithreaded

which will take you to a subset of the first search. I was amused to find among the search results a 2010 ActiveState Recipe for building your own "concurrent map" function, which drew a comment from one reader asking why not just use "multiproceswsing Pool.map". The recipe's author admitted not knowing about that one.


From that 2nd search I found "Multiprocessing - process based threading documentation". It does worry me that the documentation seems to have more complexity and gotchas than the Paralleism in One line article owed up to to. Arguably I shouldn't be blogging about this at all until I've actually given it a try myself (but I still don't have a multi-core processor here at home).

It's a trick?


If you've been reading this carefully, you might rightfully object that it's all a bit of a trick. The map function is an element lifted from the world of functional programming and then provided in Python. The parallel version is only safe in Python if the function that you are mapping is "pure". If the function code has side-effects, then you will have race conditions and potentially suffer horribly at the hands of multi-cores. The language isn't going to inherently protect you so you have to be careful out there.


If you want to read more about the challenges of Python vs. multi-core architectures, see "Python's Hardest Problem, Revisited", by Jeff Knupp. The piece parts to roll your own multi-processes or multi-threaded Python program remain available, but be sure you have plenty of iodine and bandages on hand if you cavalierly venture into the world of multi-cores in a language that makes no promises of (much) concurrent processing safety. Design your code carefully and watch out!

Wednesday, July 23, 2014

Danger Lurking in the Plastic Packaging of our Food?

I stumbled across this article from a web site unfamiliar to me:

http://www.care2.com/causes/if-plastic-bags-are-causing-infertility-in-pigs-what-are-they-doing-to-us.html

Worrisome, but I didnt want to sound the alarm without checking into it further. So I did some Google searching looking for the same warning from some other source. Found it from National Geographic and that's good enough for me to consider the story to be confirmed.

http://news.nationalgeographic.com/news/2014/06/140605-pigs-plastic-sperm-endocrine-disruptor-infertility-science/

The troublesome chemical showed up in plastic bags imported from China, but not in bags of local (Spain) manufacture? I wish I had more confidence that the matter would be properly handled and fixed by China. But I've not seen any tendency toward openness in China.

In case you need more to worry about, here's a 17 minute TED talk on some common pesticides in our environment. You say you aren't worried about frogs? Well, you probably should be.

https://www.youtube.com/watch?feature=player_embedded&v=X9NFPZGyDPg

What should you and I do about these problems? Alas, I haven't a clue about what to do. A sternly worded letter to your congress-person (who likely is heavily under the influence of some big agricultural chemical company) would be a beginning, but I've not got much more respect left for our government than I have for that of China.

Am I the only one who has noticed that the food labels track most bad things in milli-grams (mg) but trans-fats on the label are tracked in grams (g)? In case you don't remember, 1 g = 1000 mg. Anything under 500mg of trans-fat in a "serving" is loudly touted as zero grams of trans-fat per serving by the power of rounding down to the nearest whole number of grams. And there are some pretty funny notions of "servings" to be found on those labels. Muffins from my local super-market come in packs of 4 muffins, zero grams of transfat per serving, but a serving is 1/3 of a muffin. Makes me wonder if 1/2 a muffin has enough trans-fat in it that they'd have to round it up to 1g on the label.

Dietary guidelines for trans-fat is to keep the amount that you eat each day as low as possible. Heck, even the guidelines for sodium allow 1500 mg/day of sodium as a recommended daily allowance and 300mg/day of cholesterol. (Speaking of which, have you looked at the nutrition facts for a McDonalds egg and sausage biscuit? 1170 mg of sodium in one. 250 mg of cholesterol in one. But if you are hungry, one of those isn't much of a breakfast. Advice: don't make a habit of McDonald's for breakfast.

http://www.cdc.gov/nutrition/everyone/basics/fat/transfat.html

Now if our government really had our best interest in mind, wouldn't it push the food company's to report tran-fats in mg (like they do for cholesterol and sodium)? But does that happen? No. Obviously way too controversial a matter? Right up there with identifying which products contain GMO corn.

Sigh.

Sunday, July 6, 2014

The Curious Tale of Epigenetics

My blog article today is a bit off of the beaten path for me.


Instead of looking at a software topic, today's blog article topic is more biological: epigenetics.


I'll confess up front that I'm not a biologist and never took a genetics course in my life. Close as I got to such a course was I had friends in college who took the genetics course and they sweated through worrying about the progress of their cohort of cross-bred fruit flies. And I did work my way through Prof Keeton's Biology 101 course and its Biology 102 sequel. And when my wife was pregnant, when the pregnancy had gotten far enough along we had a sonogram and amniocentesis to make sure all was as well as could be gauged at that time. That's the full extent of my background in genetics. If I'd actually studied the topic in further depth, I suspect I'd now be having one of those "Everything You Know is Wrong" moments.


This article that I read this week is what brings all this up:

Grandma's Experiences Leave a Mark on Your Genes (May 2013, but talking about a discovery that dates back to a couple of guys having beers in a bar in 1992).


I think my favorite part of the article is on page 3, after the researchers had made their outlandish sounding hypothesis and carefully ran 3 different experiments to confirm that they were right and wrote the results up in a paper only to have the paper rejected:


Despite such seemingly overwhelming evidence, when the pair wrote it all up in a paper, one of the reviewers at a top science journal refused to believe it, stating he had never before seen evidence that a mother’s behavior could cause epigenetic change.


“Of course he hadn’t,” Szyf says. “We wouldn’t have bothered to report the study if it had already been proved.”


But there's a happy ending. The paper was published in the June 2004 issue of Nature Neuroscience and here, just about 10 years later, the news has actually caught my attention. Maybe next blog article I'll talk about a Princeton physicist who suggests that E=MC2. :-)


The one thing that bothers me in the linked article about epigenetics is I see nothing that discusses the effects of the mother's contribution to the genes vs. the father's contribution to the genes. Presumably the methyl groups from Mom are not in the exact same places as the ones from Dad. How do the resulting gene pairs interact? Maybe if I dug enough to find the answer to that, I'd lose my amateur standing or something.


And not to be facetious, but remembering the lab struggles of my friends taking that genetics course back in college, I do wonder if epigenetic effects can be demonstrated with fruit flies or something else with a life-cycle that fits within the time constraints of an undergraduate semester. The answer perhaps lies in looking for failed experiments where the dominant gene got trumped by a gene that was supposed to be recessive. I'm pretty sure that not every undergrad experiment produced the expected results. Quoting Issac Asimov: "The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'". There'd be a certain amount of fun in being able to say "I failed the lab part of the course, but got the results published in Nature".


If you found this article to be as eye-opening as I did, and you know someone else who might enjoy it too, please pass along to them the link to this blog article. And, of course, if you know something about this topic, feel free to add a comment to further my education.