Tuesday, February 9, 2016

Project Euler Problem 1 - In Python

Project Euler

Project Euler is a web site with hundreds of problems for you to tackle however you want to solve them. Some of them, if you are sufficiently adroit mathematically, can be solved on the back of an envelope. Most of them clearly need a piece of software to grind through the calculations. You can use whatever programming language you prefer. The site asks only for the answer. Each problem yields a numeric answer. If you provide the correct answer, the site credits you for solving the problem and grants you access to a discussion page. The discussions are mostly people bragging about how they solved the problem. The bragging isn't anything important, but it can be useful to look at other people's approaches to see other ways of looking at the problem that perhaps hadn't occurred to you.

The one "guideline" is that a good solution should need no more than a minute of time on your computer. If your solution needs way more time than that, then you should look for a better solution.

Tell Ya What I'm Gonna Do...

I'm going to show you how to solve some of the Project Euler problems using the Python 3 programming language. Instead of having you install Python 3 on your computer, I'll be using the "Python in the cloud" facilities of pythonanywhere.com. You can sign up for a free account of your own there. For small programs like ours, they offer their service for free. If you use their site to build something that becomes enormously popular (say, the next "Farmville" game), you'll need to pay for their services, but we're far, far from crossing that line between trivial computing load and significant computing burden.

It'd be useful if you figure out a good way to prepare your Python programs in a file, but you can start with notepad or whatever simple editor you are most comfortable with. If you tell us in the comments on this blogpost what editor or IDE ("Integrated Development Environment") you prefer, it may influence whether I write future articles addressing that editor or IDE.

Project Euler Problem 1

The first problem on Project Euler is one that calls for very little code. I'll tell you that I approached the problem as a computer programmer, but in the discussion page for the problem there were people who knew how to compute the sum of a series using a simple formula, and they were able to readily solve problem 1 using only pencil and paper. That's just proof that although I've worked with mathematicians, I'm a software guy, not a mathematician.

The crux of problem 1 is: Find the sum of all the multiples of 3 or 5 below 1000.

The example given makes clear that they are only asking us to consider positive integers. So no need to worry about negative 3 and negative 5 and so on. ("If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.").

Spoiler Alert.

I'm going to talk here about how I solved the problem. If you want to solve the problem on your own, stop reading here and go solve that problem. Feel free to come back later and tell us in comments how your solution is better than mine. Better in what sense?

Solving Problem 1 in Python

At this point, I am tempted to show you the small program that I used to solve the problem, but that is so small a program that you'd probably glance at it and walk away grumbling "nothing to learn here", so I've decided to creep up on the solution a bit more slowly. I hope you pick up some comfort with Python programming along the way.

The General Shape of the Program.

Generally, the way to write a program is to start out with a rough idea of what the program should look like. You can write this down in "pseudo-code" - a mixture of your native language and whatever bits of programming language structure you are comfortable stirring in. One of the delights of Python is that Python code looks a lot like pseudo-code, so if you nudge your pseudo-code into being real Python code, you can test out your pseudo-code by running the program. When it works, perhaps you are done, but if your actual target language isn't supposed to be Python, then you have a working prototype that you need to re-code into your intended target language (perhaps C or C++ or even some assembler language). It is worth noting that in the real world, a manager faced with a prototype that works well enough will often suddenly decide that a Python implementation is plenty good enough to declare the problem solved, even if it's the first instance of Python code admitted into "production" use in that shop.

This program is going to need an iteration (loop) to consider all the natural numbers less than 1000. And inside that loop it is going to need a conditional statement (if statement) to select the numbers that are multiples of 3 or of 5 (which for whatever reason are the numbers that this problem considers to be interesting). And for the numbers that are multiples of 3 or 5, we'll need a little arithmetic to accumulate the sum of the selected numbers.

There's more than one way to do it. For example, we could loop through all the non-negative integers <1000 and build up a list of the numbers that meet the criteria of being "interesting". Then we could take the sum of that list to get the desired answer. But we have no further use for the list in this problem, so I assert it is simpler to just accumulate the sum as we go.
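For comparison, here is a rough sketch of that list-building alternative. It leans on the % remainder test and on range(), both of which are explained further down, and it uses the small limit of 10 from the problem's own example. The name "interesting" is just an illustrative label, and the total comes from Python's built-in sum() function rather than from a variable named sum.

```python
# Build the list of "interesting" numbers first, then total it.
interesting = []
for num in range(1, 10):          # small limit of 10, as in the problem's example
    if num % 3 == 0 or num % 5 == 0:
        interesting.append(num)

print(interesting)       # [3, 5, 6, 9]
print(sum(interesting))  # 23
```

The list is handy if you want to look at the selected numbers, but for this problem it gets thrown away right after it is totaled.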

Another approach would be to generate the multiples of 3 and the multiples of 5 that are less than 1000, and then tally up the generated lists, but you'd need to be careful not to include any numbers twice. Some numbers (e.g. 15) are multiples of both 3 and 5, but should only get added into the sum for this problem once, so I assert it is simpler to just consider each of the candidate numbers and accumulate the ones that meet the criteria for being interesting.
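To make the double-counting hazard concrete, here is a minimal sketch of that generate-the-multiples approach. It assumes a small limit of 20 (picked only so that 15 is in range) and uses Python sets to discard duplicates. The variable names are made up for illustration.

```python
# Generate the multiples of 3 and of 5 separately, then merge them in a set
# so that numbers such as 15 are counted only once.
limit = 20                                   # illustrative limit, chosen so 15 is included
multiples_of_3 = set(range(3, limit, 3))     # 3, 6, 9, 12, 15, 18
multiples_of_5 = set(range(5, limit, 5))     # 5, 10, 15
combined = multiples_of_3 | multiples_of_5   # set union keeps 15 just once

print(sum(multiples_of_3) + sum(multiples_of_5))  # 93, which counts 15 twice
print(sum(combined))                              # 78, the correct total below 20
```

The gap between the two totals is exactly the 15 that got counted twice.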

Got another plausible shape for the solution to this problem?

So, I'm going to stick with my initial proposal which in Python-like pseudo-code looks like this:

sum=0
for num in numbers 1 thru 999:
    if num is "interesting":
        sum += num
print(sum)

Note that in Python, indentation is used to delineate the blocks of code. My Python-like pseudo-code follows that same convention. Thus the body of the loop statement is indented under the "for" that introduces the loop. The body of the "then" clause that is made conditional by the "if" statement is indented inside the "if". Since the "if" is inside the "for" loop, the accumulation of the sum is doubly indented. The initialization of sum to zero is outside of the "for" loop so it isn't indented at all. The "print" statement is also not indented. We don't want to print the partial sum on each iteration of the loop, so it is important not to indent that final "print". But maybe when you are debugging, a print statement inside the loop would be a helpful addition. That would be done by adding another print statement and indenting it so it is inside the loop. You might want to include a comment on your debugging code so you can trim the debugging code out when your program is in good working order.

The "sum += num" statement is Python short-hand for "sum = sum + num", since accumulating totals is such a frequently needed operation.

I hope you've noticed that the pseudo-code is not quite working Python, so we aren't done yet. The most glaring bit of magic is how we really decide whether a given number, which we've named "num", is "interesting".

How to test if a number is a multiple of some value

So we need to consider how to test a number to see if it is a multiple of 3 (or of whatever other number we care about). Python has a modulo operator (%) that is documented here. x % y is how you ask Python to compute the remainder of dividing the number stored in x by the number stored in y. So all you need to realize is that if x is a multiple of y, the remainder will be 0, and if x is not a multiple of y, then the remainder will not be zero.

So if we have a number to test named num, then we can test if num is a multiple of 3 by using this Python code:

num % 3 == 0

The result of that test expression is a True or False value (True if the remainder is 0. False if the remainder is not 0). But we want to know if the number is divisible by 3 or is divisible by 5. Happily, Python has an "or" operator that will let us combine 2 True/False values in exactly the way we need.

Here's a copy/paste of my pythonanywhere session where I tried this out:

>>> num=27
>>> num % 3
0
>>> num % 3 == 0
True
>>> num % 5
2
>>> num % 5 == 0
False
>>> num % 3 == 0 or num % 5 == 0
True

As you can see, the Python rules for operator precedence did exactly the right thing for us here, but I find that long expression a bit hard to read, so I added some unnecessary parentheses to make it easier for a human to parse.

>>> (num % 3 == 0) or (num % 5 == 0)
True

How to conjure up a list of numbers?

In many programming languages, Fortran for instance, the way you conjure up a sequence of numbers is to have a variable serve as a counter. You initialize the counter to a starting value, then you increment the counter to get the next value, and you need a test to decide when you've gone as far as you want to go. If you've programmed in C, that's what a "for" loop in C does. But there's a more Pythonic way to do this in Python, so please take care not to write Fortran (or C) code in Python syntax. A Python "for" loop looks like this:

for x in list:
    do something with x

Here x is an arbitrary variable name that takes on each of the values in the list, and the body of the loop (which I've represented as "do something with x") processes each of the values as x is stepped through the list.
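As a small concrete instance of that loop shape, here is a sketch that steps through the four numbers from the problem's own example. The variable name total is just an illustrative choice.

```python
# Step through a list directly; no counter variable is needed.
total = 0
for x in [3, 5, 6, 9]:
    total += x      # this line plays the role of "do something with x"
print(total)        # 23
```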

If you want to read more about looping in Python, especially if you are comfortable with looping in other languages, I strongly recommend Ned Batchelder's blog post "Loop like a native".

Not only does Python have a statement to consider each of the values in a list of values, it also has a built-in way to produce a sequence of integers. In Python 2, "range" is a built-in function that returns a list of integers, but such a list takes up space. So "xrange" was introduced to return a lazy object that provides the desired sequence of integer values on demand without ever actually creating the list as a whole in memory. xrange worked well enough, and explaining the distinction between range and xrange was ugly enough, that in Python 3 xrange became range, and you only need to know it's a lazy sequence object (strictly speaking not a list, and not a generator either) if you're interested in the details of how stuff works. So just say range(1000) to conceptually whip up a list of the numbers 0 through 999.
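A quick way to see what range hands you in Python 3 is to poke at it. The sketch below assumes Python 3. The variable name r is only for illustration.

```python
r = range(1000)             # a range object; no 1000-element list is built
print(r[0], r[-1])          # 0 999
print(len(r))               # 1000
print(list(range(10)))      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(range(1, 10)))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Note that range(1000) starts at 0. Since 0 adds nothing to a sum, that is harmless for this problem, and range(1, 1000) would work just as well.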

So now our pseudo-code has morphed into runnable Python code:

sum=0
for num in range(1000):
    if (num % 3) == 0 or (num % 5) == 0:
        sum += num

print(sum)

And we're done, except for running the code to get the answer. I'll not reveal the numeric answer here, so please learn how to run this yourself.
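For the curious, the pencil-and-paper approach mentioned earlier can also be sketched in Python. This is not the program I used, just a hedged illustration of the series formula 1 + 2 + ... + n = n(n+1)/2. The function names sum_of_multiples and euler1 are made up for this example, and it is shown with the limit of 10 from the problem statement so as not to give away the real answer.

```python
def sum_of_multiples(k, limit):
    """Sum of the positive multiples of k that are below limit."""
    n = (limit - 1) // k           # how many multiples of k lie below limit
    return k * n * (n + 1) // 2    # k * (1 + 2 + ... + n)

def euler1(limit):
    # Add the multiples of 3 and of 5, then subtract the multiples of 15,
    # which would otherwise be counted twice.
    return (sum_of_multiples(3, limit)
            + sum_of_multiples(5, limit)
            - sum_of_multiples(15, limit))

print(euler1(10))  # 23, matching the example in the problem statement
```

No loop is needed, so the running time doesn't grow with the limit the way the loop version does.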

Mystified? Please tell me if I've confused you so I can polish things up before my next blog post.

Saturday, December 26, 2015

Blog Post #100

Where Have I Been?

This blog is now about 6 years old. Thanks to irregular gaps between my posting articles, this is only blog post #100.

My apologies for the long gap between my most recent article on this blog and this one. It's been well more than a year since I last posted an article here. Such laxity needs an explanation.... Last year my wife succumbed to her 2nd heart attack and died in her sleep one Sunday night. Needless to say, that was a huge emotional setback for me. Also last year I developed a severe infection in the middle toe of my right foot. Despite a doctor's care, antibiotics and so forth, the infection moved into the bone and the doctors concluded that the toe had to come off immediately. The result was I was in the hospital for much of December and was sent home on the evening of Christmas day.

Medical adventures continued through the next couple of months with nightly treatments in a hyperbaric chamber at Winthrop hospital to accelerate the healing. It worked. The nicest thing about lying in the hyperbaric chamber for a couple of hours each evening is you get to watch your choice of movies (from their large DVD collection), so I caught up on a lot of semi-recent movies that I'd otherwise missed seeing. The down side is it's a very tight space and they have a rule that you can't bring a date (or even popcorn) into the tube with you. ;-)

Unrelated to my foot problems, by late summer 2015, I found my vision was fading to the extent I was no longer able to read. I cranked up the font size on my PC screen to try to get by, but the wretchedness of my typing accuracy with little ability to proofread was depressing and certainly discouraged any attempts to type up an article for the blog. So I went to an ophthalmologist and learned I had developed cataracts in the lenses of both my eyes. 10/29 they operated on the right eye (which was the worse of the 2 eyes, by far) and 11/13 they operated on the left eye. I was a bit disappointed that things weren't instantly better after the surgery, but apparently the surgery causes inflammation in the eye. It takes a few weeks for the swelling to go down and the vision to clear up.

I've needed eyeglasses for distance since the 4th grade in elementary school. Now, thanks to the plastic lenses they implanted in place of the murky lenses they removed, my distance vision is fine without glasses. I can read the newspaper without glasses too, but do admit to having problems with reading fine print, and reading in dim light is still more difficult than I'd like. I have an appointment with the ophthalmologist this week to get fitted for reading glasses, so maybe the fine print of EULAs will become legible to me again soon.

My daughter insists the plastic lenses are just phase 1 of a secret plan to make me "bionic". Clearly she's watched too many cable TV re-runs of old shows.

Modern surgery for cataracts is pretty amazing. The ophthalmologist describes it as a 10 minute procedure per eye, but that's from his perspective. The day of the surgery you have to be there early and go through various check-in reviews of your meds, and transportation arrangements. It is an absolute requirement that you have someone available to escort you home. I believe the local taxi company could have done fine, but the eye surgicenter insisted that wouldn't do. As a non-driver, I use the local taxi company a lot. Often when I get into one of their cabs the driver just asks me if I want to go home and needs only a "yes" to get me to the right place. Fortunately, there's a company called the Fairy Godmothers of Long Island that provides people to escort patients home from surgery. My daughter hooked me up with them when she couldn't get to the suburbs to escort me herself. (Oh, great! Wait until word gets around that my daughter fixed me up with an escort service. :-). The Fairy Godmothers took me to the surgicenter and then waited for me to be released from the recovery room. They then took me home and made sure I got the door unlocked and was safely home. Heck, I've had taxi rides home from Walmart where the taxi driver even helped me carry my purchases into the house. Fairy Godmothers was a valuable service for me in that I needed them to satisfy the surgicenter people, but I'm still convinced the taxi company could have done just as well.

So that's more like 4 hours per eye. Oh, and for a week prior to the surgery and for weeks afterward there were eyedrops to apply. At its worst there were 3 different eyedrops 3 times/day. You need at least 5 minutes between drops so you don't just wash out the previous medicine with the next one, so it felt like all eye drops all the time each day. And the day after the surgery you need to see the ophthalmologist to get the eye patch off, and a couple more weekly appointments to make sure the vision is coming into focus the way it should. And by then it's time for the surgery on the other eye. A "10 minute procedure" started feeling like a couple of months lost to working on the eyes. The good news is things apparently worked out fine.

The Stats By Country for the Blog as of this post

So, enough with the excuses for my recent level of posting inactivity on the blog. It has been close to 3 years since "Blog Post #50". Readership on the blog has picked up nicely in those 3 years. As of Blog Post #50 I'd had a total of 2600 page views from 2009-2013. My total page view count as of today is up to 33,370 over the life of this blog. That's not even close to passing for viral among YouTube videos, and I certainly don't need to dream yet about how to monetize my page views. You'll notice I don't put any advertising on my pages or even a "please contribute" tip jar button.

Google reports the top 10 countries for page views. The biggest readership by far is still in the US with 15,172 page views over the life of this blog. I wish that was broken down by state, but blogspot.com doesn't slice the data that finely. Russia remains in 2nd place. Germany dropped from 3rd to 4th place as France moved up from 7th place to 3rd place. As before, India finished just behind Germany. New to the list is Ukraine, which finished just behind India. China dropped from 5th to 9th on the list. (Blame the Great Fire Wall of China, which I'm told officially bars access to blogspot.com sites from China?). Slovenia dropped from 6th to below 10th place, so it's vanished from this table. United Kingdom moved from the 8th place slot to the 7th place slot. Canada is newly on the top-10 list, just behind the United Kingdom. The Netherlands dropped from 9th to below 10th, so it's vanished from this table. Poland remains in the 10th place slot, just behind China.

United States 15172
Russia 1632
France 1380
Germany 1248
India 1235
Ukraine 1229
United Kingdom 990
Canada 418
China 283
Poland 224

Origin of pageviews by country

And of course the data has a long tail that blogspot.com doesn't show me. I know that from time to time I've had readers from Australia and Brazil. I think that I've even had occasional clicks from somewhere in Africa, but evidently not enough to be anywhere but in the invisible long tail of the data.

Top 10 articles by page-views

Actual code - C vs. Python for a small problem 1731
What's the fuss about parallel programming? 1537
Pythonic Python - Writing Python code that fits the language's idioms 1471
Python Generators 1449
Is the Udacity CS101 course watered down? 1442
Python and Parallel Processing 566
Linus Torvalds on Teaching Children to Program 451
Cornell Hydraulics Lab 433
Literate Programming 379
Home Networking with FiOS - Don't Cross the Streams 365

Top 10 articles by pageviews

I'm pleased to say that some serious articles have at last edged out the dumb humor articles from the most-viewed list.

The "Python Generators" article seems to owe its popularity to its being referenced with a link in another person's Python-oriented blog.

How the readers access the blog

blogspot.com reports the breakdown by browser. Over the life of this blog, the most used browser to access my blog was Chrome (39%), followed by Firefox (32%). Internet Explorer (or Internet Exploder as I tend to refer to it) was a distant 3rd at 16%. Annoyingly, the stats don't distinguish Internet Explorer versions. It would be nice to see if IE6 is at least fading though I hear it is still about 10% of the browser views in the world. I also notice that some of the IE accesses are likely "fakes" as they come in rapid bursts of >100 page views in a short interval of less than an hour, with none of the views attributed to any particular blog article. (Page scraping in the name of IE?). 4th place went to Apple's Safari browser at 7%. 5th place was a tie of Mobile Safari and Opera at 1% each. The tail with < 1% each included Instapaper, "CriOS", "Mobile", and "OS; FBSV".

Here's the breakdown by operating system of the pageviews: Windows (56%), Macintosh (17%), Linux (11%), Android (8%), Other UNIX (2%), iPhone (1%), iPad (1%), UNIX (<1%), iPod (<1%), Windows NT 6.1 (<1%). Funny that they break out just one ancient version of Windows and don't mention Windows XP, Windows 7, Windows 8, Windows 10, Windows Vista, ... which presumably are all lumped into "Windows" 56%. Your guess is as good as mine as to the categories UNIX and Other UNIX. SunOS, Sun Solaris, SGI Irix, IBM AIX, BSD UNIX?

Complaint Department

In blog Post #50 I lamented that contrary to what all the hype about "Web 2.0" would have you expect, that the vast majority of my readers are silent. That is, there are disappointingly few comments and feedback about the articles. As I put it in Blog Post #50:
What does it take to get people to post comments on the articles? 50 articles and I've only had 12 comments all together. And 6 of those were on one article about Education and Technology. Begging for comments doesn't seem to work. Do I need to write about more incendiary topics to get any kind of reaction? Heck, even feedback like "that's kind of obvious", and "here's another web page that says that better than you did" would be useful feedback to me. Didn't everybody get the memo about Web 2.0 being interactive? It isn't supposed to be read-only.

For the most part that is still true, but I did finally have one article that got a relatively heavy response: 139 comments on "Actual code - C vs. Python for a small problem". This surfaced a new problem. blogspot.com does a relatively poor job of organizing the comments in a clear, accessible way when there is a significant volume of comments. The comment mechanism also is little help for contributing complex data (e.g. source code, data tables) in comments.

A distant 2nd place for number of comments went to "Marketing the Importance of Programming Education" with 27 comments after only 214 page views. That's a volume of comments that the blogspot.com comments mechanism seems comfortable with displaying. (Well, it displays them without complaint, but much beyond a couple of dozen comments, it is difficult to unravel who was replying to what).

Other complaints are mentioned elsewhere in this article: Lack of breakdown of varieties of IE browsers, lack of breakdown of varieties of Windows, lack of clarity of breakdown of versions of UNIX, lack of breakdown of US into finer locations (e.g. by state), lack of access to the long tail of the stats (e.g. countries that didn't make the top 10 over all time). Distinguishing real page views from bots accessing the blog.

Thursday, May 14, 2015

GMO Labeling - Even a Kid Can Understand Why.

Last July in Danger Lurking in the Plastic Packaging of our Food? I wandered away from that article's title topic and took a light poke at the government for not insisting that GMO ingredients in food should be labeled as such.

GMO ingredients still aren't labeled, and here's a TED talk from a 15 year old that makes clear why they ought to be: 15 year old explains why GMO's ought to be labeled. It's only a 13 minute talk. Made sense to me. Is your congress-person as smart as a 15-year old? Honest enough to do what's right?

Not sure who represents you in the federal House of Representatives and Senate? Well, if you know your zip code, visit Find Your Senators and Representatives to find out who represents you. Now that you're armed with that information, you can visit Contacting the congress to actually contact them and make your opinion known. It isn't even going to cost you a stamp, as they tend to have web forms or e-mail to receive your communications.

My apologies to my non-U.S. readers, but please work out how to let your government know of the need for GMO labeling.

Blogs aren't meant to be read-only. Please take a moment to comment on this article and re-share it with your friends. Even if you just give me a comment lifted from the GEICO commercial: "Everybody knows that!", I'll still be glad to hear from you.

Saturday, November 8, 2014

Playing with my Food

This blog article today is non-technical. It is about food, cooking and grocery shopping. Family members and my grad school room mate who are familiar with my low-level of expertise on these topics are probably doubled over in laughter at my choice of topic today. If you're wondering, today's article was inspired by an impulse purchase at Walmart, a bag of Pecan pieces that I've been especially enjoying.

My first wife, now ex-wife, had her faults, but she was an excellent cook. The difference in how we'd use cookbooks was fascinating to me. I can cook by following a recipe. I do that by leaving the cook book open on the kitchen counter and do the steps of the recipe step by step, measuring each ingredient carefully. The results are generally OK, but rarely amazing. My ex- would go out to the shelf with the cookbooks in the dining room, spend some time looking through them, announce to the air "Oh, I see", put the books back on the shelf, and then go out to the kitchen and cook. Her dishes inevitably included soy sauce and MSG among the ingredients and were generally very good and occasionally amazingly great, though they did lack reproducibility. My daughter's one complaint about her Mom's cooking was that she never did turn out great Brownies. I have heard it said that baking, more than most kinds of food prep, calls for careful measuring of ingredients, time and temperatures. Generally no room for soy sauce nor MSG in most Brownie recipes.

My current wife often tells me how great a cook she is, but she gets busy and rarely spends time in the kitchen, so I mostly have only her word on this. A seasonal specialty of hers is turkey. Most years in the run-up to Thanksgiving she roasts multiple turkeys to give away to friends, to the local police department, and even sandwiches for the day laborers that gather on certain street corners in town here hoping for work. But she manages to violate all the Food Network's tips on how to be sure of having your turkey come out perfectly (At most a 15 pound bird, no stuffing inside the cavity of the bird, no basting, ...). Her birds are inevitably the largest in the store, often purchased fresh, not frozen, because who has time to thaw such a large object? She stuffs it with more stuffing than I'd expect could fit, and then the cooking is extremely labor intensive with endlessly repeated basting to get the skin brown and crispy. The down side for us is that the turkey for our own Thanksgiving dinner often isn't ready until incredibly late on Thanksgiving day.

One of her turkeys, some years ago, actually made its way to the table in Reagan's White House. She got a nice thank-you note from Ronald on White House stationery, attesting to how great her turkey is. Who am I to question her politician nephews to determine if they indeed arranged for her turkey to be served at the White House or just forged a thank you note and ate the turkey themselves?

This year is shaping up to be different. We're planning to visit one of her adult children in Georgia for Thanksgiving dinner. I've actively discouraged all Barbara's thoughts that she should cook a turkey and bring it with us. Dian invited us, so leave it to Dian to prepare dinner. Offer to help in her kitchen, but if she'd rather do it herself, stay out of her way. We'll see how this goes. No guns in Dian's house, but it's hard to guarantee things won't break down into a carving knife fight in the kitchen.

Grocery lists

Much as I was surprised at how my ex- would cook without reference to a cookbook, I'm surprised that my wife goes grocery shopping without bringing along a shopping list. I can go to the store to pick up bread, milk and eggs without a shopping list, but if the trip is to stock up on more than that, I need a list. I keep a list on my desk at home. Whenever I find we are out of something or getting low on something, I add it on to my list. Occasionally, when I am looking at a recipe that calls for an ingredient we don't generally have on hand, that ingredient goes onto the list too.

I found that a problem with grocery-list-driven shopping is that sometimes an item falls out of inventory here and until I specifically miss it, we don't re-stock it, because it wasn't on the list. So now I do 2 things to assure variety in the household pantry:

  1. List things on the grocery list in more generic terms. e.g. instead of listing chicken noodle soup, I'll list "soup" and spend some time looking through what the supermarket has on offer. This does add time to the shopping trip as my wife claims to be allergic to pork, doesn't want too much salt, and says MSG gives her headaches, so I spend a long time reading ingredients lists as I'm picking out soups. We recently tried a Butternut Squash Bisque in a box that way, and I liked it.
  2. I force myself to browse the supermarket for things not on my list. Sometimes that results in fairly frivolous purchases, like flat-bottomed wafer ice cream cones. I do think I eat less ice cream if I pack a small cone with ice cream instead of scooping ice cream into a bowl.

    Seasonal items, like fresh apple cider, find their way into my grocery cart from looking at what the store has, rather than shopping off my grocery list. My wife makes a face and leaves the cider to me. She has memories of a rusty old cider press she saw in use on a farm some years ago and is convinced the cider isn't wholesome.

Pecan Pieces

A recent minor impulse purchase from the grocery section at Walmart was from their "seasonal" aisle, a 24 ounce bag of pecan pieces. I had no idea what I was going to do with them when I bought them, but have been pleasantly surprised at how much I've been enjoying them. Tasty as they are, I'll warn from having read the nutrition box on the side of the bag, that nuts are not a low calorie snack. They are extremely packed with fat, so that keeps them low in sugar and even low in carbohydrates, but high in calories.

So what to do with them? Well, one of my favorite simple uses is to sprinkle a small handful onto my bowl of cold breakfast cereal. This adds some interesting flavor and texture to even a plain old bowl of Cheerios. Sure, the store sells "Honey Nut Cheerios", but have you read the ingredients of those? They add almond flavor, but absolutely no nuts. No thanks. I much prefer to start with plain Cheerios and add my own sprinkling of genuine nuts. Good too on other varieties of cereal, Special K, for instance. One 24 ounce bag is sufficient for weeks of breakfasts.

We recently received a gift of a couple of large cartons of apples from upstate New York. An apple a day makes a nice crisp tasty snack, but what to do with all these apples? Well, one thing I did to use some of them was to make baked apples. Coring the apples with a plain old sharp knife was a bit of a hassle, but by using a covered Corningware baking dish in the oven, the recipe mostly just needed baking (and cooling) time. As Alton Brown was so fond of saying on Good Eats "Your patience will be rewarded".

I did add some pecan pieces to the cinnamon sugar used to fill the cored apples and the result was indeed delicious. My wife's opinion was that New York Delicious apples stay a bit too crunchy through the baking to make a really great baked apple. She thinks it would have been better if I'd used some big round McIntosh apples, but I used what I had on hand. She tells me her baked apples are way better than mine, but we've been together for more than a decade and not once has she served me a baked apple. So I don't argue about it, but if I want a baked apple, I guess I'll just have to make it myself. On a cool night, it's actually nice to have apples baking in the oven making the kitchen warm and adding a delightful scent to the house. I wonder when I'll get around to doing that again?

Saturday, September 27, 2014

Actual code - C vs. Python for a small problem

If you aren't much interested in writing software, this post is probably not for you. If you read the post without following the links to the example implementations, then I'm not sure what you are doing here, though you are certainly welcome to ask questions.


Recently on Quora.com someone asked to see C code to find the first 20 prime numbers. It seemed like an easy enough question, but the first guy who answered it gave C++-based pseudocode and some tips on how to convert that to C. That answer didn't make me happy. Heck, I figure if you're going to write in some other language, why not use Python as a sort of testable pseudocode? Still, the question really needs that answer to then be restated in C.


So I sat down and banged out a Python program to find and print the first 20 prime numbers. That code is here: find20primes.py


I used a list to hold the primes. When the list grows to 20 items, I'm done. I start the list by telling it 2 is a prime number. I then only consider odd numbers as additional candidates. If the candidate is divisible by any of the prime numbers we've found so far, it is not a prime number. If the candidate isn't divisible by any of the prime numbers we've found so far, then the candidate is a prime number, so we add it to our list of primes. In either case, we add 2 to the candidate number to get the next candidate to be tested. In C, we'll need a little more bookkeeping for the list, but since the max size of the list is bounded (20 items), things should translate from Python to C pretty easily. One trick that isn't entirely obvious is the use I made of Python's for...else control structure to catch the case of the for loop not exiting via break. We can paper that over with a goto in C, or you can add some flags.
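For readers who don't follow the link, here is a minimal sketch of the approach just described. It is a reconstruction from the description, not a copy of find20primes.py, and the function name first_primes is made up for this illustration.

```python
def first_primes(count=20):
    """Return the first `count` primes by trial division against earlier primes."""
    primes = [2]          # seed the list with the only even prime
    candidate = 3         # after 2, only odd numbers need testing
    while len(primes) < count:
        for p in primes:
            if candidate % p == 0:
                break     # divisible by an earlier prime, so not prime
        else:
            # this branch runs only when the for loop ended without a break
            primes.append(candidate)
        candidate += 2    # in either case, move on to the next odd number
    return primes


print(first_primes())
```

The else clause belongs to the for loop, not to the if. That is the whole trick: it fires only when no earlier prime divided the candidate.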


I was thinking that C has labeled break statements to let me do sort of the same thing as that for...else. But much to my chagrin, that's a feature of Java, not C. Oops. So, goto it is in my C translation of the program.
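If you'd rather avoid the goto, the flag alternative mentioned earlier looks roughly like this. It is sketched in Python to keep to one language, and the names first_primes_with_flag and is_prime are hypothetical, but the same shape carries over to C line for line.

```python
def first_primes_with_flag(count=20):
    """Same search as before, with an explicit flag instead of for...else."""
    primes = [2]
    candidate = 3
    while len(primes) < count:
        is_prime = True               # assume prime until a divisor turns up
        for p in primes:
            if candidate % p == 0:
                is_prime = False      # remember why we are leaving the loop
                break
        if is_prime:
            primes.append(candidate)
        candidate += 2
    return primes


print(first_primes_with_flag())
```

The flag costs an extra variable and an extra test, which is exactly the sort of bookkeeping that for...else hides.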


So I sat down with that Python listing open in one window and typed up find20primes.c. That code is here: find20primes.c


I believe it is a straightforward translation of the Python program into C. Of course, the C program is bigger in the sense that it has more lines of code. There are lines added to delimit the blocks and loops in the program, lines added for variable declarations, and lines added for the annoying bookkeeping that C needs me to do where Python just handles those chores. I did run into some trouble where I didn't get the bookkeeping entirely correct the first time through. The program printed 20 numbers, but it started with 2, followed by 73, followed by 3, and it left out 71 at the end of the list. Huh? It turned out I was mistakenly incrementing primecnt before I'd used it to fill in the next slot in the primes array, so I skipped the primes[1] slot and just got 73 there by bad luck. Python would have told me about an uninitialized variable if there'd been a spot in the Python program for me to have committed that same kind of error.
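To make that bookkeeping slip concrete, here is a small Python simulation of a fixed-size table. It is only an illustration, not the actual C code: fill_table and its arguments are hypothetical, and None stands in for whatever garbage an uninitialized C slot happens to hold (73, in my case).

```python
def fill_table(found, increment_first):
    """Store 2 and then each prime in `found` into a fixed-size table, C style."""
    table = [None] * (len(found) + 2)   # None marks a slot that was never written
    table[0] = 2
    count = 1                           # number of primes stored so far
    for prime in found:
        if increment_first:
            count += 1                  # bug: counter bumped before the store...
            table[count] = prime        # ...so slot 1 is skipped
        else:
            table[count] = prime        # correct: store in the next free slot
            count += 1                  # then advance the counter
    return table


print(fill_table([3, 5, 7], increment_first=True))   # [2, None, 3, 5, 7]
print(fill_table([3, 5, 7], increment_first=False))  # [2, 3, 5, 7, None]
```

With the increment first, everything lands one slot too far to the right, which is why the last prime fell off the end of my printed list.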


Having finally gotten both programs working, conventional wisdom is that the C code should be much faster than the Python code. So I used the "time" command to see how the timings compare.


Using Cygwin Python 2.7.8 on Windows 7 on a PC with an i5 processor and 8GB of RAM:

$ time python find20primes.py

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

real    0m0.136s
user    0m0.000s
sys     0m0.062s

Using Cygwin cc 4.8.3 on that same Windows 7 PC:

$ time cc find20primes.c

real    0m0.238s
user    0m0.045s
sys     0m0.185s

$ time ./a.exe
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71

real    0m0.064s
user    0m0.015s
sys     0m0.015s

The execution time for the compiled C program was way less than the total time for the Python run, but if you add in the compile time for the C program, Python comes out ahead. Your mileage may vary.


By the way, if I throw in a -O option ("optimize the generated code") on the cc command, it further slows the compile down, while the run time is about the same.

$ time cc -O find20primes.c

real    0m0.336s
user    0m0.031s
sys     0m0.185s

$ time ./a.exe
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71

real    0m0.086s
user    0m0.000s
sys     0m0.015s


The "real" elapsed time varies a lot from run to run. These run times are so short that the variability makes it hard to make fair comparisons in races between the programs. I suspect that the real time is dominated by the console I/O to print the results, so optimizing the details of the loops isn't going to improve things much.
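One way to tame that variability on the Python side is to time only the computation, many times over, with the standard library's timeit module. The sketch below assumes a function named first_20_primes that stands in for the body of find20primes.py; the repeat counts are arbitrary choices.

```python
import timeit


def first_20_primes():
    """Compute the first 20 primes without printing anything."""
    primes = [2]
    candidate = 3
    while len(primes) < 20:
        # all(...) is True only when no earlier prime divides the candidate
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 2
    return primes


# Five batches of 1000 calls each; the fastest batch is the least disturbed one.
runs = timeit.repeat(first_20_primes, number=1000, repeat=5)
best = min(runs) / 1000
print("best of 5 batches: %.1f microseconds per call" % (best * 1e6))
```

This leaves out interpreter startup and console I/O, so it measures something different from the time command. That is the point: it separates the loop cost from everything else.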


Now to make things more interesting, suppose I wanted to pass in as a command line argument the number of primes to be found. So long as the primes are still in the range of ints, that only needs a malloc of the array in C. But if the program is ever asked to get up into really big primes, the Python code will keep on chugging while the C code will need substantial rework to pull in an arbitrary precision integer library. This blog article is already long enough, so I'll leave that generalization to "findNprimes" up to you. How big does N have to be to break the C code's use of ints? I suppose that with relatively minor surgery, you could convert over to use of long ints (or, if your compiler supports them, long long ints) in the C program. Looking into it lightly, it appears that the cc included with Cygwin supports long long ints. The format strings will need adjusting: use %lld instead of %d.
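If you want a head start on that exercise, here is one possible Python sketch. It is not a published findNprimes.py; the names first_n_primes and main, and the default of 20 when no argument is given, are my assumptions for the illustration.

```python
import sys


def first_n_primes(n):
    """Return the first n primes, using trial division by the primes found so far."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        # step from 2 to 3, then through the odd numbers only
        candidate += 1 if candidate == 2 else 2
    return primes


def main(argv):
    # argv[1], when present, is the number of primes wanted
    n = int(argv[1]) if len(argv) > 1 else 20
    for prime in first_n_primes(n):
        print(prime)


if __name__ == "__main__":
    main(sys.argv)
```

Because the list simply grows as needed and Python integers never overflow, nothing here depends on how large N is. Only your patience does.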


If you're feeling eager to try further experiments, see how PyPy does in handling the find20primes.py program. That'll probably be more interesting with findNprimes.py with a really big value for N.


The moral to the story is that C may not be the best choice of programming language for this kind of problem.


Did you enjoy this article or was it too technical? Please add a comment down below so I know what you think about this sort of writing.

Tuesday, August 19, 2014

John Cotton Dana


I'm writing this on August 19, 2014. John Cotton Dana was born on August 19, 1856 in Woodstock, Vermont. I looked him up after I came across this snappy graphic on the Web:



I was surprised to learn the extent that he had influenced my life.

http://en.wikipedia.org/wiki/John_Cotton_Dana

Wikipedia says he died in New Jersey on July 21, 1929. I did find another web page that says he died in Manhattan, not New Jersey, so once again we see that Wikipedia is not a definitive source, even if it is darn informative.


So how did this long-dead person influence me? Well, for starters, in 1909, he founded the Newark Museum. I grew up in Union, NJ, about 9 miles by car from the Newark Museum. My Mom would take me there for an easy day trip when we had nothing specific to do. I particularly enjoyed the science and technology exhibit of simple machines. I haven't been there in many years now, and surely there has been turnover of the exhibits, if nothing else, to make up for mechanical wear and tear.


Besides founding that museum, John Cotton Dana also was a librarian. I think my favorite line from the Wikipedia article was this:

He would have found a library school curriculum intolerable, and doubtless a library school would have found him intolerable.


Before John Cotton Dana, libraries tended to have closed stacks. A librarian would go fetch the book that you wanted to look at. Dana pioneered the radical concept of open stacks. The main library of the Rutgers-Newark campus is named after him. I'll leave it to you to forage around the Internet or your favorite local library to learn more about this man. Hey! At least read through the Wikipedia article, please.


Not all of his views would be accepted today. For example, Wikipedia says that he organized the first ever children's library room, but he believed its proper role was to help provide material to teachers. He was opposed to "story-time" at the library.


If you have kids, remember John Cotton Dana today by taking them to your local children's library, where, I wager, you will find open stacks for easy browsing of the book collection. Even if you don't have kids, today would be a good day to make sure you know where your library card is. (You do have a library card for your local public library, don't you?) Make sure your card is up to date. If not, renew it, and use it.


Speaking of children's libraries, here in Westbury, NY, the founding of the local children's library pre-dates the establishment of the local public library for adults. The two have been joined together only since 1965. History of Westbury Children's Library


Revised 8/19/2014 to correct a wretched typo: Dana was born in 1856, not 1956!

Wednesday, August 13, 2014

Getting more familiar with Python Classes and Objects.


If you've been following this blog, you're aware that I feel unprepared to make good use of the Object Oriented Programming facilities of the Python programming language. Python is kind enough to allow programming in non-OOP styles, but Python's OOP features keep glaring at me, reminding me of what I don't comfortably know.


I posted a question on Quora.com asking for real world examples of multiple inheritance being used in Python programs (http://www.quora.com/What-is-a-real-world-example-of-the-use-of-multiple-inheritance-in-Python). I was disappointed by how few answers came back. That dearth of response left me with the suspicion that I'm not the only person around who isn't completely comfortable with multiple inheritance. Looking around at other multiple inheritance questions on Quora (http://www.quora.com/Why-is-there-multiple-inheritance-in-C++-but-not-in-Java), I see some reason to suspect that super serious application of OOP is just going to need more time to sink into the heads of the majority of developers. So, I'll continue to watch and learn, but will try to remember to adhere to the KISS principle to the greatest extent possible.


Additional Lessons


  1. This 40 minute video from Raymond Hettinger (The Art of Subclassing) explains how Python classes differ from classes in other languages. Ray tries to reshape people's thinking here, so if you aren't already deeply steeped in OOP lore, you may feel he's dwelling on the obvious. He may give you some ideas of appropriate uses of inheritance in Python objects.


  2. Ray mentions this 2nd talk in his video. It was the next talk after his at PyCon 2012: "Stop Writing Classes", Jack Diederich, 28 minutes. Basically, that video asserts that my own example so far of writing a class for a Python program is not very good. The clue: my example class had only __init__ and one other method. I could have written it as a simple function and used the partial function from the standard library's functools module to handle the initialization.
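To show what that second talk is getting at, here is a toy comparison. The Scaler class and the scale function are invented for this illustration (they are not the class from my earlier post); only functools.partial is the real library piece.

```python
from functools import partial


class Scaler:
    """A class with only __init__ and one other method: the warning sign."""

    def __init__(self, factor):
        self.factor = factor

    def scale(self, value):
        return self.factor * value


def scale(factor, value):
    """The same behavior as a plain function."""
    return factor * value


double_obj = Scaler(2)            # class version: state stored on the instance
double_fn = partial(scale, 2)     # function version: partial pre-fills factor

print(double_obj.scale(21))       # 42
print(double_fn(21))              # 42
```

The call to partial does the job of __init__: it remembers the first argument, so the result can be called later with just the remaining one.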


Further Reading


I have 3 previous blog articles on OOP in Python.


In Creeping up on OOP in Python - Part 1 I described a use of an object-oriented library module, pyparsing, to solve a Project Euler problem.


In Creeping up on OOP in Python - Part 2 I reworked my solution to add a simple class of my own. I was happy that introducing that class made the code cleaner to read. But if you watched the "Stop Writing Classes" video mentioned above in this blog article, you'll probably notice that my class is an example of exactly what they say you shouldn't do. What can I say? I'm still learning this stuff.


The 3rd in my "Creeping up on OOP in Python" series was a bit different from the first 2. It explored an academic question about multiple inheritance. It is exactly the kind of A, B, C example that Ray mentions avoiding in his talk. Creeping up on OOP in Python - Part 3. I haven't forgotten my promise of a Part 4 as soon as I have a practical example of multiple inheritance to substitute for A, B, C and D in that academic question. But so far, there is no Part 4.


Ray mentions "the gang of 4". If you aren't familiar with them, here's a reference for you to pursue: http://en.wikipedia.org/wiki/Design_Patterns. And he mentions "Uncle Bob". I also mentioned Uncle Bob, with some links here: SOLID software design.


Know more about this OOP stuff than I do? Well, don't keep the info to yourself. Please post a comment with a link or example to help me learn more. Have you found a particularly relevant MOOC that you'd suggest?