Wednesday, May 29, 2013

Cheap, but not free. Modular code has a price...

Introduction and Spoiler Alert

Recently, Nico Ekkart asked on the Google+ Python Community for suggestions on how to speed up his Python implementation of Project Euler Project 14. I remembered my initial implementation of Project 14 had taken way more than the allowed minute of processing on my laptop and that I'd souped it up by adding a cache to get the run time down. I dusted off the files and compared the performance to Nico's. Sadly, his outperformed mine.

To see the requirements for Project 14, see: http://projecteuler.net/problem=14.

Spoiler alert: If you would rather solve Problem 14 on your own, don't follow the links below. In fact, you should read no further in this article, lest your thoughts be polluted with prior art. To see Nico's question and the associated discussion thread, see: https://plus.google.com/113176807122785773311/posts/MPA7LmLZdVW.

The source code for Nico's implementation is here: https://gist.github.com/rdrewd/7654766.

As I explained in Creeping up on OOP in Python - Part 1, all my Project Euler programs have a wrapper main program to time and report the timing of the run and to print the answer. The source code for that wrapper script is here: https://gist.github.com/rdrewd/7654590.

The source code for my initial uncached implementation is here: https://gist.github.com/rdrewd/7654699.

The source code for my cached implementation is here: https://gist.github.com/rdrewd/7654742.

I ran each of these programs using Cygwin Python 2.7.3. The log of the runs is here: https://gist.github.com/rdrewd/7654787 and includes one additional run. The difference was that I tweeked the cache size up from 100,000 to 1,000,000 entries to see if it would improve my run time. It did, but my run time still was behind that of Nico's code.

So what's the difference between my code and Nico's code? Well, a crucial non-difference is that both my cached version and Nico's code use a cache ("memoization") to save having to recalculate results that are already known. Far as I can see, a crucial difference is that I used a Python generator function to run through the specified sequence, and I used a simple class to do the caching. Nico's code is more in-line, mostly a simple while loop to compute the sequences, so there are many fewer calls going on. Nico's code uses one statement to do the caching where I used a simple object to implement the stashing of values in to the cache with a "method" (stash) and another method (value) to check whether a value is already in the cache On the one hand, I like Python's structuring provisions for generating sequences, but on the other hand, given the millions of times this program is invoking the next step in the sequence, the overhead does mound up to noticeable amounts of CPU time. I was also worried that perhaps I didn't have enough real memory to be able to cache all million values, so my cache had a bound on how much it'd cache. Nico's code just trusted in the virtual memory to allow a very large dictionary.

Summary of run times:

Nico's code     6.645 CPU sec
Drew's nocache 177.638 CPU sec
Drew's cache   28.501 CPU sec
Drew's big cache   16.863 CPU sec

Conclusion

The modular control structures in Python are remarkably efficient, but they are not entirely free of overhead. I stopped tuning after I got the run time below the requirement of "no more than a minute", but clearly I was off the pace by a factor of 4 or so. Was the structure of my functions and objects worth that cost? Well, for having met the requirements, I think so. I have to concede that Nico's code was so short that I do find it hard to argue that my function def's and Cache class added any significant amount of clarity to the code. As for Nico's question of how to make his code run faster, the stackoverflow pointers in the original comment thread may help get his run time down even further. Memoization is the big performance win, and thanks to Python dictionary data type, that is darn easy to implement.

Revised 11/26/2013 - My employer blocks pastebin.com, so to facilitate sharing with my co-workers, I've copied the pastebin snippets to gist.github.com and updated the links here accordingly.

Friday, May 17, 2013

Education Automation - R. Buckminster Fuller

A few months ago, in response to one of my "education" tagged blog posts, an old friend from high school contacted me via Facebook and suggested that I really should take a look at the book "Education Automation" by R. Buckminster Fuller, the famous architect. The book was published way back in 1961 and for someone without access to Doc Brown's DeLorean, it had remarkable foresight. He basically predicted the Internet, though he called it 2-way television.

The Book

The book is a slim paperback volume, only 88 pages long. It was put together at a time when Southern Illinois University was undergoing major expansion in 1961 - a transition from being a state teacher's college to being a full-fledged university. Fuller had been invited to give a talk to the committee planning the new campus. The book doesn't quite read like it is a transcript of his talk. I believe it is more like a report by Fuller of what he said.

I have to give credit to the Westbury Public Library for finding a copy of this book for me to read. They borrowed it from the library of Hartwick College way off in Oneonta, NY, about a 200 mile drive upstate from Westbury. I was tickled to notice the 1961 price of the book printed on the book's cover: $1.95. If you convert that into how many gallons of gasoline you could get for $1.95 in 1961, and then figure out what the contemporary price for that many gallons of gasoline would be, I suspect the $1.95 maps to a pretty good approximation of what a slim academic book would cost these days. The years have not been kind to the purchasing power of a dollar.

There are plenty of smells of age to be found in the book such as frequent mention of "men" where a more contemporary book would surely have said "people". And, I was surprised at how high an opinion Fuller seems to have of himself. I expect a tad more humility from great men of the past.

What did he call for?

Fuller foresaw a future in which "earning a living" was no longer required. He expected that change was going to arrive within 10 years of the time he was speaking in 1961. He expected that the only lasting world industry would be education. He anticipated that scholars would stay for years and years at the universities, "for the rest of their lives", "developing more and more knowledge about the whole experience of man".

One somewhat off-the-wall suggestion from Fuller is that a 200 foot diameter (or larger) geodesic dome be suspended 100 feet above the campus and be covered with light bulbs on the entire interior and exterior surface, each bulb under the control of a computer, so the 'Geoscope' would be a gigantic display screen. "It will make possible communication of phenomena that are not at present communicable to man's conceptual understanding." I've seen enough movies to have my doubts about any good coming from a large screen for mass viewing. Brings to mind Apple's 1984 commercial.

He clearly suggested that the university acquire as much land as it can. But forget about any permanent structure anchored to whatever the shape is of today's curriculum. Everything should be dynamic. e.g. bring in a geodesic dome via heavy-lift helicopter to serve as an auditorium whenever you need another big auditorium. Or just build a gigantic half-mile wide geodesic dome to enclose the entire campus and give people large clear spaces to do their work. "Don't waste dollars on great, heavy stone masonry and any kind of Georgian architecture."

What did he get wrong?

From my 21st century perspective, he did get some things wrong. He still seems to expect that education is something that is done by herding large groups of students into giant auditoriums. Current consensus seems to be that the Internet particularly obsoletes the large-to-the-point-of-student-anonymity lectures, replacing them with on-demand viewing of pre-recorded lectures. (e.g. MOOC's).

I wish I knew more about what they actually did there at SIU after the planning sessions. I Google searched for images of the Southern Illinois University campus and only found a few pictures, none of which seem anything like Fuller's suggestions. I'm slightly more familiar with a more recently developed college campus, Liberty University. Though I have never set foot on Liberty University's campus, we received a magazine from them a few months ago detailing their campus expansion projects. From what I can see, Liberty University initially built large warehousey "temporary" buildings, but then in more recent years has been replacing those buildings with new prettier more permanent structures.

Fuller's 1961 advice was to build with the inspiration of circus tents, not wasting money on, e.g., Georgian architectural detailing. From what I can see of the continuing development of Liberty University, they initially erected structures that were slightly more solid then circus tents, but without significant ornamentation, but more recent structures are in a style that would be fitting for colonial Williamsburg, with lots of brick and big decorative columns and with large windows consisting of many small panes. It would seem that Fuller was overlooking the role of campus buildings in inspiring pride in alumni as a stimulus to continuing generous donations. In the book, he made little mention of budget constraints. He also, despite his own involvement in campus athletics (he'd been a football quarterback in his high school years), seemed to pay no attention to the huge investments large schools seem to make in sports facilities.

I noticed that Fuller's advice seemed to pay no attention to matters of infrastructure: e.g. bathrooms and parking and/or provisions for on-campus transportation. Research labs need plumbing, fume hoods, marker boards, bookshelves (at least they still need those today), desks, chairs, network access, electricity, heating, lighting and air conditioning. Much more than just the generous amounts of space Fuller advised should be acquired and enclosed.

He had a curious fixation on a proposal for a large domed display screen that could be used to display images, charts and graphs to help the people on campus to attain a commonality in their point of view. I wonder if Leo Villareal would give any credit to Fuller for Villareal's "Cosmos" installation on the Cornell U. Johnson Art Museum. Villareal's piece makes sense of what had been a seemingly pointless architectural detail of the museum building, which building has been compared to the shape of a gigantic sewing machine.

The foreword of the book says Fuller served as a research professor at SIU. I hope he didn't get overpaid for the advice he gave but that they apparently didn't follow. If you have affiliation with SIU and can say more about what influence Fuller's suggestions had on the development of that campus, I'd sure like to hear from you in the comments.

Maybe Fuller was just way ahead of his time, but I'd have an easier time expecting a virtual campus that is spread around the world and reconfigured as different specialists get involved in a research project. A campus where there's no "there" there? Good luck figuring out where home-game sports events are going to be held. Perhaps the central feature of the physical campus will be a large Internet connected data center/online digital library to facilitate research and online collaboration by the people affiliated with the university, wherever they may be.

Oh, and I bet there'll be a lovely administration building with columns and brick and many window panes and a board room for the meetings of the trustees (who probably are going to lag on acceptance of online meetings). I'm not advocating that all future research be conducted in the researcher's home kitchen. Space (rented?) with suitable labs facilities would be acquired wherever the researcher chose to be. I foresee big demand for lab space in Hawaii. Lovely weather - especially compared to Ithaca, NY, just to name a place where you can find a conventional physical campus today.

Thursday, May 16, 2013

How does Python identify the type?

In the CS101 forum, a student noticed that types significantly influence the computations in python, but that we generally don't declare types in Python. They wondered how this can be so.

In the speed of the light question,when the prof changed the value of 1 to a float value 1.0,the answer was much more accurate,i want to know , how and why in python unlike other languages,there is no need to define the type?

To see the original discussion thread, see:http://forums.udacity.com/questions/2001224/how-does-python-identify-the-type.

My reply to that question:

In Python, variable names have values and values have types. So if I say x=2, the type of the value of x is int. I can then say x="dog" and now the type of the value of x is string. Python knows the type of a literal in the source from the syntax (e.g. quotes to make something be a string value, or square brackets to make something be a list or curly brackets to make something be a dictionary. Number without a decimal point for an integer. Number with a decimal point for a float. At runtime, all the values of variables in memory have a specific type. Operations on those values therefore have methods that know how to work with those types (or an exception is thrown), and produce results of a specific type.

Python might not know what that type is until runtime, but hopefully the programmer knows what they meant to produce and gets it right. Other languages ("bondage and discipline languages") are much more demanding that all be declared at compile time, but then you might end up with a procedure that you'd like to use on an object in your program, but get snagged that although the object behaves like you need it to, it isn't the exact type that the procedure was coded to expect. Python says "if it walks like a duck and it quacks like a duck, then it's a duck" (Hence, "duck typing"). So if the object provides whatever operations are needed for the procedure, you're good to go.

Broad categories of types include orderable types for which lt and eq (comparison) operations are defined. Another broad category is "iterables". You can easily write a procedure in Python that'll happily operate on any type that meets the requirements of the procedure. e.g. if you can compare objects of the given type and decide how to put them into order (i.e. which is "greater" and which is lesser), then you can probably feed objects of that type to your sort procedure - even types that you created long after you wrote the sort procedure. If you write a procedure that takes an iterable, it can be written to be happy with a list or a file or a generator (a special kind of procedure that returns a value but that you can then resume to get the next value) - and the procedure can be happy as you apply it in amazing new ways.

That's one of the definite sources of the power of Python. This type freedom has its downsides too, such as there being surprisingly little that can be statically verified about a piece of Python code. At "compile time", Python doesn't really know what is going to happen at run time. If I was implementing a nuclear reactor control system, I think I'd feel safer writing it in Ada, not Python. But, most days I definitely am not writing nuclear reactor control software or anything that could seriously kill someone if it has a flaw. That sort of software puts a seriously literal meaning into the phrase "fatal flaw". For a reminder of how fatal a software flaw can be, see: An Investigation of the Therac-25 accidents.

Here's a highly technical talk related to this topic of types which I hardily recommend. Have a pad and pencil handy as you watch it and jot down things he mentions that you don't know about, then Google for more info on those words later. Dynamic Languages Strike Back. After you are more comfortable with those words, you might even want to watch the video again.

Tuesday, May 14, 2013

Education's Future: "Go Do Something Interesting".

Another interesting TED talk on the future of education. The speaker is Seth Godin. The question is "What is school for?".

http://www.youtube.com/watch?v=sXpbONjV1Jc (17 minutes)

As usual, it is easy to see how that could work out for a self-motivated student, but what about the unmotivated students? Will they just waste time until school is over? Spending hour after hour playing World of Warcraft doesn't sound like a well rounded education to me.

Life could be awfully complex for a teacher in the future if all the students are going off in their own directions and each student is looking to the teacher during the day to answer questions they have from the diverse lectures they listened to last night. I suppose there can be various specialists on staff, and depending on the questions, the teacher could redirect the student to the appropriate specialist. Except for the lower grade levels, that isn't all that much different from the staff of schools today, except for the change from having assigned time slots to sit with each specialist on your schedule.

Unclear to me is how the school will insist on a "well rounded" education. I suppose this comes down to some assumptions on what school is for. I believe that one of the things school is for is to educate the students to the point where they are prepared to learn new things in the future. That calls for some fundamental reading and writing skills, some basic math skills and some elementary knowledge of history and literature. This may not fit the student's opinion of what is "something interesting", but if the school hands out diplomas to students who aren't prepared even to be knowledgeable voters, then the diploma doesn't mean much. This is where the cynic would point out how little a high school diploma means today.

On the one hand, I can see a case for loosening up a school's schedule, spending less time in required classes, but on the other hand, I can see an awful lot of students frittering away the time released for them to go do something interesting. Do you see a way to structure things such that no one gets left behind and folks who are motivated can go as far as they want to go?

Monday, May 13, 2013

Copied code or Unfortunate coincidence?

My blog post today is another reposting of an item I'd originally submitted to the Udacity CS101 forum. This item isn't at all Python-specific. A student there wanted to know how code similarities are handled in the real business world. How to distinguish copied code from an unfortunate coincidence? If you'd like to see the original discussion thread, see: http://forums.udacity.com/questions/100049998/how-do-people-know-when-code-is-a-copy-or-just-an-unfortunate-coincidence?page=1&focusedAnswerId=100053997#100053997

My answer to the question was this:

There is a distinction between copyright and patent. I'm pretty sure that cleanroom techniques can be used to duplicate a piece of copyrighted code without getting into trouble. That is, the people who have seen the original code don't write the copy, but just pass the requirements and specifications through to people who never saw the original code. But if there is a piece of patented code, and you write your own code that inadvertently infringes on the patent, then you've got potential legal woes even though you didn't even know about the patented code. This is why there is such strong opposition to the notion of software patents. For example, see this article in Forbes or this blog from the Electronic Frontier Foundation.

It is also worth noting that not all copyright holders insist that you not copy their code. The GNU GPL license is a form of copyright that declares you are free to copy the code, but, if you do so, you need to make your modified version of the code freely available under a GPL license itself. The GPL is sufficently different from a typical copyright, that the slang term "copyleft" is often used for the GPL. The "contagious" properties of the GPL are fairly terrifying to companies that regard their code as proprietary secrets. If they inadvertently incorporate some GPL'ed code into their package, they might have a court declare that they have to release all that package's source code under GPL terms. The BSD license is different. As I understand it, BSD licensed code is free to copy without imposing restrictions on what you then do with your adaptation of the code. http://www.freebsdnews.net/2008/07/18/bsd-license-vs-gpl-license/

There are many different software licenses. If you have occasion to reuse a piece of code, be sure you know the license terms. Some licenses are incompatible with some other licenses, so it is possible to pick up code with incompatible licenses and produce a program that cannot legally be released. There are companies (e.g. Black Duck Software) in the business of helping folks track down the origins of their code to make sure it is properly licensed.