Sunday, July 28, 2013

Pythonic Python - Writing Python code that fits the language's idioms.

A recent posting by +Luke Plant to the Google+ Python community about the importance of PEP-8 reminded me that for quite a while now I've been thinking I should compose a blog posting about idiomatic Python code - and that's what led to this article.

Now I know this is a very small candle trying to light a very large dark space, but so it goes. My own Python experience has so far been limited to small projects and class-work, no larger scale production quality team efforts. The good news is that that limitation in my own experience means there's plenty of room for you to chime in with opinions of your own on this matter. I write posts to my blog in hopes of sparking up some conversations. Do feel free to speak up!

Some Orientation if you're New Around Python

If you are new around Python and aren't entirely clear on what it means when we talk about "the Python community", I offer you this orientation section lest you be confused by technical details in my words here. If any of it remains unclear, do speak up about that and I'll try to expand the material to cover more.

A "PEP" is a "Python Enhancement Proposal". There's a web site where all the PEP's are published and shared. PEP 0 is an index to the PEP's. PEP 1 is a guide to how to write and submit a PEP.

Most PEP's are proposals to enhance the Python programming language in some way.
For example, PEP 435 proposes adding an enum data type to the Python standard library. It's a long-discussed topic: PEP 354 proposed something similar but was rejected in 2005. PEP 435 has been accepted and is scheduled to be implemented and released as a feature of Python 3.4.

Other "PEP's" are merely informational, not really enhancement proposals at all.
For example, PEP 429 is the nominal schedule and plans for the Python 3.4 release. So, if PEP's aren't always "Enhancement proposals", what are they? PEP's are public records of consensus opinions of the Python community. Not every voice counts the same. Guido van Rossum, the creator of the Python language and its Benevolent Dictator for Life (BDFL), has extraordinary influence on the fate of a PEP. Fortunately, he generally shows good judgement in steering the language.

Among the informational PEP's, PEP-8 is a style guide for Python code. If you don't want your code to look strange to the eyes of experienced Python programmers, you should try to comply with the guidance of PEP-8. To help you do that, there's a PEP-8 checking program. Actually, if you do a Google search, you'll find that there's more than one such PEP-8 checking program available. You should have at least one of these programs in your tool box and make a habit of running it against your code and spending the time to tidy things up to make the checker program happy.

It's not that you can't ever bend the rules, but you darn well better have an excellent reason why deviating from the style guidelines was worthwhile in the exceptional case that you decided not to fix.

Another informational PEP that you should read is PEP-20, the Zen of Python. Unlike PEP-8, the guidance of PEP-20 isn't so simple that a straightforward program can look at your code and say whether or not you were thinking like a Python programmer when you wrote it. PEP-20 does try to get your head into the right way of looking at the things that you wish to program using Python.

Beyond Style

There's more to writing "Pythonic" Python code than following PEP-8's style guidelines and letting PEP-20 shape your thinking. For example, given a collection of things (a "list" being the most typical way of setting that up in a Python program), if your reflex to process that list is to think "DO I=1 TO N"... then you are probably still under the influence of Fortran or C or some other such old programming language. A more Pythonic way is "FOR ITEM IN LIST:"... Now there are lots of special situations that might drive you to explicitly stepping an index through a list, even in Python, but don't do it as a simple automatic habit. Ned Batchelder has written an excellent tutorial on how to "loop like a native".
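Here is a small sketch of the difference. The list contents and variable names are made up purely for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
languages = ["Fortran", "C", "Python"]

# The index-driven habit carried over from older languages.
for i in range(len(languages)):
    print(languages[i])

# The Pythonic form: iterate over the items directly.
for language in languages:
    print(language)

# When the position is genuinely needed, enumerate() supplies it
# without any manual index bookkeeping.
numbered = [
    f"{position}: {language}"
    for position, language in enumerate(languages, start=1)
]
print(numbered)
```

Both loops print the same three names. The second one simply has fewer places for an off-by-one mistake to hide.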

One other area where I've found I have to fight against habits formed in other, older programming languages is in conceiving of the types of return values from a function. In some old languages, the return values were limited to something simple, and to exactly one thing. But Python is entirely dynamic, so you can feel free to return elaborate data structures or even multiple values (as a tuple) at a time. It isn't inordinately expensive because the implementation doesn't copy the values around; it just passes references to them. The language's runtime garbage collector takes care of reclaiming the storage space when you no longer need the fancy value that you constructed.
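As a minimal sketch of returning several values at once, consider the two hypothetical functions below (the names min_max_mean and describe are mine, not from any library). The second one also shows that a returned list is the very same object that was passed in, not a copy.

```python
def min_max_mean(values):
    """Return the smallest value, the largest value, and the mean."""
    # The comma-separated expressions are packed into one tuple.
    return min(values), max(values), sum(values) / len(values)


# The caller can unpack the tuple straight into separate names.
lowest, highest, average = min_max_mean([3, 1, 4, 1, 5])


def describe(items):
    """Return the list itself together with its length."""
    return items, len(items)


big_list = list(range(1000))
same_list, count = describe(big_list)

# 'is' checks object identity: nothing was copied on the way out.
print(same_list is big_list)
```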

Additional Tools

It is important to understand that Python does incredibly little error checking at "compile" time. Your code will pass without complaint from the Python language processor, even if your program has grossly mis-spelled variable names or calls non-existent functions. You won't hear about the errors in a given line of code until you actually try to execute that line. This makes testing your code incredibly important. You should definitely look into Python unit-test tools so you can embed test cases with your code and be well prepared to routinely re-test after you make revisions to the code. Related to testing, you might also benefit from coverage tools that report which portions of your code remain unexercised. This may guide you into beefing up your unit test cases. Sadly, even 100% test coverage of all the lines of code still can't guarantee that your code has no undiscovered bugs lurking in it. But if you haven't even exercised all the lines of code it is easy to anticipate that there may be easy-to-find errors lying in wait to spring out at you at some inopportune time.
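To make that concrete, here is a sketch using the standard library's unittest module. The functions total_price and broken_total are hypothetical examples. The second contains a deliberately misspelled variable name that Python accepts without complaint until the line actually runs.

```python
import unittest


def total_price(prices):
    """Add up a list of prices."""
    total = 0
    for price in prices:
        total += price
    return total


def broken_total(prices):
    # Deliberate typo: 'totl' is never defined. Python accepts this
    # definition and raises NameError only when the line executes.
    return totl + sum(prices)


class TotalPriceTest(unittest.TestCase):
    def test_total_price(self):
        self.assertEqual(total_price([1, 2, 3]), 6)

    def test_broken_total_fails_only_when_called(self):
        with self.assertRaises(NameError):
            broken_total([1, 2, 3])


# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TotalPriceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project you would more likely keep the tests in their own file and run them with "python -m unittest", but the idea is the same: the typo stays invisible until some test exercises that line.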

There's no real substitute for good judgement and, as the old adage explains, good judgement is something you learn from experience, and experience is often something you gain from applying bad judgement. Another tool that may draw your attention to places in need of better judgement is sloccount. sloccount tells you how many lines of non-blank, non-comment source code you've written and while it is counting, it computes complexity metrics for your functions. A function that is oversized or that scores as exceptionally complex deserves to be re-considered. More often than not, those are the kinds of functions where your undiscovered bugs are lurking.

There are multiple static checking programs that try to find the more obvious kinds of problems for you. pylint, pyflakes, and pychecker are 3 names to start your Google search with. Some of the suggestions from these tools can be very annoying. For instance, if I need a short-lived integer variable for local use, why shouldn't I name that variable "i"? But generally the checkers are tunable to tailor the rules to your taste. Don't just get annoyed with what the checker program tells you. Look at what it thinks it sees and see if you could do better and make it happy while you are at it.

Data Types

One other aspect of the Python language that you should pay attention to is its richness in data types: strings, lists, dictionaries, sets, tuples... And those are just "collections" of values. If you find yourself frequently searching through a list, stop and think whether a list is really the right choice. Maybe you really ought to use a dictionary to make it simpler to check if a given value is in the collection.
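A small sketch of that choice, using made-up names and ages. A membership test on a list has to walk the list item by item, while a set or a dictionary looks the key up directly by its hash.

```python
# Hypothetical data for illustration only.
registered_names = ["alice", "bob", "carol"]

# Works, but scans the list from the front on every lookup.
found_in_list = "bob" in registered_names

# A set answers the same question with a direct hash lookup.
registered_set = set(registered_names)
found_in_set = "bob" in registered_set

# A dictionary also ties each key to a value, so one lookup both
# tests membership and retrieves the associated data.
ages = {"alice": 31, "bob": 27, "carol": 45}
carol_age = ages["carol"]
dave_age = ages.get("dave")  # None, because "dave" is not a key
```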

Modules and Name Spaces

If your Python programs all tend to each live in exactly one file, then odds are you aren't making use of Python's modules and name space capabilities to separate your code into manageable sized pieces. Ultimately, that will limit you in your ability to tackle larger projects, such as those that need multiple-person programming teams. Keep an eye out for possible reusable modules that are worth separating from the specific problem at hand so you can use the same code elsewhere in the future. Python is quite liberal in its handling of type ("duck typing"). You can exploit that to make your code quite flexible about what data it is willing to deal with. Take the time to carefully document what the requirements are for the data that your module can handle. For example, maybe you had in mind that it would handle "employees", but perhaps it could be equally happy with any kind of object that has a mailing address as part of it ("customers" for instance).
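Here is one way the employees-and-customers idea might look. The class names, attribute names and the address_label function are all hypothetical. The point is only that the function documents what it needs and then accepts anything that provides it.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    mailing_address: str
    salary: float


@dataclass
class Customer:
    name: str
    mailing_address: str


def address_label(recipient):
    """Build a two-line mailing label.

    Requirement: 'recipient' may be any object that has 'name' and
    'mailing_address' attributes. Its actual class does not matter.
    """
    return f"{recipient.name}\n{recipient.mailing_address}"


labels = [
    address_label(Employee("Ann Lee", "1 Main St", 50000.0)),
    address_label(Customer("Raj Rao", "9 Elm Ave")),
]
```

Nothing in address_label mentions Employee or Customer, so the same code would work unchanged for a "supplier" class added later, provided it has those two attributes.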

Multiple Paradigms

Although Python is not a gigantic language with the sort of sprawl that PL/I was notorious for, Python does allow for more than one programming style. It certainly has support for object oriented programming as well as structured programming. Happily, it doesn't insist that you make use of all of its possibilities, but if you have been shying away from some aspect of the language because it supports a style that you are unaccustomed to, do push yourself toward learning how to use that aspect of the language appropriately. "generators" are a kind of co-routine and you may not have run into such a control structure in other languages, but they are worth the time to learn. Object oriented programming is still a weak area for me, but I've been working on trying to understand how to put that to good use. Test driven development is new to me too, but again I've been trying to regroove my mind to pay attention to doing things that way.
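If generators are new to you, this short sketch may help. The function names are my own. Each "yield" hands one value back to the caller and pauses the function until the next value is requested.

```python
from itertools import islice


def countdown(start):
    """Yield start, start - 1, ..., 1, one value at a time."""
    while start > 0:
        yield start
        start -= 1


def fibonacci():
    """Yield the Fibonacci numbers forever: 0, 1, 1, 2, 3, ..."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# A generator can be consumed by a plain for loop or by list().
launch_sequence = list(countdown(3))

# islice() takes only the first ten values from the endless generator,
# so the 'while True' loop never runs away.
first_ten = list(islice(fibonacci(), 10))
```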

Libraries and Frameworks

One of the mixed blessings of working in Python is that there are many rich libraries of existing code available for your use. Do plan to spend some time searching to find what is available that may be helpful to you. The code is generally free for you to download, but you may need to invest some time to understand it and bend it to your will. Forking it to make a specific-to-you version is probably a bad idea. But reinventing the wheel on your own is probably an even worse use of your time. If the library module isn't going to comfortably fit into your program, that may be nature's way of telling you to return to your favorite search engine to find another alternative implementation to use instead of your initial "find".

Learning the Python language is just a start. Learning to put it to good use is a much taller order. As Peter Norvig promises in his essay, "Yes, you can learn to program in only 10 years". But don't get discouraged. Learning new stuff every day can be great fun.

11/10/2013 - Corrected a typo. "are are". Being your own editor has its hazards.

Saturday, July 27, 2013

MOOC's: Be careful what you wish for.

There's an old warning that you should be careful what you wish for, because you just might get it. Here's an article from the Slate "web magazine" cautioning that MOOC's are going to doom us all:

http://www.slate.com/articles/technology/future_tense/2013/07/moocs_could_be_disastrous_for_students_and_professors.html

Countering that point of view, the article does manage to provide a link to a TED talk by coursera founder Daphne Koller, speaking at Edinburgh in June 2012:

http://www.ted.com/talks/daphne_koller_what_we_re_learning_from_online_education.html (21 minutes)

Now my own experience is limited. I took Udacity's CS101 course last year. It has no fixed schedule and I worked on the course when time was available and did eventually complete it with 100% on the final exam. I've signed up for another couple of Udacity courses, but haven't managed to get around to finishing either. Home Internet outages and a few weeks hospitalized distracted me from sticking to it. Of course, with no fixed schedule, I can always return to those courses and get down to work on them.

Koller and coursera seem to put much more emphasis on a fixed schedule than does Udacity. For example, I've been taking coursera's Systematic Program Design Course and have to concede that I have not kept up with its mandatory schedule of weekly homeworks and quizzes. Realistically, I have to concede I'll not be completing that course this summer and will have to try it again the next time it is scheduled. Not a bad course, but it does get tedious in its attention to microscopic details. I also got distracted with learning a new programming language (Racket) along the way. Too bad, as the course was to have 2 peer reviewed assignments and they were to be my first experience with peer reviewed work.

I think the gloom and doom suggestions of the Slate article are a bit too Luddite a point-of-view for my taste. I think MOOC's will lead to a lot of change in education courses in the future. The MOOC courses will establish baselines which future improvements will have to beat. Koller emphasizes that the MOOC offerings collect a lot of data which provide the prospect of guiding future improvements. The one thing that I can see screwing up a bright future of continuous course improvement is if copyrights are used to restrict building a better course based on an existing course. U.S. copyright law has seemed to get repeatedly stretched with additional years to make sure Mickey Mouse, et al, never pass into the public domain. I can only hope that the MOOC courses stick to the tradition of Creative Commons licensing so that it is possible to found an improved course on an existing good course.

On a related topic, although Bill Gates is not a person that I often have praise for, in this 10-minute TED talk by Bill, he does a nice job of arguing that teachers need more feedback. A modest bit of technology, such as a video camera on a tripod, can give teachers a basis for self-assessment and the possibility for peer reviews.

http://www.ted.com/talks/bill_gates_teachers_need_real_feedback.html

Tuesday, July 16, 2013

Field Trip and an Unusual Looking Flag

Flag image by Blas Delgado Ortiz, 27 June 2001

A few weeks ago, while we were visiting my brother-in-law in a nursing home in Queens, we had occasion to stop by the NYPD 110th precinct station house to file a report about an incident. While my wife was chatting with the officer at the desk, I wandered around the lobby looking at the various posters and artifacts on display there. I noticed that in several of the pictures of "events", there was a distinctive flag on display - green & white stripes and 24 stars in a curious looking constellation arrangement, not simple rows or anything simply symmetric. I asked the officer after my wife had finished her business what that flag was and she waved me off, not knowing the answer. But the question stewed in the back of my mind, unsettled.

Today, I decided to try a Google search for:

flag green and white 24 stars

to see what it would find. The search easily turned up exactly the information I was looking for. It's the official flag of the NYC police department. 5 stripes for the 5 boroughs of the city (Manhattan, Brooklyn, Queens, the Bronx and Staten Island) and the 24 stars represent the 3 cities, 9 towns and 12 incorporated villages that were integrated together in forming NYC in 1898. A detailed explanation is given on http://www.crwflags.com/fotw/flags/us-nycp.html. So now I know. Can't say that I'm impressed with the depth of knowledge displayed by the officer at the station house desk.

I can't share this kind of information without mentioning the "Fun with Flags" episode of "The Big Bang Theory". Enjoy!

Sunday, July 7, 2013

Recursion

One of the interesting computer science topics that Udacity CS101 introduces is "recursion". Recursion is where you define a function in terms that in some cases require the function to call itself. This is the main topic of Unit 6 of the course. There are important design considerations to be taken into account if you are going to use "recursion" in your implementation of a function.

  1. Base case(s) - It is crucial that your function have at least one combination of inputs that does not trigger yet another recursion. This non-recursive case is called the base case of your function. If you don't have at least one base case, then you are fairly certain to have an unending loop that never produces a final result.
  2. Progress - It is similarly crucial that your function make progress toward the base case(s) as it recurses. If you have situations where sometimes the function re-invokes itself with the same inputs and state as it had previously, then it may be more subtle, but almost surely you are stuck in an unending loop that never produces a final result.
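The two considerations above can be seen in a function as small as factorial. This is only an illustrative sketch, not code from the course.

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    if n < 0:
        # Guard: a negative input would move away from the base
        # case on every call and never reach it.
        raise ValueError("n must be non-negative")
    if n == 0:
        # Base case: answered directly, with no further recursion.
        return 1
    # Progress: each call passes n - 1, one step closer to 0.
    return n * factorial(n - 1)


five_factorial = factorial(5)
```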

Not every programming language supports recursion. Cobol and Fortran for example traditionally do not. Some languages (e.g. PL/I) support it, but only if you declare that a specific function may be invoked recursively ("PROC OPTIONS(RECURSIVE)"). Python supports recursion without any need to declare your intent to use the capability, but Python's support of recursion does not include "tail recursion optimization". Tail recursion optimization is where a language processor recognizes the special case that a procedure is being called recursively, but that when the procedure returns to the point of the call, there's nothing more to do than to return to an earlier call to this procedure. A clever compiler can transform the code for such a program to do a plain loop instead of a recursive call. Alas, in Python, there's much that cannot be known for certain about the code until run time. Guido van Rossum, the creator of Python and its "Benevolent Dictator for Life" (BDFL), has blogged about why Python doesn't bother to try harder for this particular case. See: Tail Recursion Elimination.

A key fact to note is that if you've got code with a tail recursion in it, then it is reasonably straightforward for you to restructure that code to explicitly use a loop in place of the recursion. It apparently is from this fact that Guido draws enough comfort to not bother trying to optimize the handling of this kind of code.
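As a sketch of that restructuring, here is a tail-recursive function next to a hand-written loop version. Both function names are hypothetical, and the example (summing 1 through n) is chosen only because it is short.

```python
def sum_to_recursive(n, total=0):
    """Sum 1..n with a tail call: the recursive call is the last step."""
    if n == 0:
        return total
    # Nothing remains to be done after this call returns.
    return sum_to_recursive(n - 1, total + n)


def sum_to_loop(n):
    """The same computation with the tail call rewritten as a loop."""
    total = 0
    while n > 0:
        # These two updates mirror the arguments of the recursive call.
        total += n
        n -= 1
    return total


small_recursive = sum_to_recursive(100)
small_loop = sum_to_loop(100)
large_loop = sum_to_loop(100_000)
```

The arguments of the recursive call become the loop's variable updates, and the base case becomes the loop's exit condition. The loop version handles n = 100,000 comfortably, whereas the recursive version would run into the interpreter's recursion limit (typically 1000 frames by default in CPython) long before that.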

Some folks look at the limitation of Python's support of recursion and wrongly conclude that recursion is a feature of the Python programming language that you should avoid. Ned Batchelder did a great job of de-bunking that assertion in his essay: Recursive Dogma.

There are lots of interesting discussions of recursion in the Udacity CS101 forum. Much of the debate is over whether or not recursion is something easy or hard to get your brain wrapped around. Some folks find recursion is an elegant way to express a function while others opine that iteration is a more natural way to conceive of a function's processing. My opinion in this debate is that even if recursion provides a straightforward clean design for a function, do think through how to transform that design into an iteration. Compare the readability, performance and limitations of the 2 alternative designs and pick the alternative that makes the most sense for your needs.