Saturday, January 5, 2013

Judging Code Quality

Unit 3 of CS101 introduces functions. One of the more challenging homework problems at the end of that unit was problem 3-8. Given a purported solution to a sudoku puzzle, represented as a list of lists, the assignment is to write a function that determines if the offered solution is valid. The function is to return True if the puzzle solution is valid, and return False otherwise.

One of the students was less than delighted with his solution. His function worked, but he wondered if it was Good Code or Bad Code? Some folks responded that the offered routine was too long, and that made it Bad Code. To see the original question and answer thread, follow this link:

I tried to give a more nuanced answer to the opinion question...

Good code is largely a matter of opinion, which is not to say I don’t have opinions. John’s point about the length of the procedure is a valid point. I’d also argue that clarity could be improved by better choice of variable names, or at least comments explaining what each variable represents.

len(length) set off alarm bells in my head. Looking at the routine, I gather “length” is the list of lists that makes up the sudoku puzzle. Again, a better variable name or a comment would make things easier to read.

In the end, the acid test is “does your program work?”. If I can make it fail with a test case, surely, you’d have to accept that the code has a problem. The ease or difficulty of correcting things is then a valid basis for judging if it’s Good Code or not. That isinstance test to screen out floating point numbers in the data worries me. What does it do if you hand it a different data type, like, say, a complex number? If you eventually arrive at a program that handles every test case I can think of, but the code bristles with odd tests to avoid problem X, Y and Z that we discovered, that still is probably not Good Code.

My version is already posted to the forum with some other sudoku question, so I won’t repost it here. My approach was to generate a list of all the integers that would be valid in a row. (e.g. for a 9x9 puzzle, any valid value should be in the list [1 2 3 4 5 6 7 8 9]. As I matched a value to the list, I’d delete that value from the list so if it came up again in the same row or same column, I’d reject the duplicate as invalid. For each row and each column, I freshly repopulate the list of valid integers.

One aspect of my code that I wasn’t completely happy with was that my code for checking of rows largely duplicated the code for checking columns. “Don’t Repeat Yourself” (DRY) is good guidance for writing Good Code.

One other area where I think my code could use improvement is I had no explicit test to make sure the puzzle data was square (length of each row exactly matching the number of rows). Would be easy to patch that in. is my collection of links to web pages pertaining to the topic of writing good code. The “Principles of Good Programming” pearl there is probably a good place to start reading.

If the wit and wisdom of the xkcd comic strip appeals to you, you might enjoy: If you’ve read this far and are enjoying my answer, you might try a Google search for:

code smell

to find tons more to read to help you better discern the answer to the code evaluation variant of the question posed to Dorothy soon after she arrived in the land of Oz: “Are you a Good Witch or a Bad Witch?”

And if I still haven’t convinced you that your code could be better: Consider how terse the results from your code are. Suppose your code became widely popular and you get a support call from some distant timezone at an inconvenient hour. The user says “I asked your program if my sudoku puzzle was valid and it said False. What’s wrong?”. Good Code would have precluded the need for that support call, or at least have told the user exactly where to look to find the problem. They might call for support anyway, but such is life.

Want to enrich your book shelf? Well, an Amazon search for books with Brian Kernighan as an author is a good start. The Elements of Programming Style by Kernighan and Plauger and The Practice of Programming by Kernighan and Pike are ones I highly recommend, though neither is specific to Python. Alas, putting the books on your shelf without reading them doesn’t do much good. (Full disclosure: I had the enormous pleasure of working with Kernighan and Pike a while ago). NB. there’s no shame in picking up a used book to save some money. Someone else’s highlighting and marginal notes might even add value.

Best wishes to you on your journey to learning “How to Write Good”. Exposing your code for review by others is generally a Great Idea. Pay attention to the comments and suggestions that come back, even the comments that you don’t like. The comments and suggestions may turn out to be Bad, but do take your time maturely considering the feedback before you arrive at that conclusion.


Since posting that answer, I've established a Pearltree about "Code Reviews". I believe code reviews are the best way to check for quality of the code that goes into a project. Sadly, management of developers often is unwilling to invest the resources needed to take the time to review the code their team is grinding out. An article in that Pearltree that I especially enjoyed is this lament over the sorry level of code quality that is so typical in the commercial world.

I confess that, so far as I know, no one else has expressed serious code-review suggestions regarding my sudoku routine. Specifically, nobody ever pointed out that my procedure suffered the same code-smell of being too long for one Python procedure. And, sadly, nobody has explained to me how I can cleanly reuse the same code to check rows and check columns. Transposing the matrix data seems like too much work, so that isn't the solution to my DRY worries. Please share in the "comments" here if you see a way to eliminate the repetition or have other suggestions for improving my version of how to solve this homework problem.