Sunday, April 14, 2013

Creeping up on OOP in Python, Part 3.

In Part 1 of this series, I looked at a solution to Project Euler problem 22 using the object-based pyparsing library. In Part 2 of this series I refined that code by introducing a simple object of my own creation. The change had the salulatory effect of moving the name-scoring initialization into a logical spot instead of leaving it up in the main routine.

Today, in Part 3, I look at a classroom assignment about multiple inheritance. On the one hand, working through this assignment did leave me with a new and better understanding of multiple inheritance. But, I confess to more than a little discomfort with this sort of assignment, dealing in obviously artificial A,B,C,D classes. What the assignment doesn't do is provide me with any insight as to why I might want as complex a class hierarchy as the A, B, C and D of this assignment. Perhaps I need to spend some time with some OOP text books to find practical examples of this sort of complexity. If someday the Aha! light clicks "on" for this matter for me, I promise to return with a Part 4 that has some less synthetic examples of multiple inheritance. Meanwhile, I confess that I've been looking at the Go programming language and it seems to have a much more obvious mechanism for "inheritance". But, not having written line 1 of Golang code, I'll not expound on that opinion here at this time.

The original problem posed is here: It's just the kind of homework problem that I hate. It deals in a complex hierarchy of classes A, B, C and D, but does nothing to help me understand why a program might want such a hierarchy. I'm not exactly sure where the problem was posed, but I'll guess it was from the MIT 6.00x MOOC. At the point where my friend asked for help, he already had the correct answers and was concerned that he couldn't work out how they got those answers. On my first try I too came up with incorrect answers, so I had to dig a little deeper.

If you know something of Python multiple inheritance, now would be a good time to try to work out your answers to the questions in that assignment.

If you've read Part 1 and Part 2 of this series, you know I've had very little experience with classes and objects in Python, and have never dabbled at all in multiple inheritance. What I know is that inheritance is a way to make a specialized subclass of a class. e.g. suppose you have an employee object. Employee's have an ID number, a title, a location/room, office phone and so forth. Now suppose you want to define a manager object. A manager is an employee, but perhaps has an associated list of directly reporting employees (the folks he manages). So, to save having to re-implement all the basic stuff for the employee aspect of the manager, you instead derive the class for manager from the class for employee.

Fine. Where I get confused is when looking for good examples of multiple inheritance. Sitting here trying to make one up: Suppose we had a Stash class that knew how to pickle an object and write it to nonvolatile storage, returning a key value that you could later present to a method of Stash to retrieve the stored object. A "permanent" employee might usefully be derived from both the employee class and the stash class. But why multiple inheritance instead of deriving employee from stash and then manager from employee? The crystal grows dark. Perhaps if I get around to Part 4, I'll have good answers. If you can give me a clue that will further my education on this topic, do feel free to add a comment to this blog or if you'd like more room, post a link to a blog response of your own.

Back to A, B, C and D of the homework problem. The language is Python 2.7. That's potentially important because the method resolution order (MRO) changed in the move from Python 2.2 to Python 2.3. On, all the hairy details are explained and python code is given to inspect a class hierarchy. So I cross-bred that print_mro code with the original assignment and massaged the lines so the file was executable Python code. That gave me this: My thinking was that by actually executing the code from the questions, I could see for certain if the official correct answers do come out, and then I can explore the test code to understand why those answers came out.

My first big surprise is that as given, print_mro only works on classes, not on objects. As I understand it, an object in python knows what class it is an instance of, so I believe that print_mro could be readily souped up to start from an object and report on the class MRO anyhow. I didn't bother - just lived with the limitation of only handing classes, not objects to the print_mro routine.

As the code now stands, this works:

obj = D()
print "D.__dict__=", D.__dict__

but print_mro(obj) fails.

A class has a set of built-in class attributes. CLASS.__dict__ is a dictionary of all those class attributes. See: For all the foofaraw about MRO's the print_mro routine showed no surprises. The general rule is to find a method, start at the root of the hierarchy and work down and left to right. First match to the sought method name wins.

So the class attributes include __bases__, the parent classes of the class.

An object also has a set of built-in attributes: OBJECT.__class__ is the name of the object's class, and OBJECT.__dict__ is a dictionary of the object's attributes (just data attributes, like a, b, c, d in our example). Notice that __bases__ isn't lugged along with each instance of the class, just the class name and you'd have to look at the class's built in attributes for further info.

The output of running is here: Lines 76-77 show the linear list of the classes that make up class D. I see no surprise there. It's D, C, B, A. So when looking for a method (i.e. function) to invoke from an instance of D, that's the order of the classes that it'll look through in questing for that method.

But by peeking at the obj.__dict__ I see only simple values in that instance. e.g. {'a': 2, 'c': 5, 'b': 3, 'd': 6}, not the complicated class hierarchy I was expecting to see reflected in the data too. I was expecting to see an "a" attribute associated with class B and another from class A. But it seems the only "a" is for obj's class D. So the __init__ methods get invoked when obj is instantiated and the last to run is the one for class A, so the value of the "a" attribute set by B.__init__() is the one that prevails. A.__init__ sets "a" to 1, but it was invoked by B.__init__ and later in B.__init__, the "a" gets set to 2.

So that's quite different from what I expected to see, but perhaps is simpler than I was looking for. In any case, it does explain why obj.a ends up being 2.

Looking a bit further up in the hierarchy, D.__init__ invokes C.__init__ which sets "a" to 4, but next thing D.__init__ does is invoke B.__init__ so the "a" changes to 1 and then to 2 as I just described.

All those complex resolution order rules apply for find a method to apply to an object, but don't apply at all for a reference to a data attribute. If you are looking for an attribute named "a" the object's dictionary either has an entry for "a" or you are out of luck.

I still believe it'd be much more valuable if a real use for this stuff was contrived for the examples here instead of A, B, C and D. I haven't done any significant OOP, so I have no real experience with inheritance. Maybe if I had such experience, the merits of the Python implementation of multiple inheritance would be more obvious to me. Seems to me that having all the data attributes in one dictionary implies that in modifying any of the code of any of the classes in the complex hierarchy that a certain amount of care is needed to not introduce unintended name conflicts across the classes. My best guess is that someday when working a far more complex problem in Python that I'll suddenly recognize a burning need for multiple inheritance, but until then, I intend to follow the KISS principle.

If you know more about multiple inheritance in Python then I do (setting a very low bar...), please point out any errors in this article. I'm just sharing what I saw in my experiments, so there's plenty of room for me to get some or all of it outright wrong.