Thursday, January 3, 2013

Dictionary Implementation

In Udacity CS101, the unifying thread of the course is construction of a web crawler and search engine. Without making much of a point of it, the initial implementation involves a simple list and linear search. Then, the scaling problem of that approach is pointed out, and the course explains hash tables as a more time efficient way to store searchable data. After we'd rolled some of our own hash tables, the course goes on to introduce Python's dictionary data type - Hash tables as an inherent part of the language.

One of the students asked in the forum how dictionaries were actually implemented. How does Python know how many buckets to allocate for a dictionary? To see the original question see: http://forums.udacity.com/questions/2015065/dictionary-and-hash-table. I looked into the matter and posted this reply:

There's a book from O'Reilly called "Beautiful Code". Chapter 18 of that book describes the Python dict implementation in detail and the web has positive things to say about that chapter. Alas, the book is not freely available, unless you have access to a good technical library from which you can borrow the book. My local library isn't particularly technical, but they can do interlibrary loans from distant university libraries to get what I need, so long as I'm patient. Ask your librarian for help.

Or, if you are feeling ambitious, take advantage of the fact that Python is open source, so you can look at its innards to your heart's content. Python dictionary implementation (in C). As George Lucas had Obiwan suggest: Use the source, Luke.

The most common Python implementation is Cpython, an interpreter implemented in the C language. But there's a somewhat more experimental implementation called Pypy, an interpreter implemented in Python. I speculated that with a little rummaging around, I ought to be able to find Pypy's implementation of dictionary's, but close as I found was some documentation on Pypy's dictionary optimizations. There are tons of bitbucket.org/pypy pages for Pypy source, but I didn't actually find the Pypy code that you are asking about. Feel free to show me up on locating the routine of interest to you.