I've been working my way through Allen Downey's Think Complexity book. I'm not very far in, but so far it's a great way to brush up on algorithms and datastructures, and learn some new stuff about complexity science. Plus, all of the code is written in Python! I've been doing the exercises, and most of them take at most fifteen or twenty minutes. But at the end of his explanation on hash tables he casually lobs this one out (3.4, exercise 5):

A drawback of hashtables is that the elements have to be hashable, which usually means they have to be immutable. That’s why, in Python, you can use tuples but not lists as keys in a dictionary. An alternative is to use a tree-based map.

Write an implementation of the map interface called TreeMap that uses a red-black tree to perform add and get in log time.

I've never researched red-black trees before, but as a C++ programmer I know they are the datastructure that powers std::set and std::map. So I decided to take a look. I quickly realized this was not going to be a simple exercise, as red-black trees are quite complicated to understand and implement. They are basically binary search trees that do a lot of work to keep themselves approximately balanced.

I spent a few nights reading up on red-black trees. A good explanation can be found in Wikipedia. There are even a handful of Python implementations floating about, of varying quality. But finally I found a detailed explanation that really clicked with me at Eternally Confuzzled. Julienne Walker derives a unique algorithm based upon the rules for red-black trees, and the implementation code is non-recursive and top-down. Most of the other implementations I found on the web seem to be based on the textbook Introduction To Algorithms, and often involve the use of parent pointers and using dummy nodes to represent the nil leaves of the tree. Julienne's solution avoided these things and seemed a bit less complex. However the best reason to study the tutorial was the explanation was very coherent and detailed. The other sources on the web seemed to be fragmented, missing details, and lacking in explanation.

So to complete my Think Complexity exercise, I decided to port Julienne's red-black tree algorithm from C to Python, and hopefully learn something along the way. After a couple nights of work, and one very embarrassing bug, I've completed it. I can't say I quite understand every bit of the algorithm, but I certainly learned a lot. You can view the source code at Bitbucket, or clone my Think Complexity repository.

Many thanks to Julienne Walker for the great tutorial! And I highly recommend Think Complexity. Check them both out.


comments powered by Disqus