Laziness Part Two : The 6,000 Line Hashtable

The Problem

Recently, I was called in to review the design for a major piece of functionality in a soon-to-be-shipping product. What began as an inspection eventually grew into a complete rewrite. The most glaringly awful code dealt with storing key/value properties on disk. The code was so overgrown that we couldn’t even begin to consider replacing it for this release. The property code was on the periphery of my original assignment, but its sheer size compelled me to take a look.

The property code was amazing! It allowed for categories, so that different sets of properties were guaranteed not to collide. Property bags could be intelligently merged and combined. Each property was typed, with massive amounts of type checking and assertions galore. The properties themselves were persisted on disk using multiple file formats simultaneously. The writable version was stored using SQLite, while the read-only version used a custom format complete with a lex/yacc parser.

The implementation was 6,000 lines of code scattered across a dozen files. Nobody wanted to change the code because it was so inscrutable. Most of the features had never been tested. The property code was a ticking time bomb, but it was too late to make any changes for the current release.

It’s Just a Hashtable

This mess could easily be replaced with a string-to-string hashtable. Hashtables are simple to implement, use, and test, even in a washed-up language like C. Hashtables can easily be persisted to text, XML, URL query strings, RPC calls, and databases. Just to be clear:

  • Instead of schema checking, use plain old strings. Convert the values to the appropriate type as necessary.
  • To merge hashtables, add a putAll() method that adds one hashtable to another.
  • To avoid collisions, prepend a category string to each key. You can put the prefix functionality into a subclass.
  • Add read/write methods to persist your hashtable.
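Here is roughly what those four bullets look like in Python. Treat it as a minimal sketch, not the code from the product: the names Properties, CategoryProperties, put_all, and get_int are made up for illustration, and the key=value text format assumes keys contain no "=" and neither keys nor values contain newlines.

```python
import io


class Properties:
    """A string-to-string table with merge, typed reads, and text persistence."""

    def __init__(self):
        self._data = {}

    def _full_key(self, key):
        # Hook for subclasses that want to decorate keys (hypothetical design).
        return key

    def put(self, key, value):
        # Everything is stored as a plain string; there is no schema.
        self._data[self._full_key(key)] = str(value)

    def get(self, key, default=None):
        return self._data.get(self._full_key(key), default)

    def get_int(self, key, default=0):
        # Convert at the point of use instead of type checking on write.
        value = self.get(key)
        return int(value) if value is not None else default

    def put_all(self, other):
        # Merge: entries from `other` overwrite entries with the same stored key.
        self._data.update(other._data)

    def write(self, stream):
        # One key=value pair per line, sorted so the output is stable.
        for key in sorted(self._data):
            stream.write(f"{key}={self._data[key]}\n")

    def read(self, stream):
        # Keys are loaded exactly as stored, so prefixes survive a round trip.
        for line in stream:
            line = line.rstrip("\n")
            if not line:
                continue
            key, _, value = line.partition("=")
            self._data[key] = value


class CategoryProperties(Properties):
    """Prepends a category to every key so separate sets cannot collide."""

    def __init__(self, category):
        super().__init__()
        self._prefix = category + "."

    def _full_key(self, key):
        return self._prefix + key


# Two categories use the same key name without colliding.
net = CategoryProperties("network")
net.put("port", 8080)
ui = CategoryProperties("ui")
ui.put("port", "none")

# Merge both into one table, write it out, and read it back.
merged = Properties()
merged.put_all(net)
merged.put_all(ui)
buf = io.StringIO()
merged.write(buf)

restored = Properties()
restored.read(io.StringIO(buf.getvalue()))
```

That is the whole feature list in a few dozen lines. Swapping the text format for XML or a database only means touching read and write.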

This situation would be comical if I hadn’t seen it so often at so many different companies. People inevitably seem to think that hashtables aren’t efficient or type-safe enough for their specific needs. I’ve implemented nearly identical hashtable wrappers in C, C++, Java, Ruby, Perl, and Python. I’ve run them on Linux and Windows, on ARM, MIPS, and x86. I feel a warm sense of security each time I write those familiar lines because IT’S JUST A HASHTABLE.

Be Lazy

The lazy programmer uses hashtables wherever possible. If you think you can do better than a hashtable, think twice. If you think you need type safety for property files, please reconsider. If you think hashtables take too much memory or are too slow, you’re probably wrong.

Be lazy and use hashtables. You can always churn out those 6,000 lines later if the need arises.

One comment to “Laziness Part Two : The 6,000 Line Hashtable”

  1. Comment by Ken:

    Oh sure–keep kicking C–“washed up”? What is Ruby written in? I’m pretty sure it’s not Java, Perl, or Python.