Archive for the 'Uncategorized' Category


emacs dotfiles 2007-01-20

Saturday, January 20th, 2007

It’s time for another dotfile release. This release includes some fixes for emacs 22, and a significant improvement in abtags. Download the dotfiles here:

Adam’s Emacs Dotfiles

From the changelog:

2007-01-20
- completion fixes for emacs 22 compat
- changed nxml indent to 2
- mapped html/sgml to nxml mode
- *.rake => ruby mode
- abtags auto-reloads TAGS files now
- finally tracked down and fixed pesky loaddefs issue

Turn Off Rails Sessions for Robots

Monday, January 8th, 2007

Urbanspoon is already attracting a sizable amount of traffic, and we expect our numbers to grow rapidly now that we’ve launched Chicago and New York. Urbanspoon is regularly crawled by a large number of robots seeking to index our site.

Some of our pages squirrel information away inside the Rails session. For example, we keep track of recently visited restaurants so that we can guide users back to those restaurants when they return. This is handy if, for example, you always order pizza from one or two restaurants.

Imagine if Googlebot crawled each of our 35,000 restaurants each day. Each time the bot hits a restaurant we would attempt to record a “restaurant visit” in the session. Since robots generally don’t use cookies, that would create 35,000 useless sessions each day. Wouldn’t it be nice to suppress these sessions entirely?

I wrote a helper function called is_megatron? to detect if a request’s User-Agent indicates that the request is from a robot. The regular expression catches most of the bot traffic that hits our site:

class Util
  def Util.is_megatron?(user_agent)
    user_agent =~ /\b(Baidu|Gigabot|Googlebot|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i
  end
end
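
A quick way to sanity-check the pattern is to run it against a handful of User-Agent strings. The sketch below is standalone; ROBOT_PATTERN and robot? are names made up for this example, and the sample strings are illustrative rather than pulled from our logs.

```ruby
# Same pattern as Util.is_megatron?, pulled into a constant (hypothetical name).
ROBOT_PATTERN = /\b(Baidu|Gigabot|Googlebot|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i

# Returns true or false; to_s guards against a missing User-Agent header.
def robot?(user_agent)
  !!(user_agent.to_s =~ ROBOT_PATTERN)
end

samples = [
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
  "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0",
  nil
]
samples.each { |ua| puts "#{robot?(ua) ? 'robot' : 'human'}: #{ua.inspect}" }
```

Anything the pattern misses simply gets a normal session, so a false negative is cheap.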

If we determine that a request appears to be from a robot, we simply disable session support for the current request:

class ApplicationController < ActionController::Base
  # turn off sessions if this is a request from a robot
  session :off, :if => proc { |request| Util.is_megatron?(request.user_agent) }

  ...
end

Gupta-Sproull Antialiased Lines

Wednesday, November 29th, 2006

Six months ago, I managed to generate the following table. It’s not remotely relevant to Urbanspoon.

{ 0xc7, 0xc6, 0xc2, 0xbc,
  0xb3, 0xa9, 0x9c, 0x8e,
  0x7f, 0x70, 0x62, 0x54,
  0x46, 0x3a, 0x2f, 0x25,
  0x1c, 0x15, 0x0e, 0x09,
  0x05, 0x03, 0x01, 0x00 }

This is the coverage table for rendering antialiased lines using the Gupta-Sproull algorithm. Apparently, this little table is included in their original paper, first published in 1981. I couldn’t find the coverage table online. I eventually bought a copy of the paper from the ACM for $10, but it didn’t include the damned table!

Finally, I bit the bullet, sat down and read the paper. Then I hacked together a Java class to generate the table. I wasted two full days of my short life on those 24 numbers.

You’ll thank me later.

I spent two weeks trying to efficiently render antialiased lines as part of a consulting project for a consumer electronics company. During those two weeks I was completely obsessed with line rendering. I raved about it to my friends, long after they’d stopped listening. I dreamed of lines. I learned a lot about line rendering, most of which I’ve forgotten.

I eventually implemented Wu lines with a Gupta-Sproull style coverage table. In addition to the body pixel coverage table above, I also calculated an endpoint coverage table. Divide a pixel into a 16×16 grid. Move the fractional line endpoint to the nearest grid location, round the slope to the nearest 1/16, and then perform a lookup in the table. The lookup returns the coverage for the 6 pixels surrounding the endpoint. For those of you following along at home, the table is 16*16*16*6 bytes, or 24k.
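
The lookup arithmetic can be sketched in a few lines. This is not the code from the project: the table layout (x, then y, then slope, then the 6 neighbors), the clamping of the top step, and the assumption that the slope lies between 0 and 1 are all guesses made for illustration, as are the names.

```ruby
GRID = 16        # sub-pixel positions per axis, and slope steps
NEIGHBORS = 6    # coverage bytes stored per table entry
TABLE_BYTES = GRID * GRID * GRID * NEIGHBORS   # 24576 bytes, i.e. 24k

# Snap a value in [0, 1) to one of GRID steps.
# Clamping the top step to GRID - 1 is an assumption of this sketch.
def quantize(value)
  [(value * GRID).round, GRID - 1].min
end

# Byte offset of the 6 coverage values for a fractional endpoint (x, y)
# on a line with the given slope (assumed to be between 0 and 1).
def endpoint_table_offset(x, y, slope)
  fx = quantize(x - x.floor)
  fy = quantize(y - y.floor)
  s  = quantize(slope)
  ((fx * GRID + fy) * GRID + s) * NEIGHBORS
end

puts TABLE_BYTES
puts endpoint_table_offset(3.5, 2.25, 0.5)
```

The last entry starts at offset 24570, so the 6 bytes read from any offset stay inside the table.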

The original paper dealt with integer (not fractional) endpoints. The authors suggested that it might be possible to support fractional endpoints by generating a table similar to the one I created, but that it would be too big for practical use. That was in 1981. Here in 2006, even portable consumer electronics devices can spare 24k. The Gupta-Sproull lines were much faster than agg’s lines, and looked just as good.

I came up with a great hack to generate the table. I drew some really fat lines in Java and then manually measured the coverage! A picture is worth a thousand words:

The moral of the story? A mediocre pixel tweaker like me needs to piggyback on 25 years of Moore’s Law in order to implement a basic line rendering algorithm.

Complex SQL Sorts with Rails/ActiveRecord

Saturday, November 18th, 2006

On Urbanspoon, we often need to efficiently sort a subset of records that were retrieved using a different sort order. For example, our Most Popular Restaurants in Seattle page first selects the top 100 restaurants, then allows the user to sort by name or price.

We need to perform a sort, then perform a secondary sort on a subset of the results. Here is an example of a two step sort in action:


[Figure: three panels showing the list unsorted, sorted by popularity, and sorted by price but only the first 100]
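
To pin down what "two step sort" means, here is the same idea on a tiny in-memory list. The Place struct and the sample data are invented for this sketch; it only shows the semantics, not how we do it against the database.

```ruby
Place = Struct.new(:name, :popularity, :price)

places = [
  Place.new("Noodle Hut", 90, 2),
  Place.new("Chez Fancy", 70, 4),
  Place.new("Taco Truck", 95, 1),
  Place.new("Sad Cafe",   10, 1)
]

# Step 1: sort by popularity (highest first) and keep only the top N.
top = places.sort_by { |r| -r.popularity }.first(3)

# Step 2: re-sort just that subset by price.
by_price = top.sort_by { |r| r.price }

puts by_price.map { |r| r.name }.join(", ")
```

Note that Sad Cafe never shows up in the result, even though it is as cheap as anything on the list, because it did not survive the first cut.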

I tried various implementations:

  • Perform the second sort in Ruby. This is inelegant, inefficient, and impractical for expensive sorts like distance in miles.
  • Use a subselect to get the ids in your :conditions. Unfortunately, MySQL doesn’t support LIMIT inside an IN subselect. This also breaks sql_calc_found_rows, which we use with some of our complicated sorts.
  • First select the ids, then manually construct the :conditions from the ids.

After several fumbling attempts, I eventually settled on the last approach. This technique requires an additional query, but it lets you use things like ActiveRecord’s :include, which is essential for some of our pages.

1. Perform the first sort with :limit, and grab the IDs

First, perform the sort with :limit and grab the IDs. For example:

r = Restaurant.find(:all,
                     :order => :popularity,
                     :limit => 100)
ids = r.map { |i| i.id }

Better yet, let’s just select the ids instead of populating the entire restaurant object:

r = Restaurant.find(:all,
                     :select => 'id',
                     :order => :popularity,
                     :limit => 100)
ids = r.map { |i| i.id }

If you’re a freak like me, you might want to get rid of some of that ActiveRecord overhead. Why bother creating those Restaurant objects at all? We can avoid creating Restaurant objects if we write a bit of SQL:

r = ActiveRecord::Base.connection.select_all("select id from restaurants order by popularity limit 100")
ids = r.map { |i| i['id'] }

Hm. There must be a better way. Can we use ActiveRecord to write the SQL, but avoid creating the restaurant objects? You bet, if we call send on ActiveRecord::Base’s construct_finder_sql method. This is perfect for my purposes, because my SQL skills are pretty weak. I can use ActiveRecord to write the SQL, but avoid the unnecessary overhead of creating all those objects.

options = {
  :select => 'id',
  :order => :popularity,
  :limit => 100
}
sql = Restaurant.send(:construct_finder_sql, options)
r = ActiveRecord::Base.connection.select_all(sql)
ids = r.map { |i| i['id'] }

2. Sort the subset

Now that we have the IDs of our subset, we can sort it using ActiveRecord:

Restaurant.find(:all,
                 :conditions => "id in (#{ids.join(',')})",
                 :order => 'price')

We can also take advantage of :include to populate our objects with everything we need.
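
One thing to watch: if the first query returns no rows, the interpolated condition becomes "id in ()", which MySQL rejects as a syntax error. A small guard takes care of that. The helper below is hypothetical (ids_condition is not part of ActiveRecord) and is only a sketch of the idea.

```ruby
# Hypothetical helper: build an "id in (...)" fragment from a list of ids.
def ids_condition(ids)
  ids = ids.map { |i| Integer(i) }   # raises ArgumentError on anything non-numeric
  return "1 = 0" if ids.empty?       # matches no rows, avoids the invalid "id in ()"
  "id in (#{ids.join(',')})"
end

puts ids_condition([3, "1", 2])
puts ids_condition([])
```

The result can be passed straight to :conditions in place of the hand-built string.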

You’ll Like It

The example above is a bit contrived, but I needed this technique to efficiently render many of the pages on Urbanspoon. Enjoy!

Rails SQL Logging Improvements

Thursday, November 9th, 2006

The logging system built into Rails works pretty well out of the box. The development.log file contains timing information, SQL statements, and error traces. I especially like the ANSI color coding, which makes the file much easier to eyeball. Still, there is room for improvement. Here are a few changes I’ve made to my Rails logging setup while working on Urbanspoon.

In a future post maybe I’ll discuss how these stats exposed problematic pages. Several pages were transferring 20x more SQL data than needed. :include may be considered harmful.

Logging SQL bytes transferred per page

ActiveRecord is an interesting beast. It’s very easy to use, but doesn’t provide much in the way of caching. Associations can be eagerly loaded using :include, but how does this affect performance? Timing benchmarks are the ultimate arbiter, but I often want to know other statistics. How many queries (SQL roundtrips) did it take to render this page? How many SQL bytes were transferred for this page?

If you plug this snippet in verbatim, you will start seeing a few changes in your log files. First, each SQL select will be followed by a line that reads 21 rows, 8.3k to indicate how many rows/bytes were transferred for the select. Second, the familiar ActionController timing statements will include the number of SQL bytes transferred:

Completed in … | Rendering: … | DB: 0.01312 (3%) 23.6k | …

A few things to note while reading this code:

  • SQL statistics are only turned on for the development environment.
  • It only works for mysql, though I’m sure the same technique can be used for the other adapters.
  • Minor fudging… I’m reporting “string bytes returned by mysql select”, not actual bytes transferred on the wire. If anyone has a suggestion for a simple way to get at the latter I’m all ears.
  • As always with mixins, this isn’t guaranteed to work with all versions of Rails. I’m using 1.1.6.

Add this code to environment.rb:

# only run this code in development
if ENV['RAILS_ENV'] == 'development'

  # modify MysqlAdapter to track transfer stats
  class ActiveRecord::ConnectionAdapters::MysqlAdapter
    @@stats_queries = @@stats_bytes = @@stats_rows = 0

    def self.get_stats
      { :queries => @@stats_queries,
        :rows => @@stats_rows,
        :bytes => @@stats_bytes }
    end

    def self.reset_stats
      @@stats_queries = @@stats_bytes = @@stats_rows = 0
    end

    def select_with_stats(sql, name = nil)
      bytes = 0
      rows = select_old(sql, name)
      rows.each do |row|
        row.each do |key, value|
          bytes += key.length
          bytes += value.length if value
        end
      end
      @@stats_queries += 1
      @@stats_rows += rows.length
      @@stats_bytes += bytes
      @logger.info sprintf("%d rows, %.1fk", rows.length, bytes.to_f / 1024)
      rows
    end

    alias :select_old :select
    alias :select :select_with_stats
  end

  # modify ActionController to reset/print stats for each request
  class ActionController::Base
    def perform_action_reset
      ActiveRecord::ConnectionAdapters::MysqlAdapter::reset_stats
      perform_action_old
    end

    alias :perform_action_old :perform_action
    alias :perform_action :perform_action_reset

    def active_record_runtime(runtime)
      stats = ActiveRecord::ConnectionAdapters::MysqlAdapter::get_stats
      "#{super} #{sprintf("%.1fk", stats[:bytes].to_f / 1024)}"
    end
  end
end
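
If you want to poke at the byte counting without a Rails app, the inner loop works on any array of hashes. The sketch below assumes rows shaped the way the loop above expects (string keys, string values, nil for NULL); result_bytes and the sample rows are made up for the example.

```ruby
# Rows shaped like a select result: an array of hashes with string values.
rows = [
  { "id" => "1", "name" => "Taco Truck", "phone" => nil },
  { "id" => "2", "name" => "Noodle Hut", "phone" => "555-0100" }
]

# Count column-name bytes plus value bytes, skipping NULLs.
def result_bytes(rows)
  bytes = 0
  rows.each do |row|
    row.each do |key, value|
      bytes += key.length
      bytes += value.length if value
    end
  end
  bytes
end

bytes = result_bytes(rows)
puts sprintf("%d rows, %.1fk", rows.length, bytes.to_f / 1024)
```

Like the original, this counts the column names once per row, so wide tables with short values look heavier than they are on the wire.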

Adding SQL bytes transferred to your layout

For added fun, add this to the bottom of your application.rhtml layout file. This is a technique we used at Jobster to provide immediate stats to developers.

<% if ENV['RAILS_ENV'] == 'development'
   stats = ActiveRecord::ConnectionAdapters::MysqlAdapter::get_stats %>
  <%= sprintf("  (%.1fk, %d queries)", stats[:bytes].to_f / 1024, stats[:queries]) %>
<% end %>

Suppress blob logging

You may have noticed that Rails likes to dump your SQL blobs to the log file. This will quickly cause your log file to balloon to gargantuan proportions. If you’re especially unlucky, you’ll run out of disk space and you might be forced out of business entirely. I recommend you add this to your environment.rb file immediately:

# trim blob logging
class ActiveRecord::ConnectionAdapters::MysqlAdapter
  def format_log_entry(message, dump = nil)
    if dump
      dump = dump.gsub(/x'([^']+)'/) do |blob|
        (blob.length > 32) ? "x'#{$1[0,32]}... (#{blob.length} bytes)'" : blob
      end
    end
    super
  end
end

Again, I’ve only tested this with mysql and rails 1.1.6.
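
The substitution itself is easy to try in isolation. This sketch pulls the gsub into a standalone function (trim_blobs is a made-up name) and feeds it a fake SQL string. As in the snippet above, the reported size is the length of the quoted hex literal, not the size of the decoded blob.

```ruby
# Shorten any x'...' hex literal longer than 32 characters; leave short ones alone.
def trim_blobs(sql)
  sql.gsub(/x'([^']+)'/) do |blob|
    blob.length > 32 ? "x'#{$1[0, 32]}... (#{blob.length} bytes)'" : blob
  end
end

sql = "insert into photos (data) values (x'" + "ab" * 40 + "')"
puts trim_blobs(sql)
puts trim_blobs("insert into photos (data) values (x'abcd')")
```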

Urbanspoon goes live

Thursday, October 26th, 2006

Urbanspoon officially went live today, with restaurant reviews, maps and menus for the Seattle area. Ethan and I have been working hard on urbanspoon for some time, and it feels good to reveal our baby at last.

Urbanspoon lets you slice and dice the Seattle restaurant scene by neighborhood, cuisine and popularity. We link to restaurant reviews from five authoritative Seattle sources – The Seattle PI, The Seattle Times, The Stranger, The Seattle Weekly, and Citysearch. Users can vote on their favorite restaurants, and the most popular restaurants are prominently featured on all pages. We seeded the vote tallies using the scores from those five authoritative Seattle sources, but those numbers will quickly change as people start to vote.

Here are some other useful things you can do on Urbanspoon:

Urbanspoon was intended to be a fun testbed for our technology ideas, but it seems to have taken on a life of its own. Stay tuned!

Ruby at 60

Monday, October 16th, 2006

I’ve spent the last 60 days learning Ruby, laboring almost full time on a soon-to-be-revealed project. I’ve worked professionally with many different languages, and I figured it was about time to summarize my thoughts. I will discuss Rails in a future post.

Let’s start with some things I love about Ruby.

I Love Ruby. Really.

Goodbye Perl and Python, We Hardly Knew Ye

Ruby is going to completely destroy Perl and Python. We will have forgotten all about them in a few years.

Perl is a fun little language but I can’t see why a neutral developer would ever choose to use it over Ruby. Everything that I liked about Perl can be found in Ruby, including easy regular expressions, rich interpolation, and end of line conditionals. Ruby is a modern programming language and it shines a harsh light on Perl, exposing the glaring cracks that have opened up during its long devolution. Where to begin? Threading. Objects. Creaky syntax. At this point, the only Perl feature I miss is CPAN. Ruby gems is a joke by comparison.

Python is a more recent, robust language than Perl, but it too will quickly succumb to Ruby’s onslaught. Python preaches whereas Ruby tries to be your friend. Ruby is just more fun. Python never really achieved widespread adoption, and I think that most neutral people will choose to learn Ruby instead.

I have deep knowledge of Perl and some experience with Python. If you’re considering learning either of these scripting languages for your project, I recommend using Ruby instead.

A Sweet Tooth For Syntactic Sugar

I am a huge fan of syntactic sugar. For those who are unfamiliar with the term, “syntactic sugar” refers to syntax features specifically added to make it easier to write code. Here are some examples of syntactic sugar in Ruby:

Old school          With sugar added

if a == nil         a ||= 3
  a = 3
end

if !a.saved?        a.save if !a.saved?
  a.save
end

Syntactic sugar makes it easier to “eyeball” a block of code and results in a big win for productivity. Ruby is littered with sugary goodness, which makes it a very productive language indeed. I can state this definitively after my first sixty days.

Ruby “Blocks” Make Engineering Easier

Ruby makes it possible to easily hand a “block” of code to a function. Some simple examples:

# add one to every element in an array
a.map { |i| i + 1 }

# replace words with definitions in a string
string.gsub(/\b\w+\b/) { |i| definitions[i] }

# open a file, write "hello", close the file
File.open("tst.txt", "w") { |f| f.puts "hello" }

Most of these aren’t very compelling, and there is always an easy way to write the code without using blocks. But I keep running into patterns that really turn out well with blocks. For example, I recently had to write some code that cached database records for performance reasons. Here’s what that code looks like without blocks:

def initialize
  @cache = Hash.new
end

def get_from_cache(key)
  value = @cache[key]
  if value == nil
    value = @cache[key] = get_from_db(key)
  end
  value
end

Eventually I discovered that when you create a Hash object you can pass a block to tell the Hash how to populate uninitialized values. So the above code becomes:

def initialize
  @cache = Hash.new { |hash,key| hash[key] = get_from_db(key) }
end

def get_from_cache(key)
  @cache[key]
end

I keep using this pattern over and over and each time I appreciate blocks a little more. Blocks make it easier to write excellent software.
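
Here is the pattern as a complete class you can run. RecordCache, db_calls, and the fake get_from_db are stand-ins invented for this sketch; the counter is only there to show that the block fires once per key.

```ruby
class RecordCache
  attr_reader :db_calls

  def initialize
    @db_calls = 0
    # The block runs only when a key is missing; storing the value
    # in the hash turns every later lookup into a plain hit.
    @cache = Hash.new { |hash, key| hash[key] = get_from_db(key) }
  end

  def get_from_cache(key)
    @cache[key]
  end

  private

  # Stand-in for a real database query (hypothetical).
  def get_from_db(key)
    @db_calls += 1
    "record #{key}"
  end
end

cache = RecordCache.new
cache.get_from_cache(42)
cache.get_from_cache(42)
puts cache.db_calls
```

One detail worth knowing: the block has to assign into the hash itself. If it only returns the value, nothing is stored and every lookup goes back to the database.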

“It’s Just a Hashtable”

In a previous post (The 6,000 Line Hashtable) I talked at great length about why developers should be lazy and use hashtables at every opportunity. In Ruby, hashtables are supported syntactically just like arrays:

array = [1, 2, 3]
array[0] = 'hello'
hash = {'a' =>'b', 'c' => 'd'}
hash['a'] = 'world'

This feature isn’t unique to Ruby but it’s still worth noting.

In other languages (c/c++/java/etc.), developers sometimes unnecessarily create elaborate data structures and classes where a hashtable (or a hash of hashes) would just as easily accomplish the same task. People often think that hashtables are too slow or too expensive for their purposes. When hashtables are built into the language, it’s more difficult to harbor those misconceptions.

Stop Writing Bash Scripts. I’m Begging You.

Professional developers often resort to shell scripts for mundane tasks like builds, deployment, database cleanup, automated backups, and a million other secondary concerns. I don’t like bash scripting. I don’t use it often enough to completely master the syntax. I can barely remember how to write an “if” statement in bash, let alone a loop or a switch. Yet I’ve written hundreds of bash scripts simply because there wasn’t a better tool for the job. Those scripts were difficult to write and I doubt anyone bothered to maintain them after I abandoned them.

Since I started learning Ruby, I stopped writing bash scripts. Cold turkey. Everywhere that I might be tempted to write a bash script, I simply use Ruby instead. I wrote a 300 line deployment script for remotely setting up a machine at ServerBeach, all in Ruby. My cron jobs are Ruby. My db setup scripts are Ruby. With luck I won’t have to cobble together another bash script anytime soon.

I’m not religious about languages, I just find it easier to write short Ruby scripts instead of bash scripts.

Please, don’t reply and tell me that ruby is too heavyweight to replace bash. I agree that this is true for some specific tasks, tasks which most developers are unlikely to encounter.

Now, The Rough Spots

There are some things in Ruby that are immensely frustrating. The language has only been around for a few years and I’m not surprised there are rough spots. Here are a few areas that need improvement.

The Man Behind the Curtain is Terrifying

Ruby’s dynamic type system encourages developers to engage in all kinds of neato tricks. “Gee, I can add a method to the String class and use it everywhere!” “Gosh, instead of defining my methods up front I can override method_missing to dynamically add them on the fly!”
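
For anyone who hasn't seen these tricks, both fit in a few lines. The names here (shout, Finder, find_by_*) are invented for the sketch and aren't taken from any particular library.

```ruby
# Reopen a core class and add a method that every String now has.
class String
  def shout
    upcase + "!"
  end
end

# Answer find_by_<field> calls that were never defined (hypothetical class).
class Finder
  def method_missing(name, *args)
    if name.to_s =~ /\Afind_by_(\w+)\z/
      "looking up #{$1} = #{args.first.inspect}"
    else
      super   # anything else is still a NoMethodError
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("find_by_") || super
  end
end

puts "hello".shout
puts Finder.new.find_by_name("pizza")
```

Notice that grep for "def find_by_name" finds nothing, which is exactly the problem described below.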

If you’re using one of these amazing libraries, wonderful things happen behind the curtain. Your object is magically talking to a database without any intervention on your part. Useful member variables appear when you need them. You can create special helper classes that (somehow) are instantaneously available throughout your entire product without having to use a single “requires”.

That’s all fine and dandy during your first 30 days with Ruby. Unfortunately, this review looks back at 60 days.

What happens when something goes wrong? Good luck trying to figure out what the hell is happening behind that lovely curtain. It’s hard to trace the runtime behavior because so much code is dynamically, inscrutably generated. If you read the library source you’ll find that because Ruby supports mixin classes, seemingly simple APIs are splattered into a dozen files. One class can magically insert itself into another with the greatest of ease.

So, how do I track down problems? Grep. I grep the whole “gems” tree to figure out why things are happening. When grep fails, I use a more powerful grep. Google. When Google fails, I start adding printfs to the support libraries. If I still can’t figure out which class is responsible for the poor behavior, I start cutting features.

Obscure Operators Considered Harmful

I was delighted to discover that Ruby has and and or operators that can be used in place of the traditional && and ||. I lovingly sprinkled them throughout my code because the readability was so much better.

Unfortunately, Ruby’s highly readable boolean operators have a subtly different precedence when compared to the traditional operators. Simple code which looked like it should work failed for inexplicable reasons. I spent hours tracking down one problem after another, continually thinking that somehow my code was at fault. Instead of lovingly sprinkling readable boolean operators throughout my code, I was unknowingly sprinkling bugs. Ticking time bombs. Turds.

This made me very angry.
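
The smallest reproduction I know of is an assignment. The word operators bind more loosely than =, so the assignment happens first and the rest of the expression is thrown away. A minimal sketch:

```ruby
a = false || true     # parsed as a = (false || true)
b = (false or true)   # parentheses force the same grouping
c = false or true     # parsed as (c = false) or true

x = true && false     # parsed as x = (true && false)
y = true and false    # parsed as (y = true) and false

puts a.inspect, b.inspect, c.inspect
puts x.inspect, y.inspect
```

Here a and b end up true while c is false, and x is false while y is true.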

For added frustration, try running this block of code:

a = b = 2
a++
b *= 7
puts a, b

The above parses and runs just great, except that Ruby doesn’t support the ++ operator and your code won’t work at all! This kind of thing can easily be caught by turning on warnings, but you really don’t want to do that if you’re using gems because you’ll drown under an avalanche of warnings that you can’t fix and can’t suppress.

Standard Libraries Need a Plunger

This has been well documented elsewhere so I won’t go into overwhelming detail. The standard libraries are clogged up with cruft and are severely in need of a plunger. In my opinion the most basic classes should also be the simplest.

For example, I shudder every time I have to use the IO/File classes. As the Ruby doc dryly states, “The two classes are closely associated.” An alternative approach, and in my opinion a superior one, can be seen in Java’s wildly successful layered stream API.

Despite my complaining, I don’t think this is a fatal flaw. But combined with my next point this kind of clog can fill your entire product up with sewage.

Take RDoc Out Behind the Woodshed

RDoc is Ruby’s tool for generating documentation based on comments embedded in source. It’s conceptually similar to Javadoc, Doxygen, and many others. Here is an example of an RDoc page:

http://www.ruby-doc.org/core/

I really can’t understand how RDoc can be so bad. It just plain sucks. The pages are impossible to decipher. I feel pity for anyone trying to read an RDoc page using IE, which lacks Firefox’s find-as-you-type feature.

To add insult to injury, due to Ruby’s mixin madness it’s often impossible to even see a full list of supported methods for a class. It took me two weeks just to figure out how File.readlines() worked, even though I’ve seen it used countless times in sample code. Same thing goes for Hash.grep().

This is a terrible shame, because most of the important methods are actually well documented, complete with examples and grouping of similar methods. The Rails documentation is excellent, if you can stomach RDoc’s spaghetti long enough to locate it.

So, what do you get when you combine clogged standard libraries with an atrocious set of generated documentation? A bloody mess. Around day 45 I managed to get Ruby’s ri tool working on Ubuntu. Otherwise our product would still be languishing on the drawing board.

I Must Be an Idiot, ‘Cause I Don’t Get Modules

I saved this one for last because, well… I’m embarrassed. I am a professional software developer. I take pride in my work. Is it possible that I’m an idiot?

I’ve tried to write modules, I really have. I want to mixin functionality just like the popular libraries! I want my amazing caching class to magically get used everywhere! To date, all my pathetic attempts have resulted in abject failure.

I can’t figure out extend vs. include. I can’t seem to successfully call define_method on anything. My static members are rarely accessible when I expect them to be. extend self gives me the willies, and singletons give me nightmares. Sometimes self is an object, other times it appears to be something completely different. I managed to dynamically add a member to an object once, but I think I just got lucky.
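
For what it's worth, the textbook distinction does fit in a few lines. This sketch uses made-up names (Greeter, Included, Extended) and only covers the simplest case, not the tricks that make the popular libraries so hard to follow.

```ruby
module Greeter
  def greet
    "hello from #{self.inspect}"
  end
end

class Included
  include Greeter   # greet becomes an instance method
end

class Extended
  extend Greeter    # greet becomes a method on the class object itself
end

puts Included.new.respond_to?(:greet)
puts Included.respond_to?(:greet)
puts Extended.respond_to?(:greet)
puts Extended.new.respond_to?(:greet)
puts Extended.greet
```

In the last line self is the class Extended, which is one reason self seems to change identity depending on where you stand.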

At a certain point during my 60 days, I stopped blaming myself and started asking around. Am I the only one with this problem? Do any of my friends understand this voodoo? I turned to google for answers and learned that “extend self is cool because less typing is good”.

I’m not really interested in fancy language tricks. I just want to create useful software. Please, Ruby, stop trying to be cool and just focus on being easy to use, well documented, and lovable. We can talk about performance when you’re a little older.

valves.com

Tuesday, October 10th, 2006

A few days ago while checking my Coming Soon feed I noticed that the valves.com domain was going to be deleted soon. I logged into pool.com and placed a bid for $50. The auction ended today and it looks like I was outbid, to put it mildly:

Maybe I just don’t have the stomach for domain speculating. I may have to settle for bumblers.com.

Subversion diff viewer CGI, in Ruby

Tuesday, October 3rd, 2006

Updated 10/6/06: added @rev argument to svn diff so it works even if the file has moved.

Tools tools tools. Frankly, I’m obsessed with them. I can’t properly embark on a project unless I first set up a decent environment. At my new company, my buddy and I needed a cheap and cheerful way to keep track of our work. We quickly set up a subversion repository on our site5 host, and then I hacked together a subversion post commit hook as described in my previous post.

I added some meat to the commit emails by plugging in a simple diff viewer. The diff viewer is structured as a CGI that can run completely standalone, so you can plug it into just about any server. Here’s a tiny screenshot from Firefox:

There are a few tidbits I’d like to highlight before we get to the install instructions.

The gentle red/blue/green color scheme is copied directly from the Cascade source code control system we created at Marimba. I believe Arthur van Hoff came up with the colors, and I’ve happily recycled them into many similar projects. Red means deleted, blue means changed, green means added.

The diff viewer shows the whole file, not just those terrible contextual diffs that some people seem to like. How can anyone understand a diff without seeing the whole file? I accomplished this amazing feat simply by passing “-U 10000” to diff.

Also, the diff blocks are connected with up/down arrow links for easy navigation. Click on the arrows to quickly eyeball each of the differences.
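
The heart of the viewer is turning a unified diff into side-by-side runs. Here is a stripped-down sketch of that grouping without the HTML or the globals. It assumes the header lines have already been removed and that every remaining line starts with ' ', '-' or '+'; side_by_side is a name made up for the example.

```ruby
# Group the body of a unified diff into [kind, left_lines, right_lines] runs.
# kind is :same for context lines and :changed for a run of '-' and '+' lines.
def side_by_side(diff_lines)
  rows = []
  diff_lines.each do |line|
    op   = line[0, 1]
    text = line[1..-1].to_s
    kind = (op == ' ') ? :same : :changed
    rows << [kind, [], []] if rows.empty? || rows.last[0] != kind
    rows.last[1] << text unless op == '+'   # left column: context and deletions
    rows.last[2] << text unless op == '-'   # right column: context and additions
  end
  rows
end

side_by_side([" a", "-b", "+c", " d"]).each { |row| puts row.inspect }
```

A changed run with an empty left side is a pure addition, an empty right side is a pure deletion, and anything else is a modification, which maps onto the green, red and blue rows.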

To install:

  1. Copy the script to your web server and rename it to “diff.cgi” or something appropriate.
  2. Make it executable.
  3. Set the REPO constant at the top.
  4. If you’re using my post commit hook, change the diff links to point to the CGI. Modify this line in the filesToHtml function:
            result << CGI.escapeHTML(file)
    

    to read:

          if revision
            result << "<a href='http://YOUR_HOST/AND_PATH/diff.cgi?file=#{CGI.escape(file)}&rev=#{revision}'>#{CGI.escapeHTML(file)}</a>"
          else
            result << CGI.escapeHTML(file)
          end
    

You can download the script or copy and paste it from below:

#!/usr/bin/ruby -w

# svn-diff : CGI for viewing SVN diffs

require 'cgi'

SVN  = "/usr/bin/svn"
DIFF = "/usr/bin/diff"
REPO = "svn://MODIFY_THIS/TO/POINT/TO/YOUR/REPO"

#
# globals
#

$anchor = 0
$last_op = ' '
$left = []
$right = []

#
# helper for building the next row in the diff
#

def getDiffRow()
  anchor = ""
  result = ""
  if $left.length > 0 or $right.length > 0
    if $last_op != ' '
      if $left.length == 0
        clazz = " class='a'"
      elsif $right.length == 0
        clazz = " class='r'"
      else
        clazz = " class='m'"
      end
      anchor << <<EOF
<a name="#{$anchor}"/><a href="##{$anchor-1}">&uarr;&uarr;</a> <a href="##{$anchor+1}">&darr;&darr;</a>
EOF
      $anchor += 1
    else
      clazz = ""
    end
    result = <<EOF
<tr#{clazz}><td>#{anchor}</td><td>#{$left.join("\n")}</td><td>#{$right.join("\n")}</td></tr>
EOF
  end
  result
end

#
# build the diff
#

def getDiff(repo, file, rev1, rev2)
  result = ""

  diff = `#{SVN} diff -r #{rev1}:#{rev2} #{repo}/#{file}@#{rev1} --diff-cmd #{DIFF} -x '-w -U 10000'`.split("\n")
  raise "svn diff failed" if $? != 0

  index = diff.shift
  equals = diff.shift
  header1 = diff.shift
  if header1 =~ /^---/
    result << "<p><a href='#0' style='text-decoration:none'>&darr;&darr;</a></p>\n"
    result << "<table class='diff' width='80%'>"
    result << "<tr height='2'><td/><td width='48%'/><td width='48%'/></tr>"

    # skip header2 and range
    diff.shift
    diff.shift

    diff.each do |line|
      op = line[0,1]
      line = line[1..-1]

      if (($last_op != ' ' and op == ' ') or ($last_op == ' ' and op != ' '))
        result << getDiffRow
        $left.clear
        $right.clear
      end

      # truncate and escape
      line[62..-1] = "..." if line.length > 65
      line = CGI.escapeHTML(line)

      case op
      when ' '
        $left.push(line)
        $right.push(line)
      when '-' then $left.push(line)
      when '+' then $right.push(line)
      end
      $last_op = op
    end

    result << getDiffRow
    result << "</table>"
  else
    result = "<div class='error'>#{header1}</div>"
  end

  <<EOF
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>#{file}, rev #{rev1}:#{rev2}</title>
        <style type="text/css">
          body {
            font : 11pt verdana;
            background : white;
          }
          .error {
            color : red;
          }
          .diff {
            font-size : 9pt;
            font-family : "lucida console", "courier new", monospace;
            white-space : pre;
            border : 1px solid black;
            border-collapse : collapse;
            line-height : 110%;
          }
          .diff td {
            border : none;
            padding : 0px 10px;
            margin : 0px;
          }
          .diff td a {
            text-decoration: none;
          }
          .a { background : #bbffbb; }
          .r { background : #ffbbbb; }
          .m { background : #bbbbff; }
        </style>
    </head>
    <body>
        <h3>Subversion diff on #{file}, rev #{rev1}:#{rev2}</h3>
#{result}
    </body>
</html>
EOF
end

#
# main - handles either command line or CGI
#

cgi = CGI.new
begin
  if cgi.server_software
    file = cgi.params['file']
    rev = cgi.params['rev']
    raise "bad file param" if !file || file.length == 0
    raise "bad rev param" if !rev || rev.length == 0
    file = file[0]
    rev = rev[0]
  else
    file = ARGV.shift
    rev = ARGV.shift
  end

  raise "bad file param" if file.nil? || file.length == 0
  raise "bad rev param" if rev.nil? || rev.length == 0

  rev = rev.to_i
  cgi.out("status" => "OK") {
    getDiff(REPO, file, rev - 1, rev)
  }
rescue StandardError => e
  cgi.out("status" => "SERVER_ERROR") {
    <<EOF
<html>
  <body style="color:red">
    <p>error : #{e.message}</p>
<pre>#{e.backtrace.collect { |x| CGI.escapeHTML(x) }.join("\n")}
</pre>
  </body>
</html>
EOF
  }
end

Subversion post commit email hook, in Ruby

Sunday, September 10th, 2006

I’m a big believer in tools. Personally, I believe that better engineers tend to use better tools, but that’s a subject to explore in another post.

Here’s a handy Ruby script that sends a descriptive email after each subversion checkin. The script is based on the one written by Elliott Hughes. Somehow I seem to rewrite this script every time I take a new job, so I’m pleased to release this one into the public domain.

The script lists all files that were added, removed, or modified. Here’s a screenshot:

If you’re going to use this with a sizable engineering team, I recommend changing the HTML email to include a photo of the person who committed the change. That’ll help everyone get acquainted.
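Here's a minimal sketch of what that could look like, assuming each committer has a photo stored as "<author>.jpg" under one base URL. PHOTO_BASE_URL and author_photo_tag are made-up names for illustration; they are not part of the script below.

```ruby
require 'cgi'

# Hypothetical location of committer photos, one "<author>.jpg" per svn user.
PHOTO_BASE_URL = "http://intranet.example.com/photos"

# Build an <img> tag for the svn author name.
def author_photo_tag(author, base_url = PHOTO_BASE_URL)
  # URL-encode the author for the path, then HTML-escape the attribute values.
  src = "#{base_url}/#{CGI.escape(author)}.jpg"
  "<img src=\"#{CGI.escapeHTML(src)}\" alt=\"#{CGI.escapeHTML(author)}\" " \
    "width=\"48\" height=\"48\"/>"
end

puts author_photo_tag("adam")
```

The returned tag could be interpolated into the message body next to the "Revision ... by ..." heading.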

Also, it’s nice to have the “modified” lines link to the diff. You can view diffs with ViewCVS. If you find that ViewCVS is too heavyweight (or ugly), I wrote a diff-viewing Ruby CGI that gets the job done. I’ll post that shortly. Update: see my subsequent blog post, Subversion diff viewer CGI, in Ruby.
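One way to build those links, as a rough sketch: it assumes the diff viewer accepts "file" and "rev" query parameters (as the diff viewer CGI on this page does) and that it is reachable at DIFF_URL, which is a placeholder. The diff_link helper is hypothetical and takes the raw, unescaped path.

```ruby
require 'cgi'

# Placeholder URL for the diff viewer CGI; adjust for your server.
DIFF_URL = "http://svn.example.com/cgi-bin/svn-diff.rb"

# Wrap an unescaped repository path in a link to its diff for a revision.
def diff_link(file, revision, base_url = DIFF_URL)
  # URL-encode the query values, then HTML-escape the whole attribute.
  query = "file=#{CGI.escape(file)}&rev=#{CGI.escape(revision.to_s)}"
  href = CGI.escapeHTML("#{base_url}?#{query}")
  "<a href=\"#{href}\">#{CGI.escapeHTML(file)}</a>"
end

puts diff_link("trunk/lib/util.rb", 42)
```

Something like this could be called from filesToHtml whenever a revision is passed in, which is what the otherwise unused revision argument appears to be for.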

Please excuse my amateurish Ruby. I’m still learning.

To install:

  1. Copy the script to hooks/post-commit in your subversion repository.
  2. Make the script executable.
  3. Modify the ADDRESS constant at the top of the file. Modify other constants if necessary.
  4. (optional) Adjust my beautiful HTML to suit your needs.

You can download the script or copy and paste it from below:

#!/usr/bin/ruby -w

# svn-email.rb
#
# Send svn checkin email, based on a script by Elliott Hughes. To
# install, copy this file into your repository's hooks/ directory as
# "post-commit". Don't forget to chmod a+x post-commit
#
# Author:  Elliott Hughes, Adam Doppelt
# Version: 0.2

require 'cgi'

# constants
ADDRESS = "MODIFY_THIS@MODIFY_THIS.com"
SENDMAIL = "/usr/sbin/sendmail"
#SENDMAIL = "/usr/bin/tee" # for debugging
SVNLOOK = "/usr/bin/svnlook"

# convert a list of files to HTML
def filesToHtml(title, list, revision = nil)
  return "" if list.length == 0

  # truncate if too big
  list[200..-1] = "..." if list.length > 200

  result = ""
  result << "\n<h3>#{title}</h3>\n"
  result << "<div class=\"files\">\n"
  list.each do |file|
    result << "  "
    result << CGI.escapeHTML(file)
    result << "<br/>\n"
  end
  result << "</div>\n"
  result
end

# Subversion's commit-email.pl suggests that svnlook might create files.
Dir.chdir("/tmp")

# process ARGV
repo = ARGV.shift
revision = ARGV.shift
raise "bad args" if !repo || !revision

#
# Get the overview information.
#

info = `#{SVNLOOK} info #{repo} -r #{revision}`.split("\n")
author = info.shift
date = info.shift
size = info.shift
subject = info[0]
comment = info.join("\n")

#
# iterate changed files
#

added = []
modified = []
removed = []
props_modified = []

`#{SVNLOOK} changed #{repo} -r #{revision}`.split("\n").each do |line|
  op = line[0,1]
  props = line[1,1]
  file = line[4..-1]

  # keep the raw filename here; filesToHtml escapes it when building the HTML

  case op
  when 'A' then added.push(file)
  when 'U' then modified.push(file)
  when 'D' then removed.push(file)
  end

  props_modified.push(file) if props == 'U'
end

#
# build the message body
#

body = <<EOF
<html>
    <head>
        <style type="text/css">
            .main {
              font : 10pt verdana;
              background: white;
              width: 95%
            }
            .main h3 {
              margin: 15px 0px 5px 0px;
            }
            .comment {
              border: 1px solid #dddddd;
              padding: 5px;
            }
            .files {
              border: 1px solid #dddddd;
              padding: 5px;
              background: #eeeeff;
            }
        </style>
    </head>
    <body>
        <div class="main">
            <h3>Revision #{revision} by #{CGI.escapeHTML(author)}</h3>
            <div class="comment">
                #{CGI.escapeHTML(comment).split("\n").join("<br/>")}
            </div>
            #{filesToHtml("Added Paths", added)}
            #{filesToHtml("Modified Paths", modified, revision)}
            #{filesToHtml("Removed Paths", removed)}
            #{filesToHtml("Property Changed", props_modified)}
        </div>
    </body>
</html>
EOF

#
# Write the mail headers
#

header = ""
header << "To: #{ADDRESS}\n"
header << "From: #{ADDRESS}\n"
header << "Subject: [svn] [#{revision}] #{subject}\n"
header << "MIME-Version: 1.0\n"
header << "Content-Type: text/html; charset=UTF-8\n"
header << "Content-Transfer-Encoding: 8bit\n"
header << "\n"

#
# Send the mail.
#

begin
  # IO.popen with a block closes the pipe for us, even if a write fails
  IO.popen([SENDMAIL, ADDRESS], "w") do |fd|
    fd.print(header)
    fd.print(body)
  end
rescue StandardError
  exit 1
end