Archive for the 'Uncategorized' Category


Complex SQL Sorts with Rails/ActiveRecord

Saturday, November 18th, 2006

On Urbanspoon, we often need to efficiently sort a subset of records that were retrieved using a different sort order. For example, our Most Popular Restaurants in Seattle page first selects the top 100 restaurants, then allows the user to sort by name or price.

We need to perform a sort, then perform a secondary sort on a subset of the results. Here is an example of a two step sort in action:


[Figure: three panels showing the same list unsorted, sorted by popularity, and sorted by price but only the first 100]
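To make the two-step semantics concrete, here is a minimal plain-Ruby sketch on an in-memory array. The data is made up, "popular" is assumed to mean highest score first, and the top 3 stand in for the top 100. The real pages do both sorts in SQL, as described below.

```ruby
# Hypothetical in-memory data; the real pages run both sorts in SQL.
Restaurant = Struct.new(:id, :name, :popularity, :price)

restaurants = [
  Restaurant.new(1, "Alpha",   10, 3),
  Restaurant.new(2, "Bravo",   50, 1),
  Restaurant.new(3, "Charlie", 30, 4),
  Restaurant.new(4, "Delta",   40, 2),
  Restaurant.new(5, "Echo",    20, 1)
]

# Step 1: sort by popularity (highest first) and keep only the top N.
top = restaurants.sort_by { |r| -r.popularity }.first(3)

# Step 2: re-sort just that subset by price.
by_price = top.sort_by { |r| r.price }

puts by_price.map { |r| r.name }.join(", ")
```

Echo is as cheap as Bravo but never shows up in the result, because it did not make the popularity cut in step 1.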

I tried various implementations:

  • Perform the second sort in Ruby. This is inelegant, inefficient, and impractical for expensive sorts like distance in miles.
  • Use a subselect to get the ids in your :conditions. Unfortunately, MySQL doesn’t support LIMIT in subselects. This also breaks sql_calc_found_rows, which we use with some of our complicated sorts.
  • First select the ids, then manually construct the :conditions from the ids.

After several fumbling attempts, I eventually settled on the last approach. This technique requires an additional query, but it lets you use things like ActiveRecord’s :include, which is essential for some of our pages.

1. Perform the first sort with :limit, and grab the IDs

First, perform the sort with :limit and grab the IDs. For example:

r = Restaurant.find(:all,
                     :order => :popularity,
                     :limit => 100)
ids = r.map { |i| i.id }

Better yet, let’s just select the ids instead of populating the entire restaurant object:

r = Restaurant.find(:all,
                     :select => 'id',
                     :order => :popularity,
                     :limit => 100)
ids = r.map { |i| i.id }

If you’re a freak like me, you might want to get rid of some of that ActiveRecord overhead. Why bother creating those Restaurant objects at all? We can avoid creating Restaurant objects if we write a bit of SQL:

r = ActiveRecord::Base.connection.select_all("select id from restaurants order by popularity limit 100")
ids = r.map { |i| i['id'] }

Hm. There must be a better way. Can we use ActiveRecord to write the SQL, but avoid creating the restaurant objects? You bet, if we use send to call ActiveRecord::Base’s construct_finder_sql method. This is perfect for my purposes, because my SQL skills are pretty weak. I can use ActiveRecord to write the SQL, but avoid the unnecessary overhead of creating all those objects.

options = {
  :select => 'id',
  :order => :popularity,
  :limit => 100
}
sql = Restaurant.send(:construct_finder_sql, options)
r = ActiveRecord::Base.connection.select_all(sql)
ids = r.map { |i| i['id'] }

2. Sort the subset

Now that we have the IDs of our subset, we can sort it using ActiveRecord:

Restaurant.find(:all,
                 :conditions => "id in (#{ids.join(',')})",
                 :order => 'price')

We can also take advantage of :include to populate our objects with everything we need.
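One edge case worth guarding against: if the first query returns no rows, the interpolated condition becomes "id in ()", which is not valid SQL. Here is a minimal sketch of a helper that handles this. The name ids_condition is hypothetical and not part of ActiveRecord.

```ruby
# Build an "id in (...)" fragment from a list of ids.
# ids_condition is a hypothetical helper, not part of ActiveRecord.
def ids_condition(ids)
  # Coerce to integers so nothing unexpected is interpolated into the SQL.
  clean = ids.map { |i| Integer(i) }
  # "id in ()" is not valid SQL, so fall back to a condition that matches nothing.
  return "1 = 0" if clean.empty?
  "id in (#{clean.join(',')})"
end

puts ids_condition([3, 1, 2])
puts ids_condition([])
```

The result can be passed as the :conditions string in place of the inline interpolation.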

You’ll Like It

The example above is a bit contrived, but I needed this technique to efficiently render many of the pages on Urbanspoon. Enjoy!

Rails SQL Logging Improvements

Thursday, November 9th, 2006

The logging system built into Rails works pretty well out of the box. The development.log file contains timing information, SQL statements, and error traces. I especially like the ANSI color coding, which makes the file much easier to eyeball. Still, there is room for improvement. Here are a few changes I’ve made to my Rails logging setup while working on Urbanspoon.

In a future post maybe I’ll discuss how these stats exposed problematic pages. Several pages were transferring 20x more SQL data than needed. :include may be considered harmful.

Logging SQL bytes transferred per page

ActiveRecord is an interesting beast. It’s very easy to use, but doesn’t provide much in the way of caching. Associations can be eagerly loaded using :include, but how does this affect performance? Timing benchmarks are the ultimate arbiter, but I often want to know other statistics. How many queries (SQL roundtrips) did it take to render this page? How many SQL bytes were transferred for this page?

If you plug in the snippet below verbatim, you will start seeing a few changes in your log files. First, each SQL select will be followed by a line that reads 21 rows, 8.3k to indicate how many rows/bytes were transferred for the select. Second, the familiar ActionController timing statements will include the number of SQL bytes transferred:

Completed in … | Rendering: … | DB: 0.01312 (3%) 23.6k | …

A few things to note while reading this code:

  • SQL statistics are only turned on for the development environment.
  • It only works for MySQL, though I’m sure the same technique can be used for the other adapters.
  • Minor fudging… I’m reporting “string bytes returned by mysql select”, not actual bytes transferred on the wire. If anyone has a suggestion for a simple way to get at the latter I’m all ears.
  • As always with mixins, this isn’t guaranteed to work with all versions of Rails. I’m using 1.1.6.

Add this code to environment.rb:

# only run this code in development
if ENV['RAILS_ENV'] == 'development'

  # modify MysqlAdapter to track transfer stats
  class ActiveRecord::ConnectionAdapters::MysqlAdapter
    @@stats_queries = @@stats_bytes = @@stats_rows = 0

    def self.get_stats
      { :queries => @@stats_queries,
        :rows => @@stats_rows,
        :bytes => @@stats_bytes }
    end

    def self.reset_stats
      @@stats_queries = @@stats_bytes = @@stats_rows = 0
    end

    def select_with_stats(sql, name)
      bytes = 0
      rows = select_old(sql, name)
      rows.each do |row|
        row.each do |key, value|
          bytes += key.length
          bytes += value.length if value
        end
      end
      @@stats_queries += 1
      @@stats_rows += rows.length
      @@stats_bytes += bytes
      @logger.info sprintf("%d rows, %.1fk", rows.length, bytes.to_f / 1024)
      rows
    end

    alias :select_old :select
    alias :select :select_with_stats
  end

  # modify ActionController to reset/print stats for each request
  class ActionController::Base
    def perform_action_reset
      ActiveRecord::ConnectionAdapters::MysqlAdapter::reset_stats
      perform_action_old
    end

    alias :perform_action_old :perform_action
    alias :perform_action :perform_action_reset

    def active_record_runtime(runtime)
      stats = ActiveRecord::ConnectionAdapters::MysqlAdapter::get_stats
      "#{super} #{sprintf("%.1fk", stats[:bytes].to_f / 1024)}"
    end
  end
end
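The core trick above is the alias pair: keep the original method reachable under a second name, then point the public name at a wrapper. Here is a minimal sketch of the same pattern outside Rails. FakeAdapter and its canned rows are hypothetical stand-ins for the real adapter.

```ruby
# FakeAdapter is a hypothetical stand-in for a database adapter.
class FakeAdapter
  def select(sql)
    [{ "id" => "1", "name" => "Alpha" },
     { "id" => "2", "name" => nil }]
  end
end

# Reopen the class and wrap select, keeping the original reachable via an alias.
class FakeAdapter
  @@stats_bytes = 0

  def self.stats_bytes
    @@stats_bytes
  end

  def select_with_stats(sql)
    rows = select_without_stats(sql)
    rows.each do |row|
      row.each do |key, value|
        @@stats_bytes += key.length
        @@stats_bytes += value.length if value   # NULL columns come back as nil
      end
    end
    rows
  end

  alias :select_without_stats :select
  alias :select :select_with_stats
end

rows = FakeAdapter.new.select("select id, name from restaurants")
puts sprintf("%d rows, %.1fk", rows.length, FakeAdapter.stats_bytes.to_f / 1024)
```

Callers keep calling select and never need to know the wrapper exists, which is both the appeal and the danger of this approach.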

Adding SQL bytes transferred to your layout

For added fun, add this to the bottom of your application.rhtml layout file. This is a technique we used at Jobster to provide immediate stats to developers.

<% if ENV['RAILS_ENV'] == 'development'
   stats = ActiveRecord::ConnectionAdapters::MysqlAdapter::get_stats %>
  <%= sprintf("  (%.1fk, %d queries)", stats[:bytes].to_f / 1024, stats[:queries]) %>
<% end %>

Suppress blob logging

You may have noticed that Rails likes to dump your SQL blobs to the log file. This will quickly cause your log file to balloon to gargantuan proportions. If you’re especially unlucky, you’ll run out of disk space and you might be forced out of business entirely. I recommend you add this to your environment.rb file immediately:

# trim blob logging
class ActiveRecord::ConnectionAdapters::MysqlAdapter
  def format_log_entry(message, dump = nil)
    if dump
      dump = dump.gsub(/x'([^']+)'/) do |blob|
        (blob.length > 32) ? "x'#{$1[0,32]}... (#{blob.length} bytes)'" : blob
      end
    end
    super
  end
end

Again, I’ve only tested this with MySQL and Rails 1.1.6.
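To try the trimming logic without touching the adapter, here is a minimal standalone sketch. trim_blobs is a hypothetical helper and the SQL strings are made up.

```ruby
# trim_blobs is a hypothetical standalone version of the gsub used above.
def trim_blobs(sql, max = 32)
  sql.gsub(/x'([^']+)'/) do |blob|
    hex = $1
    if blob.length > max
      # Keep the first few hex characters and note the original length.
      "x'#{hex[0, max]}... (#{blob.length} chars)'"
    else
      # Short literals pass through untouched.
      blob
    end
  end
end

short = "insert into photos values (x'cafe')"
long  = "insert into photos values (x'#{'ab' * 100}')"

puts trim_blobs(short)
puts trim_blobs(long)
```

Note that the block has to return the matched text itself for short literals; whatever the block returns is what gsub writes into the result.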

Urbanspoon goes live

Thursday, October 26th, 2006

Urbanspoon officially went live today, with restaurant reviews, maps and menus for the Seattle area. Ethan and I have been working hard on Urbanspoon for some time, and it feels good to reveal our baby at last.

Urbanspoon lets you slice and dice the Seattle restaurant scene by neighborhood, cuisine and popularity. We link to restaurant reviews from five authoritative Seattle sources - The Seattle PI, The Seattle Times, The Stranger, The Seattle Weekly, and Citysearch. Users can vote on their favorite restaurants, and the most popular restaurants are prominently featured on all pages. We seeded the vote tallies using the scores from those five authoritative Seattle sources, but those numbers will quickly change as people start to vote.

Here are some other useful things you can do on Urbanspoon:

Urbanspoon was intended to be a fun testbed for our technology ideas, but it seems to have taken on a life of its own. Stay tuned!

Ruby at 60

Monday, October 16th, 2006

I’ve spent the last 60 days learning Ruby, laboring almost full time on a soon-to-be-revealed project. I’ve worked professionally with many different languages, and I figured it was about time to summarize my thoughts. I will discuss Rails in a future post.

Let’s start with some things I love about Ruby.

I Love Ruby. Really.

Goodbye Perl and Python, We Hardly Knew Ye

Ruby is going to completely destroy Perl and Python. We will have forgotten all about them in a few years.

Perl is a fun little language but I can’t see why a neutral developer would ever choose to use it over Ruby. Everything that I liked about Perl can be found in Ruby, including easy regular expressions, rich interpolation, and end of line conditionals. Ruby is a modern programming language and it shines a harsh light on Perl, exposing the glaring cracks that have opened up during its long devolution. Where to begin? Threading. Objects. Creaky syntax. At this point, the only Perl feature I miss is CPAN. Ruby gems is a joke by comparison.

Python is a more recent, robust language than Perl, but it too will quickly succumb to Ruby’s onslaught. Python preaches whereas Ruby tries to be your friend. Ruby is just more fun. Python never really achieved widespread adoption, and I think that most neutral people will choose to learn Ruby instead.

I have deep knowledge of Perl and some experience with Python. If you’re considering learning either of these scripting languages for your project, I recommend using Ruby instead.

A Sweet Tooth For Syntactic Sugar

I am a huge fan of syntactic sugar. For those who are unfamiliar with the term, “syntactic sugar” refers to syntax features specifically added to make it easier to write code. Here are some examples of syntactic sugar in Ruby:

Old school                With sugar added

if a == nil               a ||= 3
  a = 3
end

if !a.saved?              a.save if !a.saved?
  a.save
end

Syntactic sugar makes it easier to “eyeball” a block of code and results in a big win for productivity. Ruby is littered with sugary goodness, which makes it a very productive language indeed. I can state this definitively after my first sixty days.
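One small caveat when reading a ||= 3 as shorthand for the nil check: it assigns whenever the left side is falsy, which in Ruby means nil or false. A quick sketch:

```ruby
a = nil
a ||= 3      # a was nil, so it becomes 3

b = false
b ||= 3      # false is also falsy, so b becomes 3 as well

c = 0
c ||= 3      # 0 is truthy in Ruby, so c stays 0

puts [a, b, c].inspect
```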

Ruby “Blocks” Make Engineering Easier

Ruby makes it possible to easily hand a “block” of code to a function. Some simple examples:

# add one to every element in an array
a.map { |i| i + 1 }

# replace words with definitions in a string
string.gsub(/\b\w+\b/) { |i| definitions[i] }

# open a file, write "hello", close the file
File.open("tst.txt", "w") { |f| f.puts "hello" }

Most of these aren’t very compelling, and there is always an easy way to write the code without using blocks. But I keep running into patterns that really turn out well with blocks. For example, I recently had to write some code that cached database records for performance reasons. Here’s what that code looks like without blocks:

def initialize
  @cache = Hash.new
end

def get_from_cache(key)
  value = @cache[key]
  if value == nil
    value = @cache[key] = get_from_db(key)
  end
  value
end

Eventually I discovered that when you create a Hash object you can pass a block to tell the Hash how to populate uninitialized values. So the above code becomes:

def initialize
  @cache = Hash.new { |hash,key| hash[key] = get_from_db(key) }
end

def get_from_cache(key)
  @cache[key]
end

I keep using this pattern over and over and each time I appreciate blocks a little more. Blocks make it easier to write excellent software.
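Here is a minimal runnable sketch of the pattern, wrapped in a class so it can be tried on its own. RecordCache and the fake get_from_db are hypothetical; the call counter is only there to show that the block runs once per key.

```ruby
class RecordCache
  attr_reader :db_calls

  def initialize
    @db_calls = 0
    # The block runs only when a key is missing; storing the result
    # in the hash means the next lookup skips the block entirely.
    @cache = Hash.new { |hash, key| hash[key] = get_from_db(key) }
  end

  def get_from_cache(key)
    @cache[key]
  end

  private

  # Hypothetical stand-in for a real database query.
  def get_from_db(key)
    @db_calls += 1
    "record #{key}"
  end
end

cache = RecordCache.new
cache.get_from_cache(42)
cache.get_from_cache(42)
puts cache.db_calls
```

The assignment inside the block matters: without hash[key] = ..., the block would run on every lookup and nothing would be cached.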

“It’s Just a Hashtable”

In a previous post (The 6,000 Line Hashtable) I talked at great length about why developers should be lazy and use hashtables at every opportunity. In Ruby, hashtables are supported syntactically just like arrays:

array = [1, 2, 3]
array[0] = 'hello'
hash = {'a' =>'b', 'c' => 'd'}
hash['a'] = 'world'

This feature isn’t unique to Ruby but it’s still worth noting.

In other languages (C, C++, Java, etc.), developers sometimes unnecessarily create elaborate data structures and classes where a hashtable (or a hash of hashes) would just as easily accomplish the same task. People often think that hashtables are too slow or too expensive for their purposes. When hashtables are built into the language, it’s more difficult to harbor those misconceptions.

Stop Writing Bash Scripts. I’m Begging You.

Professional developers often resort to shell scripts for mundane tasks like builds, deployment, database cleanup, automated backups, and a million other secondary concerns. I don’t like bash scripting. I don’t use it often enough to completely master the syntax. I can barely remember how to write an “if” statement in bash, let alone a loop or a switch. Yet I’ve written hundreds of bash scripts simply because there wasn’t a better tool for the job. Those scripts were difficult to write and I doubt anyone bothered to maintain them after I abandoned them.

Since I started learning Ruby, I stopped writing bash scripts. Cold turkey. Everywhere that I might be tempted to write a bash script, I simply use Ruby instead. I wrote a 300 line deployment script for remotely setting up a machine at ServerBeach, all in Ruby. My cron jobs are Ruby. My db setup scripts are Ruby. With luck I won’t have to cobble together another bash script anytime soon.

I’m not religious about languages, I just find it easier to write short Ruby scripts instead of bash scripts.
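As an illustration of the kind of chore that usually ends up as a bash one-off, here is a minimal sketch that deletes old log files. The helper name clean_old_files, the *.log pattern and the seven day cutoff are all assumptions made for the example, and the demo runs in a throwaway temp directory.

```ruby
require 'fileutils'
require 'tmpdir'

# Delete files matching a pattern that are older than max_age_days.
# clean_old_files is a hypothetical helper; the paths are made up for the demo.
def clean_old_files(dir, pattern, max_age_days, now = Time.now)
  cutoff = now - max_age_days * 24 * 60 * 60
  removed = []
  Dir.glob(File.join(dir, pattern)).sort.each do |path|
    next unless File.file?(path)
    if File.mtime(path) < cutoff
      FileUtils.rm(path)
      removed << File.basename(path)
    end
  end
  removed
end

# Demo in a throwaway directory so nothing real is touched.
Dir.mktmpdir do |dir|
  old_log = File.join(dir, "old.log")
  new_log = File.join(dir, "new.log")
  FileUtils.touch(old_log, :mtime => Time.now - 10 * 24 * 60 * 60)
  FileUtils.touch(new_log)
  puts clean_old_files(dir, "*.log", 7).inspect
end
```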

Please, don’t reply and tell me that Ruby is too heavyweight to replace bash. I agree that this is true for some specific tasks, tasks which most developers are unlikely to encounter.

Now, The Rough Spots

There are some things in Ruby that are immensely frustrating. The language has only been around for a few years and I’m not surprised there are rough spots. Here are a few areas that need improvement.

The Man Behind the Curtain is Terrifying

Ruby’s dynamic type system encourages developers to engage in all kinds of neato tricks. “Gee, I can add a method to the String class and use it everywhere!” “Gosh, instead of defining my methods up front I can override method_missing to dynamically add them on the fly!”

If you’re using one of these amazing libraries, wonderful things happen behind the curtain. Your object is magically talking to a database without any intervention on your part. Useful member variables appear when you need them. You can create special helper classes that (somehow) are instantaneously available throughout your entire product without having to use a single “require”.

That’s all fine and dandy during your first 30 days with Ruby. Unfortunately, this review looks back at 60 days.

What happens when something goes wrong? Good luck trying to figure out what the hell is happening behind that lovely curtain. It’s hard to trace the runtime behavior because so much code is dynamically, inscrutably generated. If you read the library source you’ll find that because Ruby supports mixin classes, seemingly simple APIs are splattered into a dozen files. One class can magically insert itself into another with the greatest of ease.

So, how do I track down problems? Grep. I grep the whole “gems” tree to figure out why things are happening. When grep fails, I use a more powerful grep. Google. When Google fails, I start adding printfs to the support libraries. If I still can’t figure out which class is responsible for the poor behavior, I start cutting features.
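As a complement to grep, Ruby 1.9 and later can report where a method came from at runtime through Method#owner and Method#source_location. This is a minimal sketch with a made-up mixin; it does not help with methods conjured up by method_missing, which never exist as real methods.

```ruby
# A mixin that quietly adds a method, as many libraries do.
module Shouting
  def shout
    upcase + "!"
  end
end

class String
  include Shouting
end

m = "hello".method(:shout)
puts m.owner                      # the module that defines the method
file, line = m.source_location    # where that definition lives
puts "#{File.basename(file)}:#{line}"
```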

Obscure Operators Considered Harmful

I was delighted to discover that Ruby has and and or operators that can be used in place of the traditional && and ||. I lovingly sprinkled them throughout my code because the readability was so much better.

Unfortunately, Ruby’s highly readable boolean operators have a subtly different precedence when compared to the traditional operators. Simple code which looked like it should work failed for inexplicable reasons. I spent hours tracking down one problem after another, continually thinking that somehow my code was at fault. Instead of lovingly sprinkling readable boolean operators throughout my code, I was unknowingly sprinkling bugs. Ticking time bombs. Turds.

This made me very angry.
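A minimal sketch of the trap: or binds more loosely than assignment, while || binds more tightly, so two lines that read the same do different things.

```ruby
ready = false
fallback = true

# || binds tighter than =, so the whole expression is assigned.
with_pipes = ready || fallback

# or binds looser than =, so this parses as (with_or = ready) or fallback.
with_or = ready or fallback

puts with_pipes.inspect
puts with_or.inspect
```

The first variable ends up true and the second false. Wrapping the right-hand side in parentheses makes the or version behave like the || one.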

For added frustration, try running this block of code:

a = b = 2
a++
b *= 7
puts a, b

The above parses and runs just great, except that Ruby doesn’t support the ++ operator and your code won’t work at all! This kind of thing can easily be caught by turning on warnings, but you really don’t want to do that if you’re using gems because you’ll drown under an avalanche of warnings that you can’t fix and can’t suppress.

Standard Libraries Need a Plunger

This has been well documented elsewhere so I won’t go into overwhelming detail. The standard libraries are clogged up with cruft and are severely in need of a plunger. In my opinion the most basic classes should also be the simplest.

For example, I shudder every time I have to use the IO/File classes. As the Ruby doc dryly states, “The two classes are closely associated.” An alternative approach, and in my opinion a superior one, can be seen in Java’s wildly successful layered stream API.

Despite my complaining, I don’t think this is a fatal flaw. But combined with my next point this kind of clog can fill your entire product up with sewage.

Take RDoc Out Behind the Woodshed

RDoc is Ruby’s tool for generating documentation based on comments embedded in source. It’s conceptually similar to Javadoc, Doxygen, and many others. Here is an example of an RDoc page:

http://www.ruby-doc.org/core/

I really can’t understand how RDoc can be so bad. It just plain sucks. The pages are impossible to decipher. I feel pity for anyone trying to read an RDoc page using IE, which lacks Firefox’s find-as-you-type feature.

To add insult to injury, due to Ruby’s mixin madness it’s often impossible to even see a full list of supported methods for a class. It took me two weeks just to figure out how File.readlines() worked, even though I’ve seen it used countless times in sample code. Same thing goes for Hash.grep().

This is a terrible shame, because most of the important methods are actually well documented, complete with examples and grouping of similar methods. The Rails documentation is excellent, if you can stomach RDoc’s spaghetti long enough to locate it.

So, what do you get when you combine clogged standard libraries with an atrocious set of generated documentation? A bloody mess. Around day 45 I managed to get Ruby’s ri tool working on Ubuntu. Otherwise our product would still be languishing on the drawing board.

I Must Be an Idiot, ‘Cause I Don’t Get Modules

I saved this one for last because, well… I’m embarrassed. I am a professional software developer. I take pride in my work. Is it possible that I’m an idiot?

I’ve tried to write modules, I really have. I want to mixin functionality just like the popular libraries! I want my amazing caching class to magically get used everywhere! To date, all my pathetic attempts have resulted in abject failure.

I can’t figure out extend vs. include. I can’t seem to successfully call define_method on anything. My static members are rarely accessible when I expect them to be. extend self gives me the willies, and singletons give me nightmares. Sometimes self is an object, other times it appears to be something completely different. I managed to dynamically add a member to an object once, but I think I just got lucky.
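For reference, here is a minimal sketch of the distinction, using made-up names: include makes a module’s methods available on instances, extend makes them available on the one object it is called on (which may be a class), and define_method builds methods from a block inside a class body.

```ruby
module Hello
  def hello
    "hello"
  end
end

class Included
  include Hello     # hello becomes an instance method
end

class Extended
  extend Hello      # hello becomes a method on the class itself
end

puts Included.new.hello
puts Extended.hello
puts Included.respond_to?(:hello)       # false: the class itself has no hello
puts Extended.new.respond_to?(:hello)   # false: instances have no hello

# extend also works on a single object, leaving its class untouched.
obj = Object.new
obj.extend(Hello)
puts obj.hello

# define_method is called from inside the class body.
class Light
  def initialize(color)
    @color = color
  end

  [:red, :green].each do |c|
    define_method("#{c}?") { @color == c }
  end
end

puts Light.new(:red).red?
```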

At a certain point during my 60 days, I stopped blaming myself and started asking around. Am I the only one with this problem? Do any of my friends understand this voodoo? I turned to Google for answers and learned that “extend self is cool because less typing is good”.

I’m not really interested in fancy language tricks. I just want to create useful software. Please, Ruby, stop trying to be cool and just focus on being easy to use, well documented, and lovable. We can talk about performance when you’re a little older.

valves.com

Tuesday, October 10th, 2006

A few days ago while checking my Coming Soon feed I noticed that the valves.com domain was going to be deleted soon. I logged into pool.com and placed a bid for $50. The auction ended today and it looks like I was outbid, to put it mildly:

Maybe I just don’t have the stomach for domain speculating. I may have to settle for bumblers.com.

Subversion diff viewer CGI, in Ruby

Tuesday, October 3rd, 2006

Updated 10/6/06: added @rev argument to svn diff so it works even if the file has moved.

Tools tools tools. Frankly, I’m obsessed with them. I can’t properly embark on a project unless I first set up a decent environment. At my new company, my buddy and I needed a cheap and cheerful way to keep track of our work. We quickly set up a subversion repository on our site5 host, and then I hacked together a subversion post commit hook as described in my previous post.

I added some meat to the commit emails by plugging in a simple diff viewer. The diff viewer is structured as a CGI that can run completely standalone, so you can plug it into just about any server. Here’s a tiny screenshot from Firefox:

There are a few tidbits I’d like to highlight before we get to the install instructions.

The gentle red/blue/green color scheme is copied directly from the Cascade source code control system we created at Marimba. I believe Arthur van Hoff came up with the colors, and I’ve happily recycled them into many similar projects. Red means deleted, blue means changed, green means added.

The diff viewer shows the whole file, not just those terrible contextual diffs that some people seem to like. How can anyone understand a diff without seeing the whole file? I accomplished this amazing feat simply by passing “-U 10000” to diff.

Also, the diff blocks are connected with up/down arrow links for easy navigation. Click on the arrows to quickly eyeball each of the differences.

To install:

  1. Copy the script to your web server and rename it to “diff.cgi” or something appropriate.
  2. Make it executable.
  3. Set the REPO constant at the top.
  4. If you’re using my post commit hook, change the diff links to point to the CGI. Modify this line in the filesToHtml function:
            result << CGI.escapeHTML(file)
    

    to read:

          if revision
            result << "<a href='http://YOUR_HOST/AND_PATH/diff.cgi?file=#{CGI.escape(file)}&rev=#{revision}'>#{CGI.escapeHTML(file)}</a>"
          else
            result << CGI.escapeHTML(file)
          end
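The link uses two different kinds of escaping, and it is easy to mix them up: CGI.escape is for values going into a query string, CGI.escapeHTML is for text going into HTML. Here is a minimal sketch; diff_link and the example.com URL are hypothetical, and unlike the snippet above it also HTML-escapes the finished href.

```ruby
require 'cgi'

# diff_link is a hypothetical helper; base_url is a placeholder.
def diff_link(base_url, file, revision)
  # CGI.escape for the value going into the query string...
  href = "#{base_url}?file=#{CGI.escape(file)}&rev=#{revision.to_i}"
  # ...and CGI.escapeHTML for anything that ends up in HTML, including the attribute.
  "<a href='#{CGI.escapeHTML(href)}'>#{CGI.escapeHTML(file)}</a>"
end

puts diff_link("http://example.com/diff.cgi", "trunk/a & b.rb", 42)
```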
    

You can download the script or copy and paste it from below:

#!/usr/bin/ruby -w

# svn-diff : CGI for viewing SVN diffs

require 'cgi'

SVN  = "/usr/bin/svn"
DIFF = "/usr/bin/diff"
REPO = "svn://MODIFY_THIS/TO/POINT/TO/YOUR/REPO"

#
# globals
#

$anchor = 0
$last_op = ' '
$left = []
$right = []

#
# helper for building the next row in the diff
#

def getDiffRow()
  anchor = ""
  result = ""
  if $left.length > 0 or $right.length > 0
    if $last_op != ' '
      if $left.length == 0
        clazz = " class='a'"
      elsif $right.length == 0
        clazz = " class='r'"
      else
        clazz = " class='m'"
      end
      anchor << <<EOF
<a name="#{$anchor}"/><a href="##{$anchor-1}">&uarr;&uarr;</a> <a href="##{$anchor+1}">&darr;&darr;</a>
EOF
      $anchor += 1
    else
      clazz = ""
    end
    result = <<EOF
<tr#{clazz}><td>#{anchor}</td><td>#{$left.join("\n")}</td><td>#{$right.join("\n")}</td></tr>
EOF
  end
  result
end

#
# build the diff
#

def getDiff(repo, file, rev1, rev2)
  result = ""

  # NOTE: file is interpolated into a shell command, so only expose this CGI to trusted users
  diff = `#{SVN} diff -r #{rev1}:#{rev2} #{repo}/#{file}@#{rev1} --diff-cmd #{DIFF} -x '-w -U 10000'`.split("\n")
  raise "svn diff failed" if $? != 0

  index = diff.shift
  equals = diff.shift
  header1 = diff.shift
  if header1 =~ /^---/
    result << "<p><a href='#0' style='text-decoration:none'>&darr;&darr;</a></p>\n"
    result << "<table class='diff' width='80%'>"
    result << "<tr height='2'><td/><td width='48%'/><td width='48%'/></tr>"

    # skip header2 and range
    diff.shift
    diff.shift

    diff.each do |line|
      op = line[0,1]
      line = line[1..-1]

      if (($last_op != ' ' and op == ' ') or ($last_op == ' ' and op != ' '))
        result << getDiffRow
        $left.clear
        $right.clear
      end

      # truncate and escape
      line[62..-1] = "..." if line.length > 65
      line = CGI.escapeHTML(line)

      case op
      when ' '
        $left.push(line)
        $right.push(line)
      when '-' then $left.push(line)
      when '+' then $right.push(line)
      end
      $last_op = op
    end

    result << getDiffRow
    result << "</table>"
  else
    result = "<div class='error'>#{header1}</div>"
  end

  <<EOF
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>#{file}, rev #{rev1}:#{rev2}</title>
        <style type="text/css">
          body {
            font : 11pt verdana;
            background : white;
          }
          .error {
            color : red;
          }
          .diff {
            font-size : 9pt;
            font-family : "lucida console", "courier new", monospace;
            white-space : pre;
            border : 1px solid black;
            border-collapse : collapse;
            line-height : 110%;
          }
          .diff td {
            border : none;
            padding : 0px 10px;
            margin : 0px;
          }
          .diff td a {
            text-decoration: none;
          }
          .a { background : #bbffbb; }
          .r { background : #ffbbbb; }
          .m { background : #bbbbff; }
        </style>
    </head>
    <body>
        <h3>Subversion diff on #{file}, rev #{rev1}:#{rev2}</h3>
#{result}
    </body>
</html>
EOF
end

#
# main - handles either command line or CGI
#

cgi = CGI.new
begin
  if cgi.server_software
    file = cgi.params['file']
    rev = cgi.params['rev']
    raise "bad file param" if !file || file.length == 0
    raise "bad rev param" if !rev || rev.length == 0
    file = file[0]
    rev = rev[0]
  else
    file = ARGV.shift
    rev = ARGV.shift
  end

  raise "bad file param" if file.length == 0
  raise "bad rev param" if rev.length == 0

  rev = rev.to_i
  cgi.out("status" => "OK") {
    getDiff(REPO, file, rev - 1, rev)
  }
rescue StandardError => e
  cgi.out("status" => "SERVER_ERROR") {
    <<EOF
<html>
  <body style="color:red">
    <p>error : #{e.message}</p>
<pre>#{e.backtrace.collect { |x| CGI.escapeHTML(x) }.join("\n")}
</pre>
  </body>
</html>
EOF
  }
end
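The heart of the script is the loop that groups diff lines into table rows whenever the op switches between "unchanged" and "changed". Here is a minimal sketch of that grouping as a pure function, without the globals or the HTML. side_by_side and the sample input are hypothetical.

```ruby
# Group unified-diff body lines into [kind, left, right] rows.
# side_by_side is a hypothetical helper; kind is :same, :added, :removed or :changed.
def side_by_side(lines)
  rows = []
  left, right, changed = [], [], nil

  # Emit the current group (if any) and start a new one.
  flush = lambda do
    next if left.empty? && right.empty?
    kind = if !changed then :same
           elsif left.empty? then :added
           elsif right.empty? then :removed
           else :changed
           end
    rows << [kind, left, right]
    left, right = [], []
  end

  lines.each do |line|
    next if line.empty?
    op, text = line[0, 1], line[1..-1]
    is_change = (op != " ")
    # A switch between context and change closes the current group.
    flush.call if !changed.nil? && changed != is_change
    changed = is_change
    case op
    when " "
      left << text
      right << text
    when "-" then left << text
    when "+" then right << text
    end
  end
  flush.call
  rows
end

side_by_side([" a", "-b", "+B", " c", "+d"]).each { |row| puts row.inspect }
```

The kinds map onto the color scheme described earlier: removed is red, changed is blue, added is green.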

Subversion post commit email hook, in Ruby

Sunday, September 10th, 2006

I’m a big believer in tools. Personally, I believe that better engineers tend to use better tools, but that’s a subject to explore in another post.

Here’s a handy Ruby script that sends a descriptive email after each subversion checkin. The script is based on the one written by Elliott Hughes. Somehow I seem to rewrite this script every time I take a new job, so I’m pleased to release this one into the public domain.

The script lists all files that were added, removed, or modified. Here’s a screenshot:

If you’re going to use this with a sizable engineering team, I recommend changing the HTML email to include a photo of the person who committed the change. That’ll help everyone get acquainted.

Also, it’s nice to have the “modified” lines link to the diff. You can view diffs with ViewCVS. If you find that ViewCVS is too heavyweight (or ugly), I wrote a diff viewing Ruby CGI that gets the job done. I’ll post that shortly. Update: see my subsequent blog post, Subversion diff viewer CGI, in Ruby.

Please excuse my amateurish Ruby. I’m still learning.

To install:

  1. Copy the script to hooks/post-commit in your subversion repository.
  2. Make the script executable.
  3. Modify the ADDRESS constant at the top of the file. Modify other constants if necessary.
  4. (optional) Adjust my beautiful HTML to suit your needs.

You can download the script or copy and paste it from below:

#!/usr/bin/ruby -w

# svn-email.rb
#
# Send svn checkin email, based on a script by Elliott Hughes. To
# install, copy this file into your repository’s hooks/ directory as
# "post-commit". Don’t forget to chmod a+x post-commit
#
# Author:  Elliott Hughes, Adam Doppelt
# Version: 0.2

require 'cgi'

# constants
ADDRESS = "MODIFY_THIS@MODIFY_THIS.com"
SENDMAIL = "/usr/sbin/sendmail"
#SENDMAIL = "/usr/bin/tee" # for debugging
SVNLOOK = "/usr/bin/svnlook"

# convert a list of files to HTML
def filesToHtml(title, list, revision = nil)
  return "" if list.length == 0

  # truncate if too big
  list[200..-1] = "..." if list.length > 200

  result = ""
  result << "\n<h3>#{title}</h3>\n"
  result << "<div class=\"files\">\n"
  list.each do |file|
    result << "  "
    result << CGI.escapeHTML(file)
    result << "<br/>\n"
  end
  result << "</div>\n"
  result
end

# Subversion’s commit-email.pl suggests that svnlook might create files.
Dir.chdir("/tmp")

# process ARGV
repo = ARGV.shift
revision = ARGV.shift
raise "bad args" if !repo || !revision

#
# Get the overview information.
#

info = `#{SVNLOOK} info #{repo} -r #{revision}`.split("\n")
author = info.shift
date = info.shift
size = info.shift
subject = info[0]
comment = info.join("\n")

#
# iterate changed files
#

added = []
modified = []
removed = []
props_modified = []

`#{SVNLOOK} changed #{repo} -r #{revision}`.split("\n").each do |line|
  op = line[0,1]
  props = line[1,1]
  file = line[4..-1]

  # filesToHtml escapes each filename, so don't escape it here as well

  case op
  when 'A' then added.push(file)
  when 'U' then modified.push(file)
  when 'D' then removed.push(file)
  end

  props_modified.push(file) if props == 'U'
end

#
# build the message body
#

body = <<EOF
<html>
    <head>
        <style type="text/css">
            .main {
              font : 10pt verdana;
              background: white;
              width: 95%
            }
            .main h3 {
              margin: 15px 0px 5px 0px;
            }
            .comment {
              border: 1px solid #dddddd;
              padding: 5px;
            }
            .files {
              border: 1px solid #dddddd;
              padding: 5px;
              background: #eeeeff;
            }
        </style>
    </head>
    <body>
        <div class="main">
            <h3>Revision #{revision} by #{CGI.escapeHTML(author)}</h3>
            <div class="comment">
                #{CGI.escapeHTML(comment).split("\n").join("<br/>")}
            </div>
            #{filesToHtml("Added Paths", added)}
            #{filesToHtml("Modified Paths", modified, revision)}
            #{filesToHtml("Removed Paths", removed)}
            #{filesToHtml("Property Changed", props_modified)}
        </div>
    </body>
</html>
EOF

#
# Write the mail headers
#

header = ""
header << "To: #{ADDRESS}\n"
header << "From: #{ADDRESS}\n"
header << "Subject: [svn] [#{revision}] #{subject}\n"
header << "MIME-Version: 1.0\n"
header << "Content-Type: text/html; charset=UTF-8\n"
header << "Content-Transfer-Encoding: 8bit\n"
header << "\n"

#
# Send the mail.
#

begin
    fd = open("|#{SENDMAIL} #{ADDRESS}", "w")
    fd.print(header)
    fd.print(body)
rescue
    exit 1
end
fd.close
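The parsing of svnlook changed output can be tried in isolation. Here is a minimal sketch that mirrors the loop in the script as a pure function. parse_changed and the sample lines are hypothetical; the first column is the content change, the second is the property change, and the path starts in the fifth column.

```ruby
# Sort `svnlook changed` lines into buckets, mirroring the loop above.
# parse_changed is a hypothetical helper; the sample lines are made up.
def parse_changed(lines)
  changes = { :added => [], :modified => [], :removed => [], :props_modified => [] }
  lines.each do |line|
    op    = line[0, 1]    # content change: A, U, D, or _ when only properties changed
    props = line[1, 1]    # U when properties changed
    file  = line[4..-1]   # the path starts in the fifth column
    case op
    when "A" then changes[:added] << file
    when "U" then changes[:modified] << file
    when "D" then changes[:removed] << file
    end
    changes[:props_modified] << file if props == "U"
  end
  changes
end

sample = [
  "A   trunk/new.rb",
  "U   trunk/lib/app.rb",
  "_U  trunk/README",
  "D   trunk/old.rb"
]
puts parse_changed(sample).inspect
```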

emacs dotfiles 2006-09-03

Sunday, September 3rd, 2006

I’ve uploaded the latest release of my emacs dotfiles. Download them here:

Adam’s Emacs Dotfiles

From the changelog:

2006-09-03
- ruby 1.74.2.14 from Ruby 1.8.4
- ruby electric Mar 2005 from Dee Zsombor
- many ruby fixes and enhancements (Rakefile, yaml, indent=2, etc.)
- cc-mode 5.31.3
- csharp-mode 0.5.0 from Dylan Moonfire (thanks Brigham)

Ten Obscure Yet Handy Firefox Extensions

Tuesday, August 29th, 2006

To build on my previous post concerning Windows apps, I’d like to suggest some Firefox extensions for your enjoyment. I think that my small readership will appreciate this list, especially those who complained that my previous post simply did not apply to their Mac-centric lives.

My current Firefox install contains 19 extensions. Instead of listing the most popular (adblock, web developer, etc.) I will instead give you the more obscure (yet handy) extensions that I’ve encountered:

  • Colorful Tabs - Colors the tabs in Firefox. Looks great. The author’s homepage features a bizarre circa-1997 FRAME layout, which makes navigation exciting. If you can find the installer you’re in for a treat.
  • Download Statusbar - Rid yourself of that annoying download window. Instead, download progress will quietly show up in the status bar. I especially like the gradient progress bars.
  • Duplicate Tab - Right click on a tab and select “Duplicate Tab”. Why isn’t this built into Firefox?
  • FindBar Switcher - Hit CTRL-F to make the find bar appear and disappear. Another one for the “soon to be included in Firefox” list.
  • Fission - Turns your bland white address bar background into a sleek blue progress bar. No need to switch now!
  • Google Pagerank Status - Displays the Google PageRank in the status bar. You SEO fiends can finally uninstall that clunky Google Toolbar extension.
  • Live HTTP Headers - Displays the HTTP headers sent and received by Firefox while you browse. I’ve devoted multiple weeks of my life to writing various debug proxies. Happily, all my hard work is obsoleted by this simple extension.
  • NextPlease! - Many sites slice a single article across multiple pages, either to artificially inflate ad views or reduce server load. NextPlease! lets you navigate to the “next page” using a keyboard shortcut. Works with nytimes, gamespot, google, tom’s hardware, ebay, google… The list goes on and on. Great!
  • Super DragAndGo - Click and drag to “throw” a link toward the top of the screen and open it in a new tab. I use this dozens of times each day. Indispensable.
  • TableTools - Windows really spoiled us by making most tables easily sortable. The fact that many HTML tables still can’t be sorted is a sad testament to the complexities of CSS, HTML and SQL. The TableTools extension offers lots of strange and wonderful features which I don’t use, but I adore its ability to sort tables.

Ten Windows Apps You Need

Tuesday, August 22nd, 2006

I have a soft spot for well written Windows apps, especially when they’re small and free. Let’s raise a cheer for the legions of hackers struggling to make the world’s most popular OS usable. Do yourself a favor and install the following:

  • AutoRuns - list (and potentially turn off) applications that run at startup. Take that, QuickTime and RealPlayer!
  • AutoStitch - stitch multiple photos together into a panorama. You might want this if you visit places like Mt. Rainier, Sydney, etc.
  • Console - an excellent command line “console” for Windows in which you can run cmd.exe (or tcsh, or bash, or whatever). Supports tabs, transparency, wild color schemes, and any monospaced font your heart desires.
  • DVD Decrypter - turn DVDs into ISOs and vice versa. I believe development ceased because this is considered illegal according to the DMCA. Luckily, the current release is almost perfect.
  • Easy Thumbnails - batch image processing. You can easily resize a bunch of photos for inclusion on a web page. For some reason I have to do this quite often.
  • Ffdshow - play all video codecs on Windows. Also, get Media Player Classic and expunge Windows Media Player forever.
  • Mp3tag - clean up the tags in your mp3 files, then reimport into iTunes. You’ll thank me later.
  • Notepad++ and Notepad2 - two great options for quick edits. They’re tiny, start instantly, and highlight lots of different file formats.
  • Picasa - for organizing and fixing up photos. Better yet, try the recently released Picasa Beta, which finally includes a Save Changes button for when you’re done retouching. Incidentally, if you’re interested in graphics hacks you should google Michael Herf, who was Picasa’s CTO before the company was bought by Google. His stereopsis page is fascinating.
  • Taskbar Shuffle - drag and drop the buttons in your taskbar. Where have you been all these years?

I’ll be discussing Bash on Windows (and MSYS) in a future post.