Archive for July, 2007

ActiveRecord Table Transform (or, how to write to the db 27,000 times)

Tuesday, July 31st, 2007

At Urbanspoon, we use pretty urls for our pages to make them more palatable to users and search engines. Here’s an example:

http://www.urbanspoon.com/r/1/55069/Seattle/Fremont/Baguette-Box.html

These beautiful urls are slightly expensive to generate, since we have to “prettify” text by stripping whitespace and replacing accent characters. A few weeks back, I finally bit the bullet and started caching our pretty urls in the db instead of in memory. I lazily populate the url column for each restaurant, so we’re gradually filling in the data as users hit the server.

Then I dug into the code that generates our sitemap. For the uninitiated, a sitemap is an XML file describing every page on the server. Naturally, in order to generate this file we have to write out the pretty urls for each restaurant.

Of our ~100,000 restaurants, approximately ~27,000 hadn’t yet cached their pretty urls in the db. I naively used my lazy pretty url generator, which ended up sending 27,000 individual writes to the db. It took approximately 9 MINUTES to complete, with the CPU pegged the entire time.

It would be much better to do something like the following:

  1. Create a temp table with (id, url).
  2. Bulk insert to populate the temp table.
  3. Update the restaurants table from the temp table.

I implemented my new scheme and running time went from 9 minutes to 24 SECONDS. I liked this approach so much I decided to generalize it as ActiveRecord::Base.transform. Sample usage:

# if users don’t have names, give them a random one
NAMES = [‘Adam’, ‘Ethan’, ‘Patrick’]
User.transform(:name, :conditions => ‘name is null’).each do |i|
  i.name = NAMES[rand * NAMES.length]
end

This will use a bulk transform to update all users at once instead of each user individually.

Note that this has only been tested with Mysql, and is unlikely to work out of the box with other databases. Check it out:

# helper for quickly transforming an entire table using a temp table,
# a bulk insert, and an update
class ActiveRecord::Base
  def self.transform(cols, options = {})
    temp_name = options[:temp_name] || "temp_transform_table"
    temp_options = options[:temp_options] || "character set utf8 collate utf8_general_ci"

    # munge cols into real column objects
    cols = [cols] if !cols.is_a?(Array)
    cols = cols.map { |i| i.to_s }

    cols.delete("id")
    cols.unshift("id")

    cols = cols.map { |i| columns_hash[i] || raise("column #{i} not found") }

    # load/transform
    rows = find(:all, options)
    return if rows.empty?
    rows.each { |i| yield(i) }

    # create the temp table
    cols_create = cols.map { |i| "#{i.name} #{i.sql_type}" }
    connection.execute("CREATE TEMPORARY TABLE #{temp_name} (#{cols_create.join(’,')}) #{temp_options}")

    # bulk insert
    data = rows.map do |r|
      values = cols.map { |c| connection.quote(r[c.name], c) }
      "(#{values.join(’,')})"
    end
    connection.execute("INSERT INTO #{temp_name} values #{data.join(’,')}")

    # save
    cols_equal = cols.map { |i| "#{table_name}.#{i.name} = #{temp_name}.#{i.name}" }
    connection.execute("UPDATE #{table_name}, #{temp_name} SET #{cols_equal[1..-1].join(’, ‘)} WHERE #{cols_equal.first}")

    connection.execute("DROP TEMPORARY TABLE #{temp_name}")
  end
end