ActiveRecord Table Transform (or, how to write to the db 27,000 times)
Tuesday, July 31st, 2007At Urbanspoon, we use pretty urls for our pages to make them more palatable to users and search engines. Here’s an example:
http://www.urbanspoon.com/r/1/55069/Seattle/Fremont/Baguette-Box.html
These beautiful urls are slightly expensive to generate, since we have to “prettify” text by stripping whitespace and replacing accent characters. A few weeks back, I finally bit the bullet and started caching our pretty urls in the db instead of in memory. I lazily populate the url column for each restaurant, so we’re gradually filling in the data as users hit the server.
Then I dug into the code that generates our sitemap. For the uninitiated, a sitemap is an XML file describing every page on the server. Naturally, in order to generate this file we have to write out the pretty urls for each restaurant.
Of our ~100,000 restaurants, approximately ~27,000 hadn’t yet cached their pretty urls in the db. I naively used my lazy pretty url generator, which ended up sending 27,000 individual writes to the db. It took approximately 9 MINUTES to complete, with the CPU pegged the entire time.
It would be much better to do something like the following:
- Create a temp table with (id, url).
- Bulk insert to populate the temp table.
- Update the restaurants table from the temp table.
I implemented my new scheme and running time went from 9 minutes to 24 SECONDS. I liked this approach so much I decided to generalize it as ActiveRecord::Base.transform. Sample usage:
# if users don’t have names, give them a random one NAMES = [‘Adam’, ‘Ethan’, ‘Patrick’] User.transform(:name, :conditions => ‘name is null’).each do |i| i.name = NAMES[rand * NAMES.length] end
This will use a bulk transform to update all users at once instead of each user individually.
Note that this has only been tested with Mysql, and is unlikely to work out of the box with other databases. Check it out:
# helper for quickly transforming an entire table using a temp table, # a bulk insert, and an update class ActiveRecord::Base def self.transform(cols, options = {}) temp_name = options[:temp_name] || "temp_transform_table" temp_options = options[:temp_options] || "character set utf8 collate utf8_general_ci" # munge cols into real column objects cols = [cols] if !cols.is_a?(Array) cols = cols.map { |i| i.to_s } cols.delete("id") cols.unshift("id") cols = cols.map { |i| columns_hash[i] || raise("column #{i} not found") } # load/transform rows = find(:all, options) return if rows.empty? rows.each { |i| yield(i) } # create the temp table cols_create = cols.map { |i| "#{i.name} #{i.sql_type}" } connection.execute("CREATE TEMPORARY TABLE #{temp_name} (#{cols_create.join(’,')}) #{temp_options}") # bulk insert data = rows.map do |r| values = cols.map { |c| connection.quote(r[c.name], c) } "(#{values.join(’,')})" end connection.execute("INSERT INTO #{temp_name} values #{data.join(’,')}") # save cols_equal = cols.map { |i| "#{table_name}.#{i.name} = #{temp_name}.#{i.name}" } connection.execute("UPDATE #{table_name}, #{temp_name} SET #{cols_equal[1..-1].join(’, ‘)} WHERE #{cols_equal.first}") connection.execute("DROP TEMPORARY TABLE #{temp_name}") end end