Turn Off Rails Sessions for Robots
Urbanspoon already attracts a sizable amount of traffic, and we expect our numbers to grow rapidly now that we’ve launched Chicago and New York. The site is also crawled regularly by a large number of robots indexing our pages.
Some of our pages squirrel information away inside the Rails session. For example, we keep track of recently visited restaurants so that we can guide users back to those restaurants when they return. This is handy if, say, you always order pizza from the same one or two restaurants.
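A "recently visited" list like this can be kept as a short, newest-first array of IDs in the session. The sketch below shows one way to do it under stated assumptions. A plain Hash stands in for the Rails session. The helper name `remember_restaurant`, the `:recent_restaurants` key and the limit of five are hypothetical and are not taken from Urbanspoon's code.

```ruby
# Hypothetical limit on how many recent restaurants to remember.
MAX_RECENT = 5

# Hypothetical helper: record a visit in a session-like Hash, keeping the
# newest restaurant first and dropping the oldest beyond the limit.
def remember_restaurant(session, restaurant_id, limit = MAX_RECENT)
  recent = session[:recent_restaurants] || []
  recent.delete(restaurant_id)  # a repeat visit moves to the front
  recent.unshift(restaurant_id) # newest first
  session[:recent_restaurants] = recent.first(limit)
end

session = {} # stand-in for the Rails session
[12, 7, 12, 3].each { |id| remember_restaurant(session, id) }
p session[:recent_restaurants] # => [3, 12, 7]
```

Every call to a helper like this writes to the session. That write is exactly what becomes wasteful when the visitor is a crawler.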
Imagine if Googlebot crawled each of our 35,000 restaurants every day. Each time the bot hit a restaurant page, we would try to record a “restaurant visit” in its session. Since robots generally don’t send cookies back, every request would start a fresh session, leaving us with 35,000 useless sessions each day. Wouldn’t it be nice to suppress these sessions entirely?
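The arithmetic behind that estimate can be illustrated with a toy session store. `ToySessionStore` is hypothetical and is not how Rails stores sessions. It models only one rule: a request without a recognized session cookie gets a brand-new session. A browser that returns its cookie reuses one session, while a crawler that ignores cookies creates one per hit.

```ruby
require "securerandom"

# Hypothetical toy store: NOT Rails' implementation, just the one rule that
# matters here. No valid cookie means a new session is created.
class ToySessionStore
  attr_reader :sessions

  def initialize
    @sessions = {}
  end

  # Returns the session id the client should send back as its cookie.
  def handle(cookie_session_id)
    return cookie_session_id if cookie_session_id && @sessions.key?(cookie_session_id)

    new_id = SecureRandom.hex(16)
    @sessions[new_id] = { recent_restaurants: [] }
    new_id
  end
end

store = ToySessionStore.new

# A browser keeps the cookie and sends it back: 100 requests, one session.
browser_cookie = nil
100.times { browser_cookie = store.handle(browser_cookie) }

# A crawler never sends the cookie: every one of 35,000 hits is a new session.
35_000.times { store.handle(nil) }

puts store.sessions.size # => 35001
```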
I wrote a helper function called is_megatron? to detect whether a request’s User-Agent header indicates that it comes from a robot. The regular expression catches most of the bot traffic that hits our site:
class Util
  # Returns the match position (truthy) if the User-Agent names a known crawler, nil otherwise.
  def self.is_megatron?(user_agent)
    user_agent =~ /\b(Baidu|Gigabot|Googlebot|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i
  end
end
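To see which User-Agent strings the pattern flags, here is a standalone copy of the classifier run against a few sample strings. The sample strings are illustrative and are not taken from our logs. Two tweaks are assumptions rather than the original code: `to_s` treats a missing header (nil) as an empty string, and `!!` turns the match position into true or false.

```ruby
# Standalone copy of the classifier with two assumed tweaks:
# `to_s` maps a nil User-Agent to "", and `!!` converts the match
# position (or nil) into a boolean.
class Util
  BOT_PATTERN = /\b(Baidu|Gigabot|Googlebot|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i

  def self.is_megatron?(user_agent)
    !!(user_agent.to_s =~ BOT_PATTERN)
  end
end

# Illustrative User-Agent strings.
samples = [
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
  "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
  "libwww-perl/5.805",
  "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9",
  nil
]

samples.each do |ua|
  label = Util.is_megatron?(ua) ? "robot  " : "browser"
  puts "#{label}  #{ua.inspect}"
end
```

The `/i` flag makes the match case-insensitive. The `\b` anchors require each name to appear as a whole word, so a longer token that merely contains a bot name is not flagged.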
If we determine that a request appears to be from a robot, we simply disable session support for the current request:
class ApplicationController < ActionController::Base
  # turn off sessions if this is a request from a robot
  session :off, :if => proc { |request| Util.is_megatron?(request.user_agent) }
  …
end
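The `:if` condition is just a proc that receives the request and returns something truthy for robots. The sketch below mimics that decision outside Rails. It assumes a simplification: sessions stay on unless the condition is truthy. `FakeRequest` and `session_enabled?` are hypothetical stand-ins and are not Rails APIs.

```ruby
# Hypothetical stand-in exposing the one attribute the proc reads.
FakeRequest = Struct.new(:user_agent)

BOT_UA_PATTERN = /\b(Baidu|Gigabot|Googlebot|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i

# Same shape as the condition passed to `session :off, :if => ...`.
sessions_off_if = proc { |request| request.user_agent =~ BOT_UA_PATTERN }

# Hypothetical helper: sessions are enabled unless the condition is truthy.
def session_enabled?(condition, request)
  !condition.call(request)
end

bot     = FakeRequest.new("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
browser = FakeRequest.new("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9")

puts session_enabled?(sessions_off_if, bot)     # => false
puts session_enabled?(sessions_off_if, browser) # => true
```

A match at position 0 is still truthy in Ruby, so the raw `=~` result works as a condition without converting it to a boolean.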
Monday, November 26th 2007 at 3:17 am
I have implemented the is_megatron? method directly on the ApplicationController. This way, I can simply call session :off, :if => :is_megatron?, which seems cleaner.
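In the commenter's variant, is_megatron? becomes an instance method that reads the current request, and the condition is named with a symbol. The sketch below shows one way a condition given as either a Symbol or a Proc could be resolved. `FakeController`, `FakeRequest` and `condition_met?` are hypothetical and do not reproduce Rails' own dispatch code.

```ruby
# Hypothetical stand-ins; this illustrates the idea, not Rails internals.
FakeRequest = Struct.new(:user_agent)

class FakeController
  BOT_PATTERN = /\b(Baidu|Gigabot|Googlebot|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i

  attr_reader :request

  def initialize(request)
    @request = request
  end

  # Commenter's style: the predicate lives on the controller itself.
  def is_megatron?
    !!(request.user_agent.to_s =~ BOT_PATTERN)
  end

  # Resolve either condition style used in the post.
  def condition_met?(condition)
    case condition
    when Symbol then send(condition)         # e.g. :is_megatron?
    when Proc   then condition.call(request) # e.g. proc { |request| ... }
    else raise ArgumentError, "unsupported condition: #{condition.inspect}"
    end
  end
end

bot = FakeController.new(FakeRequest.new("msnbot/1.0 (+http://search.msn.com/msnbot.htm)"))
puts bot.condition_met?(:is_megatron?) # => true
```

The symbol form keeps the robot check next to the controller code that uses it. The proc form keeps it in a shared utility that other code can call without a controller.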