Adding multiple database support to Cucumber

The Vqmetrics application needs to connect to two different databases. The first holds the videos, authors and their relevant statistics, while the second database holds the users, monitors and trackers.

We do this by specifying two databases in config/database.yml.

development:
  database: vqmetrics_devel
  <<: *login_dev_local

vqdata_development: &VQDATA_TEST
  database: vqdata_devel
  <<: *login_dev_local

So by default the vqmetrics_devel database will be used. When a model needs to connect to the vqdata_devel database instead, we use

class Video < ActiveRecord::Base
  establish_connection "vqdata_#{RAILS_ENV}"
end

and for migrations that need to connect to this database we do the following.

class InitialSetup < ActiveRecord::Migration
  def self.connection
    Video.connection
  end
end

This setup works really well. Recently, however, I moved this application to Cucumber for testing. The tests worked fine the first time they were run but not the second time.

I discovered that the transactions on the second database were not being rolled back as they should be. Cucumber only sets up the first database for rollback by using

ActiveRecord::Base.connection

where it should be rolling them all back by looping through

ActiveRecord::Base.connection_handler.connection_pools.values.map {|pool| pool.connection}

I’ve filed a bug at lighthouseapp.
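The idea behind that fix can be sketched in plain Ruby. FakeConnection is a stand-in written for this example, not an ActiveRecord class, and the database names are made up; only the begin_db_transaction and rollback_db_transaction method names mirror the connection adapter API. In a real application the list of connections would come from the connection_pools expression shown above.

```ruby
# Stand-in for a database connection (hypothetical, for illustration only).
class FakeConnection
  attr_reader :name, :log

  def initialize(name)
    @name = name
    @log = []
  end

  def begin_db_transaction
    @log << :begin
  end

  def rollback_db_transaction
    @log << :rollback
  end
end

# Open a transaction on every connection, run the block, then roll all of
# them back, even if the block raises.
def with_rollback_on_all(connections)
  connections.each { |conn| conn.begin_db_transaction }
  yield
ensure
  connections.each { |conn| conn.rollback_db_transaction }
end

# Assumed names for the two test databases.
CONNECTIONS = [
  FakeConnection.new("vqmetrics_test"),
  FakeConnection.new("vqdata_test")
]

with_rollback_on_all(CONNECTIONS) do
  # A scenario would run here.
end

CONNECTIONS.each { |conn| puts "#{conn.name}: #{conn.log.inspect}" }
```

The point is simply that both connections see a begin and a rollback, rather than only the default one.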

Squid and Rails caching

At Vquence our Rails setup looks something like this.

------------     ---------     ------------ 
| Internet |---->| Squid |---->| Mongrels | 
------------     ---------     ------------ 

(Who needs Inkscape when you have ASCII art)

This infrastructure is hosted in the US, and up until recently squid hadn’t been doing much of anything except sitting there.

A few months ago, when we signed a contract with an Australian customer, we decided we needed to place a squid cache in Australia which would actually cache content. There were two reasons: firstly, the US is a long way away and the 300ms latency is really noticeable; secondly, some of our pages involving graphs have long statistical calculations which can take minutes to render. (OK, it’s really because no one has had a chance to optimise them yet, but let’s pretend that’s not the case.) So we changed the above setup for the Australian customers to look like the following.

------------     ------------     ------------     ------------
| Internet |---->| Squid AU |---->| Squid US |---->| Mongrels |
------------     ------------     ------------     ------------

We hand out urls like http://www.client.b2b.vquence.com/widget to Australian customers and the rails backend is smart enough to make sure all the URLs look similar (I’ll blog about how I did that another time).

Without much time to look into things properly, I did some really nasty things on the AU squid cache to make sure it cached the pages.

refresh_pattern /client/graph  1440    0%    1440    ignore-no-cache ignore-reload
refresh_pattern /client/static 1440    0%    1440    ignore-no-cache ignore-reload
refresh_pattern /client/video  1440    0%    1440    ignore-no-cache ignore-reload

This is evil and breaks a whole heap of RFCs, but it did the trick and got us out of a bind quickly.

A few weeks ago I moved the production site to Rails 2.0, and I noticed around this time that the caching had stopped working. The client was no longer using our services as their campaign had finished, so it wasn’t an urgent concern.

It seems that Rails 2.0 goes one step further to ensure that caches don’t cache content and instead of just sending

Cache-Control: no-cache

it now sends

Cache-Control: private, max-age=0, must-revalidate

I tried adding ignore-private, since if you’re breaking some aspects of the RFC you may as well break a couple more, but squid still refused to cache the content. After struggling with this for a bit I decided that the universe was trying to tell me I should actually do things properly.

So with squid set back to its defaults I went exploring how to accomplish this. Google wasn’t all that helpful at first, since most Rails caching articles talk about caching to static files; most sites don’t implement reverse proxying for caching. It turns out, however, that it’s fairly simple. In the appropriate actions in your controllers, simply do the following.

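class VideoController < ApplicationController
    # The action name was lost from the original listing; "vquence" is a
    # guess based on the template name.
    def vquence
        expires_in 8.hours, :private => false
        render :template => "videos/vquence"
    end

end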

This will send the following header and cache the page for 8 hours.

Cache-Control: max-age=28800
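The number in that header is just eight hours expressed in seconds. The helper below is a hypothetical plain-Ruby sketch of how such a header value could be assembled; it is not the Rails implementation, and the cache_control_value name and its is_private option are made up for this example.

```ruby
# Build a Cache-Control value (hypothetical helper, not the Rails source).
def cache_control_value(seconds, is_private: true)
  directives = []
  directives << "private" if is_private
  directives << "max-age=#{seconds}"
  directives.join(", ")
end

EIGHT_HOURS = 8 * 60 * 60 # 28800 seconds

# A shared cache such as squid may only store the response when the
# "private" directive is absent.
puts cache_control_value(EIGHT_HOURS, is_private: false)
puts cache_control_value(EIGHT_HOURS)
```

The first call prints the value squid is allowed to cache; the second shows what a reverse proxy would refuse to store.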

Now everything is much faster!!

Rails, ActiveRecord, MySQL, GUIDs and the rename_column bug

Since I wasted over 4 hours of my life today working my way through this problem I feel the need to share.

Since it seems to be the in thing in the Web 2.0 space, and just to be cool, we use GUIDs to identify different objects in our URLs at Vquence. For example, my randomly created vquence on Rails has a GUID of

cDuIhGWb8r3lDxaby-aaea

Andy Singleton has written a rails plugin called, funnily enough, guid. This allows you to do the following in your model.

class Vquence < ActiveRecord::Base
  usesguid :column => 'guid'
end

Once you do this you will automatically get GUID looking identifiers in the db and your application. The guid column in the DB gets mapped to Vquence.id so you can do things like

Vquence.find('cDuIhGWb8r3lDxaby-aaea');

We used to use Sphinx as our search index; we now use Lucene. Sphinx requires that you have an integer id for each document in your index. This is to make your SQL queries much faster. The dumb way to create your index is to use queries like the following.

SELECT * FROM videos LIMIT 0,10000
SELECT * FROM videos LIMIT 10000,10000
...
SELECT * FROM videos LIMIT 990000,10000

I know this as it’s what we originally used with Lucene. This works fine until you reach about 1,000,000 rows. The problem is that, since there is no implicit ordering or range in the above queries, for the final query MySQL needs to work out what the first 1,000,000 rows are and then return you the last 10,000.

A much better way to do it is the following

SELECT * FROM videos WHERE integer_id >= 1 and integer_id <= 10000
SELECT * FROM videos WHERE integer_id >= 10001 and integer_id <= 20000
...
SELECT * FROM videos WHERE integer_id >= 990001 and integer_id <= 1000000

This is fast as long as integer_id is indexed.
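A small generator for these range queries might look like the following sketch. batch_queries and its parameters are names invented for this example, and it assumes the ids are positive integers starting at 1.

```ruby
# Build one range-bounded SELECT per batch of ids (hypothetical helper).
def batch_queries(table, max_id, batch_size)
  queries = []
  1.step(max_id, batch_size) do |first|
    # Clamp the final batch so it never runs past max_id.
    last = [first + batch_size - 1, max_id].min
    queries << "SELECT * FROM #{table} WHERE integer_id >= #{first} AND integer_id <= #{last}"
  end
  queries
end

batch_queries("videos", 25_000, 10_000).each { |sql| puts sql }
```

Each query touches only its own slice of the index, so the cost of a batch does not grow with its position in the table.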

So to accommodate this in Rails we began using migrations like the following.

class Videos < ActiveRecord::Migration
  def self.up
    create_table :videos do |t|
      t.column :uuid, :string, :limit =>22, :null => false
      ...

      t.timestamps
    end
    add_index :videos, :uuid, :unique => true
    rename_column :videos, :id, :integer_id
  end

  def self.down
    drop_table :videos
  end
end

This was all done months ago and the repercussions didn’t rear their ugly head until today. Previously everything in the videos table had been created by our external crawler and Rails never needed to insert into the table. Today I wrote some code that inserted into the videos table and everything broke horribly.

The problem is that ActiveRecord can still see the integer_id field and tries to insert a 0 value into it. It isn’t clever enough to realise that it is an auto increment field and to leave it alone. After some help from bitsweat on #RoR I implemented a dirty hack to hide the integer_id column from ActiveRecord. Thanks to Ruby, overriding the ActiveRecord internals is really easy, and I added the following to our guid plugin.

  # HACK (JF) - This is too evil to even blog about
  # When we use guid as a primary key we usually rename the original 'id'
  # field to 'integer_id'. We need to hide this from rails so it doesn't
  # mess with it. WARNING: This means once you use usesguid anywhere you can
  # never access a column in any table anywhere called 'integer_id'

class ActiveRecord::Base
  private
    alias :original_attributes_with_quotes :attributes_with_quotes

    def attributes_with_quotes(include_primary_key = true, include_readonly_attributes = true)
      # Pass the caller's arguments through rather than forcing both to true.
      quoted = original_attributes_with_quotes(include_primary_key, include_readonly_attributes)
      quoted.delete('integer_id')
      quoted
    end
end
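The alias-and-wrap pattern used in that hack is easier to see on a small stand-in class. Record below is invented for this example and has nothing to do with ActiveRecord's internals; the point is only that the wrapper forwards the caller's argument to the aliased original and then drops the hidden key.

```ruby
# Stand-in class (hypothetical); it only mimics the shape of the real method.
class Record
  def attributes_with_quotes(include_primary_key = true)
    attrs = { "integer_id" => "0", "guid" => "'abc'" }
    attrs["id"] = "1" if include_primary_key
    attrs
  end
end

# Reopen the class, keep the original under a new name, and wrap it.
class Record
  alias_method :original_attributes_with_quotes, :attributes_with_quotes

  def attributes_with_quotes(include_primary_key = true)
    # Forward the caller's argument unchanged, then hide the column.
    quoted = original_attributes_with_quotes(include_primary_key)
    quoted.delete("integer_id")
    quoted
  end
end

p Record.new.attributes_with_quotes(false)
p Record.new.attributes_with_quotes
```

Writing the call as original_attributes_with_quotes(include_primary_key = true) would instead assign true to the local variable and always pass true, whatever the caller asked for.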

So this worked like a charm and after 4 hours I thought my pain was over, but then I tried to add a second row to my test database. This resulted in the following.

 Mysql::Error: Duplicate entry '0' for key 1: INSERT INTO `videos` (`updated_at`, `sort_order`, `guid`, `description`,
 `user_id`, `created_at`) VALUES('2008-01-11 16:45:05', NULL, 'bcOMPqWaGr3k5CabxfFyeK', '', 5, '2008-01-11 16:44:28');

I ran the same SQL with MySQL client and got the same error. I then looked at the table and saw the following

mysql> show columns from moo;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| integer_id | int(11)     | NO   | PRI | 0       |       |
| guid       | varchar(22) | NO   | UNI |         |       |
+------------+-------------+------+-----+---------+-------+

What I expected to see was

mysql> show columns from moo;
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| integer_id | int(11)     | NO   | PRI | NULL    | auto_increment |
| guid       | varchar(22) | NO   | UNI |         |                |
+------------+-------------+------+-----+---------+----------------+

The difference is that when the column was renamed it seems to have lost its auto increment and NOT NULL properties. Some investigation showed that the SQL being used to rename the column was

ALTER TABLE `videos` CHANGE `id` `integer_id` int(11)

when it should be

ALTER TABLE `videos` CHANGE `id` `integer_id` int(11) NOT NULL AUTO_INCREMENT
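MySQL's CHANGE clause redefines the whole column, so any attribute that is not restated is dropped. The helper below is a hypothetical sketch of building the statement with the attributes restated; change_column_sql and its options are invented for this example and are not part of the Rails API.

```ruby
# Build a MySQL CHANGE statement that restates the column attributes
# (hypothetical helper, not part of Rails).
def change_column_sql(table, old_name, new_name, type, null: true, auto_increment: false)
  sql = "ALTER TABLE `#{table}` CHANGE `#{old_name}` `#{new_name}` #{type}"
  sql += " NOT NULL" unless null
  sql += " AUTO_INCREMENT" if auto_increment
  sql
end

puts change_column_sql("videos", "id", "integer_id", "int(11)",
                       null: false, auto_increment: true)
```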

It seems that this is already filed as a bug on the rails site, including a patch.

Funnily enough that bug is owned by bitsweat. It seems he’s managed to help me out twice in one day :) It doesn’t seem that it made it into Rails 2.0 though, so until then be careful about renaming columns using migrations.

Mongrel, rails and the theory of relativity

Summary (E = mc²)

If you run rails under mongrel and want to deploy an app under /other_url, use

    ActionController::AbstractRequest.relative_url_root = "/other_url"

in config/environments/production.rb instead of

    ENV['RAILS_RELATIVE_URL_ROOT'] = "/other_url"

Proof (From first principles)

At Vquence we have a pretty standard rails setup

  • Apache with mod_proxy
  • pen
  • mongrel

Silvia recently wrote an application to allow us to edit the news articles posted to our corporate website. I wanted to do something I thought would be pretty simple, have the application appear at /news on our admin web server.

Step one was the obvious change to mod_proxy

    ProxyPass /news http://localhost:8000
    ProxyPassReverse /news http://localhost:8000

Of course the problem is that the rails app still thinks it is living on / so it returns URLs like /stylesheets/moo.css instead of /news/stylesheets/moo.css.

A bit of googling found a few email threads with a common solution. In your environment.rb set

    ENV['RAILS_RELATIVE_URL_ROOT'] = "/other_url"

This is where things fell apart fairly quickly. I could not get this to work no matter what I tried. After a few hours of following an HTTP request through the whole Mongrel and rails stack I discovered the following.

Setting RAILS_RELATIVE_URL_ROOT will work fine if you are running rails using CGI, for the simple reason, which should have been obvious to me sooner, that CGIs use environment variables to access their parameters. This can be seen in the ruby CGI class

/usr/lib/ruby/1.8/cgi.rb:


class CGI

def env_table
    ENV
end

However mongrel overloads env_table and does the following instead

/usr/lib/ruby/1.8/mongrel/cgi.rb:


class CGIWrapper < ::CGI

    # Used to wrap the normal env_table variable used inside CGI.
    def env_table
        @request.params
    end

This makes sense, since the rails code is now running inside the web server and environment variables aren’t necessary. Upon investigation I found that the URL morphing magic is performed within rails as follows.

/usr/share/rails/actionpack/lib/action_controller/request.rb:


  class AbstractRequest
    cattr_accessor :relative_url_root
    
    # Returns the path minus the web server relative installation directory.
    # This can be set with the environment variable RAILS_RELATIVE_URL_ROOT.
    # It can be automatically extracted for Apache setups. If the server is not
    # Apache, this method returns an empty string.
    def relative_url_root
      @@relative_url_root ||= case
        when @env["RAILS_RELATIVE_URL_ROOT"]
          @env["RAILS_RELATIVE_URL_ROOT"]
        when server_software == 'apache'
          @env["SCRIPT_NAME"].to_s.sub(/\/dispatch\.(fcgi|rb|cgi)$/, '')
        else
          ''
      end
    end

What this all means is that you can solve the whole problem by placing the following in your config/environments/production.rb

    ActionController::AbstractRequest.relative_url_root = "/other_url"
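Why the class-level setter works can be shown with a stripped-down model of the lookup. TinyRequest is a stand-in written for this example, not the Rails class; it copies only the ||= logic from the excerpt above, and the env hash stands in for Mongrel's request parameters, which do not contain anything set through ENV.

```ruby
# Stand-in for the request class (hypothetical; mirrors only the lookup).
class TinyRequest
  @@relative_url_root = nil

  def self.relative_url_root=(value)
    @@relative_url_root = value
  end

  def initialize(env)
    @env = env
  end

  def relative_url_root
    # A value set on the class wins; otherwise fall back to the request env.
    @@relative_url_root ||= (@env["RAILS_RELATIVE_URL_ROOT"] || "")
  end
end

# Under Mongrel the env comes from the request, so the variable is absent.
mongrel_params = { "REQUEST_URI" => "/news/articles" }

puts TinyRequest.new(mongrel_params).relative_url_root.inspect

# Setting the class-level value bypasses the env lookup entirely.
TinyRequest.relative_url_root = "/other_url"
puts TinyRequest.new(mongrel_params).relative_url_root
```

The first lookup finds nothing in the request parameters and falls back to an empty string; once the value is set on the class, every request sees it regardless of what the env contains.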

Now if only Einstein had put his theories to good use and invented a time machine, then maybe I could get the last 4 hours of my life back :)

Update: Make sure /other_url isn’t the same name as one of your controllers or bad things happen.