John’s Tidbits

Moo - Development, Trouble-shooting and Random thoughts…

Amazon EC2 ruby gem and large user_data

When you create an instance in EC2 you can send Amazon some user data that is accessible by your instance. At Vquence we use this to send a script that gets executes at boot up. This script contains some openvpn and puppet RSA keys so its approaching about 10k in size.

This works without any problems when using the java based command line tools. However I was getting the following error when using the EC2 Ruby GEM.

/usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread': Connection reset by peer (Errno::ECONNRESET)
	from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
	from /usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
	from /usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
	from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
	from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
	from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
	from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line'
	from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new'
	 ... 6 levels...
	from ./lib/ec2helpers.rb:43:in `start_instance'
	from ./ec2-puppet:107
	from ./ec2-puppet:89:in `each_pair'
	from ./ec2-puppet:89

Doing some tcpdumping indicated that after receiving the request Amazon waits for a while and then sends a TCP RESET. Not very nice at all. My next step was to use ngrep to compare the output from the command line tools and the ruby gem. This got nowhere fast since the command line tools use the SOAP API while the ruby gem uses the Query API.

What I did notice however is that while the command line tools performed a POST the ruby library performed a GET. At this stage I decided to test how much data I could send. So I started trying different user data sizes. The offending amount was around 7.8k, suspiciously close to exactly 8k.

The HTTP/1.1 spec doesn’t place an actual limit on the length but leaves it up to the server.

The HTTP protocol does not place any a priori limit on the length of
a URI. Servers MUST be able to handle the URI of any resource they
serve, and SHOULD be able to handle URIs of unbounded length if they
provide GET-based forms that could generate such URIs. A server
SHOULD return 414 (Request-URI Too Long) status if a URI is longer
than the server can handle (see section 10.4.15).


Note: Servers ought to be cautious about depending on URI lengths
above 255 bytes, because some older client or proxy
implementations might not properly support these lengths.

Apache for example limits this by default to 8190 bytes including the method and the protocol. You can change this using the LimitRequestLine directive.

I created a patch to modify the EC2 Gem to use a POST instead of a GET which has no such limitations. You can find the git tree for it at http://inodes.org/~johnf/git/amazon-ec2

EC2UI extension for Firefox 3

I’ve been doing some work with Amazon EC2 the last few days. An invaluable tool is the EC2UI firefox extension that Amazon have written. This provides you with a simple GUI inside the firefox chrome which makes it really easy to manipulate your EC2 instances.

A few weeks ago Hardy moved to using firefox 3. This meant, amongst other things, that the amazon plugin stopped working. The firefox guys have a webpage up that explains how to update extensions for Firefox 3.

The main problem was with changes to the password manager. You can find my changes on my bzr branch and a packaged up version of the extension EC2UI for Firefox 3.0b4.

Update: See comments below for new versions

Squid and Rails caching

At Vquence our Rails setup looks something like this.

------------     ---------     ------------
| Internet |---->| Squid |---->| Mongrels |
------------     ---------     ------------

(Who needs Inkscape when you have ASCII art)

This infrastructure is hosted in the US and up until recently squid hadn’t been doing much of anything except really sitting there.

Now a few months ago when we signed a contract with an Australian customer we decided we needed to place a squid cache in Australia which would actually cache content. For two reasons, firstly the US is a long way away and the 300ms latency is really noticeable and secondly because some of our pages involving graphs have long statistical calculations which can take minutes to render. (OK its really because no one has had a chance to optimise them yet but lets pretend that’s not the case). So we changed the above setup for the Australian customers to look like the following.

------------     ------------     ------------     ------------
| Internet |---->| Squid AU |---->| Squid US |---->| Mongrels |
------------     ------------     ------------     ------------

We hand out urls like http://www.client.b2b.vquence.com/widget to Australian customers and the rails backend is smart enough to make sure all the URLs look similar (I’ll blog about how I did that another time).

Without much time to look into thing properly I did some really nasty things on the AU squid cache to make sure it cached the pages.

refresh_pattern /client/graph  1440    0%    1440    ignore-no-cache ignore-reload
refresh_pattern /client/static 1440    0%    1440    ignore-no-cache ignore-reload
refresh_pattern /client/video  1440    0%    1440    ignore-no-cache ignore-reload

This is evil, breaks a whole heap of RFCs but it did the trick and got us out of a bind quickly.

A few weeks ago I moved the production site to Rails 2.0, I noticed around this time that the caching had stopped working. The client was no longer using our services as their campaign had finished so it wasn’t an urgent concern.

It seems that Rails 2.0 goes one step further to ensure that caches don’t cache content and instead of just sending

Cache-Control: no-cache

it now sends

Cache-Control: private, max-age=0, must-revalidate

I tried adding ignore-private, since if you’re breaking some aspects of the RFC you may as well break a couple more, but squid still refused to cache the content. After struggling with this for a bit I decided that the universe was trying to tell me I should actually do things properly.

So with squid set back to its defaults I went exploring how to accomplish this. Google wasn’t all that helpful at first since most Rails caching articles talk about caching to static files as most sites don’t implement reverse proxying for caching. It turns out however its fairly simple. In the appropriate actions in your controllers simply do the following.

class VideoController < ApplicationController

    def vquence
        # Lots of code here

        expires_in 8.hours, :private => false
        render :template => “videos/vquence”
    end

end

This will send the following header and cache the page for 8 hours.

Cache-Control: max-age=28800

Now everything is much faster!!

Google Analytics new tracking code

As I was setting up a new site in Google Analytics today I discovered that there is now new tracking code you are supposed to use. The old code is now unsupported. Your javascript snippet should now look something like.



This has been a community service announcement. :)

Rails, ActiveRecord, MySQL, GUIDs and the rename_column bug

Since I wasted over 4 hours of my life today working my way through this problem I feel the need to share.

Since it seems to be the in thing in the Web 2.0 space, just to be cool, we use GUIDs to identify different objects in our URLs at Vquence. For example my randomly created vquence on on Rails has a GUID of

cDuIhGWb8r3lDxaby-aaea

Andy Singleton has written a rails plugin called funnily enough guid. This allows you to do the following in your model.

class Vquence < ActiveRecord::Base
  usesguid :column => ‘guid’
end

Once you do this you will automatically get GUID looking identifiers in the db and your application. The guid column in the DB gets mapped to Vquence.id so you can do things like

Vquence.find('cDuIhGWb8r3lDxaby-aaea');

We used to use Sphinx as our search index, we now use Lucene. Sphinx requires that you have an integer id for each document in your index. This is to make your SQL queries much faster. The dumb way to create your index is to use queries like the following.

SELECT * FROM videos LIMIT 0,10000
SELECT * FROM videos LIMIT 10000,10000
...
SELECT * FROM videos LIMIT 990000,10000

I know this as its what we originally used with Lucene. This works fine until you reach about 1,000,000 rows. The problem is that since there is no implicit ordering or range in the above query it means that for the final query MySQL needs to workout what the first 1,000,000 rows are and then return you the last 10,000.

A much better way to do it is the following

SELECT * FROM videos WHERE integer_id >= 1 and integer_id < = 10000
SELECT * FROM videos WHERE integer_id >= 10001 and integer_id < = 20000
...
SELECT * FROM videos WHERE integer_id >= 990000 and integer_id < = 1000000

This is fast as long as integer_id is indexed.

So to accommodate this in Rails we began using migrations like the following.

class Videos < ActiveRecord::Migration
  def self.up
    create_table :videos do |t|
      t.column :uuid, :string, :limit =>22, :null => false
      …

      t.timestamps
    end
    add_index :videos, :uuid, :unique => true
    rename_column :videos, :id, :integer_id
  end

  def self.down
    drop_table :videos
  end
end

This was all done months ago and the repercussions didn’t rear their ugly head until today. Previously everything in the videos table had been created by our external crawler and Rails never needed to insert into the table. Today I wrote some code that inserted into the videos table and everything broke horribly.

The problem is that ActiveRecord can still see the integer_id field and tries to insert a 0 value into it. It isn’t clever enough to realise that it is an auto increment field and to leave it alone. After some help from bitsweat on #RoR I implemented a dirty hack to hide the integer_id column from ActiveRecord. Thanks to Ruby overriding the ActiveRecord internals is really easy and I added the following to our guid plugin.

  # HACK (JF) - This is too evil to even blog about
  # When we use guid as a primary key we usually rename the original 'id'
  # field to 'integer_id'. We need to hide this from rails so it doesn't
  # mess with it. WARNING: This means once you use usesguid anywhere you can
  # never access a column in any table anywhere called 'integer_id'

class ActiveRecord::Base
  private
    alias :original_attributes_with_quotes :attributes_with_quotes

    def attributes_with_quotes(include_primary_key = true, include_readonly_attributes = true)
      quoted = original_attributes_with_quotes(include_primary_key = true, include_readonly_attributes = true)
      quoted.delete('integer_id')
      quoted
    end
end

So this worked like a charm and after 4 hours I thought my pain was over, but then I tried to add second row to my test database. This resulted in the following.

 Mysql::Error: Duplicate entry '0' for key 1: INSERT INTO `videos` (`updated_at`, `sort_order`, `guid`, `description`,
 `user_id`, `created_at`) VALUES('2008-01-11 16:45:05', NULL, 'bcOMPqWaGr3k5CabxfFyeK', '', 5, '2008-01-11 16:44:28');

I ran the same SQL with MySQL client and got the same error. I then looked at the table and saw the following

mysql> show columns from moo;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| integer_id | int(11)     | NO   | PRI | 0       |       |
| guid       | varchar(22) | NO   | UNI |         |       |
+------------+-------------+------+-----+---------+-------+

What I expected to see was

mysql> show columns from moo;
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| integer_id | int(11)     | NO   | PRI | NULL    | auto_increment |
| guid       | varchar(22) | NO   | UNI |         |                |
+------------+-------------+------+-----+---------+----------------+

The difference is that when the column was renamed it seems to have lost its auto increment and NOT NULL properties. Some investigation showed that the SQL being used to rename the column was

ALTER TABLE `videos` CHANGE `id` `integer_id` int(11)

when it should be

ALTER TABLE `videos` CHANGE `id` `integer_id` int(11) NOT NULL AUTO_INCREMENT

It seems that this is already filled as a bug on the rails site, including a patch.

Funnily enough that bug is owned by bitsweat. It seems he’s managed to help me out twice in one day :) It doesn’t seem that it made it into Rails 2.0 though so until then be careful about renaming columns using migrations.

Elastix and VMware

Took the plunge today to update my asterisk server. I’ve been using asterisk for about 5 years now and am pretty adept and manipulating its cryptic configuration files but I wanted to move to more of an appliance. I decided to give Elastix a try.

These days I virtualise all my boxes on a VMware Server environment. I got Elastix installed with no problems but then I wanted to get VMware Tools installed. This gives you better network drivers and make sure your clock stays in sync.

Since this requires you to compile some kernel modules you need to have the kernel-devel package installed so you can compile against your current kernel. This would normally be a simple matter of

yum install kernel-devel

However this seemed to do nothing. After a fair bit of investigation I worked out that Elastix ship there own kernel and modules for some asterisk specific hardware like zaptel and rhino. To make sure you don’t use the CentOS kernel they disable that package from that repository.

If you don’t particularly need the Elastix kernel (I don’t since this system will be pure VoIP) you can renable the CentOS modules by editing /etc/yum.repos.d/CentOS-Base.repo and commenting out ball the lines that look like

exclude=kernel*

Update: So it seems that this means that I won’t get the ztdummy module. This module uses the USB chipset to provide timing for some asterisk related things like the multi user conference module. I don’t really use this at the moment it’s not a big deal but I may have to roll my own kernel RPMs later down the track.

linux.conf.au 2008 selling out

It’s 15:11pm and there are only 11 tickets left for linux.conf.au.

WARNING: If you have registered but haven’t gotten around to paying yet then you are going to miss out.

So hop to it. Otherwise you are going to be a sad panda.

Sad Banda