John’s Tidbits

Moo - Development, Trouble-shooting and Random thoughts…

Archive for the ‘Sysadmin’


Hardy, exim4, SMTP-AUTH and LDAP… (or debian openssl causes pain)

As most people will know yesterday caused a lot of people a lot of pain as they ran around replacing SSH keys and SSL certificates.

While running around fixing up all our servers, most of them in one felll swoop thanks to puppet, I realised two of our servers were still running Edgy. I figured it was high time I moved them to Hardy.

Everything went fairly smoothly with some minor hicups, except for SMTP-AUTH for exim. We use an ldap backed SMTP-AUTH and this just wouldn’t work after the upgrade. The following error was appearing in the logs.

ldap_search failed: -7, Bad search filter

This lead to hours upon hours of google searches, staring at debug messages and even at one stage resorting to using GDB. Eventually after staring at debug messages harder it twigged when I saw the following.

perform_ldap_search: ldapdn URL = "ldap:///ou=people,o=vquence?dn?sub?(uid=moo) "

Notice the space just before the closing double quote. It seems that the new openldap libraries don’t like errant spaces in your search filter.

Now to remember what I was doing yesterday morning before this whole derailment began.

Note: Before anyone comments I will completely deny that during these upgrades I did anything as silly as rm -rf `dpkg -L random-font-package`, no matter what twitter says.

Hardy and password locking

passwd -l root

In gutsy the above would simply lock the account by placing an ! in front of the passwd in your /etc/shadow file.

In hardy it now also sets the account as expired. Meaning you can’t ssh to it even if you have SSH keys in place.

Time to go and rebuild my EC2 AMI. :(

Update: To get the old behavour back you can do the following

passwd -l root
usermod -e "" root

Puppet, Facts and Certificates

I’m currently setting up Puppet at Vquence so that, among other things, we can deploy hosts into Amazon EC2 more easily.

To ensure a minimum setup time on a new server I wanted the setup to be as simple as

  • echo ‘DAEMON_OPTS=”-w 120 –fqdn newserver.vquence.com –server puppetmaster.vquence.com” > /etc/default/puppet
  • aptitude install puppet

This means that the puppet client will use newserver.vquence.com as the common name in the SSL certificate it creates for itself. On the puppet master the SSL cert name is then used to pick a node rather than the hostname reported by facter.

This means that I don’t need to worry about setting up /etc/hostname, even better /etc/hostname can be managed by puppet.

You can control this functionality on the puppet master by using the node_name option. From the docs

    # How the puppetmaster determines the client's identity
    # and sets the 'hostname' fact for use in the manifest, in particular
    # for determining which 'node' statement applies to the client.
    # Possible values are 'cert' (use the subject's CN in the client's
    # certificate) and 'facter' (use the hostname that the client
    # reported in its facts)
    # The default value is 'cert'.
    # node_name = cert

The problem was that the ‘hostname’ fact wasn’t being set. It looks like there was a regression in SVN#1673 when some refactoring was performed.

I’ve filed bug #1133 and you can clone my git repository.

I haven’t included any tests in the patch as I’m not sure how to. The master.rb test already tests this functionality but doesn’t test that the facts object has actually been changed. I think a test on getconfig is probably required but I’m not sure how you would access the facts after calling it.

Update: This patch is now in puppet as of 0.24.3.

Amazon EC2 ruby gem and large user_data

When you create an instance in EC2 you can send Amazon some user data that is accessible by your instance. At Vquence we use this to send a script that gets executes at boot up. This script contains some openvpn and puppet RSA keys so its approaching about 10k in size.

This works without any problems when using the java based command line tools. However I was getting the following error when using the EC2 Ruby GEM.

/usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread': Connection reset by peer (Errno::ECONNRESET)
	from /usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
	from /usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
	from /usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
	from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
	from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
	from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
	from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line'
	from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new'
	 ... 6 levels...
	from ./lib/ec2helpers.rb:43:in `start_instance'
	from ./ec2-puppet:107
	from ./ec2-puppet:89:in `each_pair'
	from ./ec2-puppet:89

Doing some tcpdumping indicated that after receiving the request Amazon waits for a while and then sends a TCP RESET. Not very nice at all. My next step was to use ngrep to compare the output from the command line tools and the ruby gem. This got nowhere fast since the command line tools use the SOAP API while the ruby gem uses the Query API.

What I did notice however is that while the command line tools performed a POST the ruby library performed a GET. At this stage I decided to test how much data I could send. So I started trying different user data sizes. The offending amount was around 7.8k, suspiciously close to exactly 8k.

The HTTP/1.1 spec doesn’t place an actual limit on the length but leaves it up to the server.

The HTTP protocol does not place any a priori limit on the length of
a URI. Servers MUST be able to handle the URI of any resource they
serve, and SHOULD be able to handle URIs of unbounded length if they
provide GET-based forms that could generate such URIs. A server
SHOULD return 414 (Request-URI Too Long) status if a URI is longer
than the server can handle (see section 10.4.15).


Note: Servers ought to be cautious about depending on URI lengths
above 255 bytes, because some older client or proxy
implementations might not properly support these lengths.

Apache for example limits this by default to 8190 bytes including the method and the protocol. You can change this using the LimitRequestLine directive.

I created a patch to modify the EC2 Gem to use a POST instead of a GET which has no such limitations. You can find the git tree for it at http://inodes.org/~johnf/git/amazon-ec2

Squid and Rails caching

At Vquence our Rails setup looks something like this.

------------     ---------     ------------
| Internet |---->| Squid |---->| Mongrels |
------------     ---------     ------------

(Who needs Inkscape when you have ASCII art)

This infrastructure is hosted in the US and up until recently squid hadn’t been doing much of anything except really sitting there.

Now a few months ago when we signed a contract with an Australian customer we decided we needed to place a squid cache in Australia which would actually cache content. For two reasons, firstly the US is a long way away and the 300ms latency is really noticeable and secondly because some of our pages involving graphs have long statistical calculations which can take minutes to render. (OK its really because no one has had a chance to optimise them yet but lets pretend that’s not the case). So we changed the above setup for the Australian customers to look like the following.

------------     ------------     ------------     ------------
| Internet |---->| Squid AU |---->| Squid US |---->| Mongrels |
------------     ------------     ------------     ------------

We hand out urls like http://www.client.b2b.vquence.com/widget to Australian customers and the rails backend is smart enough to make sure all the URLs look similar (I’ll blog about how I did that another time).

Without much time to look into thing properly I did some really nasty things on the AU squid cache to make sure it cached the pages.

refresh_pattern /client/graph  1440    0%    1440    ignore-no-cache ignore-reload
refresh_pattern /client/static 1440    0%    1440    ignore-no-cache ignore-reload
refresh_pattern /client/video  1440    0%    1440    ignore-no-cache ignore-reload

This is evil, breaks a whole heap of RFCs but it did the trick and got us out of a bind quickly.

A few weeks ago I moved the production site to Rails 2.0, I noticed around this time that the caching had stopped working. The client was no longer using our services as their campaign had finished so it wasn’t an urgent concern.

It seems that Rails 2.0 goes one step further to ensure that caches don’t cache content and instead of just sending

Cache-Control: no-cache

it now sends

Cache-Control: private, max-age=0, must-revalidate

I tried adding ignore-private, since if you’re breaking some aspects of the RFC you may as well break a couple more, but squid still refused to cache the content. After struggling with this for a bit I decided that the universe was trying to tell me I should actually do things properly.

So with squid set back to its defaults I went exploring how to accomplish this. Google wasn’t all that helpful at first since most Rails caching articles talk about caching to static files as most sites don’t implement reverse proxying for caching. It turns out however its fairly simple. In the appropriate actions in your controllers simply do the following.

class VideoController < ApplicationController

    def vquence
        # Lots of code here

        expires_in 8.hours, :private => false
        render :template => “videos/vquence”
    end

end

This will send the following header and cache the page for 8 hours.

Cache-Control: max-age=28800

Now everything is much faster!!

Elastix and VMware

Took the plunge today to update my asterisk server. I’ve been using asterisk for about 5 years now and am pretty adept and manipulating its cryptic configuration files but I wanted to move to more of an appliance. I decided to give Elastix a try.

These days I virtualise all my boxes on a VMware Server environment. I got Elastix installed with no problems but then I wanted to get VMware Tools installed. This gives you better network drivers and make sure your clock stays in sync.

Since this requires you to compile some kernel modules you need to have the kernel-devel package installed so you can compile against your current kernel. This would normally be a simple matter of

yum install kernel-devel

However this seemed to do nothing. After a fair bit of investigation I worked out that Elastix ship there own kernel and modules for some asterisk specific hardware like zaptel and rhino. To make sure you don’t use the CentOS kernel they disable that package from that repository.

If you don’t particularly need the Elastix kernel (I don’t since this system will be pure VoIP) you can renable the CentOS modules by editing /etc/yum.repos.d/CentOS-Base.repo and commenting out ball the lines that look like

exclude=kernel*

Update: So it seems that this means that I won’t get the ztdummy module. This module uses the USB chipset to provide timing for some asterisk related things like the multi user conference module. I don’t really use this at the moment it’s not a big deal but I may have to roll my own kernel RPMs later down the track.

Ubuntu, VLANs and Bridges

Bridge and VLAN support has improved dramatically under Ubuntu and probably Debian as well since I last looked into it. once upon a time to create a bridge linked to a VLAN interface you would have to do horrible things like.

auto eth0
ifconfig eth0 inet manual
    pre-up /sbin/vconfig set_name_type VLAN_PLUS_VID_NO_PAD || true

auto vlan7
iface vlan7 inet manual
    pre-up /sbin/vconfig add eth0 7 || true
    post-down /sbin/vconfig rem vlan7 || true

auto br0
    pre-up brctl addbr br0
    pre-up brctl addif br0 vlan7
    post-down brctl delbr br0
    address 10.38.38.1
    netmask 255.255.255.0
    network 10.38.38.0
    broadcast 10.38.38.255

Now the bridge-utils and vlan packages provide hooks into the ifup and ifdown commands so you can simply do

auto br-vlan4
iface br-vlan4 inet static
    address 10.38.38.1
    netmask 255.255.255.0
    network 10.38.38.0
    broadcast 10.38.38.255
    vlan-raw-device eth1
    bridge_ports vlan4
    bridge_maxwait 0
    bridge_fd 0
    bridge_stp off

Which will automagically

  • Bring up eth1
  • Create vlan4 bound to the eth1 interface
  • Bring up vlan4
  • Create the br0 with vlan4 attached
  • Give eth1 the same HW address as br0
  • Bring up br0 with the IP address

Nifty!