Elastix and VMware

Took the plunge today to update my asterisk server. I’ve been using asterisk for about 5 years now and am pretty adept at manipulating its cryptic configuration files, but I wanted to move to more of an appliance, so I decided to give Elastix a try.

These days I virtualise all my boxes on a VMware Server environment. I got Elastix installed with no problems, but then I wanted to add VMware Tools, which gives you better network drivers and makes sure your clock stays in sync.

Since this requires compiling some kernel modules, you need the kernel-devel package installed so you can build against your running kernel. This would normally be a simple matter of

yum install kernel-devel

However this seemed to do nothing. After a fair bit of investigation I worked out that Elastix ship their own kernel and modules for some asterisk-specific hardware like Zaptel and Rhino cards. To make sure you don’t use the CentOS kernel, they exclude that package from the CentOS repository.

If you don’t particularly need the Elastix kernel (I don’t, since this system will be pure VoIP) you can re-enable the CentOS kernel packages by editing /etc/yum.repos.d/CentOS-Base.repo and commenting out all the lines that look like

exclude=kernel*
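For example, after the edit the [base] section of CentOS-Base.repo might look something like this (a sketch based on a stock CentOS repo file; your mirrorlist and gpgcheck lines may differ):

[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
gpgcheck=1
#exclude=kernel*

After that, yum install kernel-devel should pull in the CentOS kernel headers again.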

Update: So it seems that this means that I won’t get the ztdummy module. This module uses the USB chipset to provide timing for some asterisk-related things like the multi-user conference module. Since I don’t really use this at the moment it’s not a big deal, but I may have to roll my own kernel RPMs later down the track.

Ubuntu, VLANs and Bridges

Bridge and VLAN support has improved dramatically under Ubuntu, and probably Debian as well, since I last looked into it. Once upon a time, to create a bridge linked to a VLAN interface you would have to do horrible things like this:

auto eth0
iface eth0 inet manual
    pre-up /sbin/vconfig set_name_type VLAN_PLUS_VID_NO_PAD || true

auto vlan7
iface vlan7 inet manual
    pre-up /sbin/vconfig add eth0 7 || true
    post-down /sbin/vconfig rem vlan7 || true

auto br0
iface br0 inet static
    pre-up brctl addbr br0
    pre-up brctl addif br0 vlan7
    post-down brctl delbr br0
    address 10.38.38.1
    netmask 255.255.255.0
    network 10.38.38.0
    broadcast 10.38.38.255

Now the bridge-utils and vlan packages provide hooks into the ifup and ifdown commands so you can simply do

auto br-vlan4
iface br-vlan4 inet static
    address 10.38.38.1
    netmask 255.255.255.0
    network 10.38.38.0
    broadcast 10.38.38.255
    vlan-raw-device eth1
    bridge_ports vlan4
    bridge_maxwait 0
    bridge_fd 0
    bridge_stp off

Which will automagically

  • Bring up eth1
  • Create vlan4 bound to the eth1 interface
  • Bring up vlan4
  • Create the bridge br-vlan4 with vlan4 attached
  • Give eth1 the same HW address as br-vlan4
  • Bring up br-vlan4 with the IP address

Nifty!
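You can check that it all worked with the standard tools; for example:

ifup br-vlan4
brctl show
cat /proc/net/vlan/config

brctl show should list br-vlan4 with vlan4 attached as a port, and /proc/net/vlan/config should show vlan4 bound to eth1.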

Mongrel, rails and the theory of relativity

Summary (E = mc²)

When you are using mongrel for rails and want to deploy an app under /other_url, use

    ActionController::AbstractRequest.relative_url_root = "/other_url"

in config/environments/production.rb instead of

    ENV['RAILS_RELATIVE_URL_ROOT'] = "/other_url"

Proof (From first principles)

At Vquence we have a pretty standard rails setup

  • Apache with mod_proxy
  • pen
  • mongrel

Silvia recently wrote an application to allow us to edit the news articles posted to our corporate website. I wanted to do something I thought would be pretty simple: have the application appear at /news on our admin web server.

Step one was the obvious change to mod_proxy

    ProxyPass /news http://localhost:8000
    ProxyPassReverse /news http://localhost:8000

Of course the problem is that the rails app still thinks it is living on / so it returns URLs like /stylesheets/moo.css instead of /news/stylesheets/moo.css.

A bit of googling found a few email threads with a common solution. In your environment.rb set

    ENV['RAILS_RELATIVE_URL_ROOT'] = "/other_url"

This is where things fell apart fairly quickly. I could not get this to work no matter what I tried. After a few hours of following an HTTP request through the whole Mongrel and rails stack, I discovered the following.

Setting RAILS_RELATIVE_URL_ROOT will work fine if you are running rails using CGI, for the simple reason (which should have been more obvious to me sooner) that CGIs use environment variables to access their parameters. This can be seen in the ruby CGI class:

/usr/lib/ruby/1.8/cgi.rb:


class CGI

    def env_table
        ENV
    end

However mongrel overrides env_table and does the following instead:

/usr/lib/ruby/1.8/mongrel/cgi.rb:


class CGIWrapper < ::CGI

    # Used to wrap the normal env_table variable used inside CGI.
    def env_table
        @request.params
    end

This makes sense: the rails code is now running inside the web server, so environment variables aren’t necessary. Upon investigation I found that the URL morphing magic is performed within rails as follows:

/usr/share/rails/actionpack/lib/action_controller/request.rb:


  class AbstractRequest
    cattr_accessor :relative_url_root
    
    # Returns the path minus the web server relative installation directory.
    # This can be set with the environment variable RAILS_RELATIVE_URL_ROOT.
    # It can be automatically extracted for Apache setups. If the server is not
    # Apache, this method returns an empty string.
    def relative_url_root
      @@relative_url_root ||= case
        when @env["RAILS_RELATIVE_URL_ROOT"]
          @env["RAILS_RELATIVE_URL_ROOT"]
        when server_software == 'apache'
          @env["SCRIPT_NAME"].to_s.sub(/\/dispatch\.(fcgi|rb|cgi)$/, '')
        else
          ''
      end
    end

What this all means is that you can solve the whole problem by placing the following in your config/environments/production.rb

    ActionController::AbstractRequest.relative_url_root = "/other_url"

Now if only Einstein had put his theories to good use and invented a time machine then maybe I could get the last 4 hours of my life back 🙂

Update: Make sure /other_url isn’t the same name as one of your controllers or bad things happen.

TCP Window Scaling and kernel 2.6.17+

So I was tearing my hair out today. I’d installed Ubuntu onto a new Sun X4200 so that I could migrate Bulletproof’s monitoring system to it. (Note: you need to use Edgy Knot-1 for the SAS drives to be supported.) Anyway, as I was installing packages I was getting speeds like 10kB/s, when normally I would expect 800-1000kB/s.

I did the usual sort of debugging: were there any errors on the switch, was it affecting other servers on the same network, etc. Everything looked fine. Our friend tcpdump showed a dump that looked something like this:


root@oldlace:~# tcpdump -ni bond0 port 80
tcpdump: listening on bond0
1.2.3.4.42501 > 203.16.234.85.80: S 0:0(0) win 5840 <mss 1460,sackOK,timestamp 94318 0,nop,wscale 6> (DF)
203.16.234.85.80 > 1.2.3.4.42501: S 0:0(0) ack 1 win 5840 <mss 1460,nop,wscale 2> (DF)
1.2.3.4.42501 > 203.16.234.85.80: . ack 1 win 92 (DF)
1.2.3.4.42501 > 203.16.234.85.80: P 1:352(351) ack 1 win 92 (DF)
203.16.234.85.80 > 1.2.3.4.42501: . ack 352 win 1608 (DF)

You’ll notice that our server (1.2.3.4) initially advertises a window size of 5840 in its SYN, then suddenly in the first ACK it is advertising a size of 92. This means that the other side can only send 92 bytes before waiting for an ACK!!! Not very conducive to quick WAN transfer speeds.

After a lot of Google searching I discovered some threads on LKML.

Of course what I was missing was the wscale 6, which means that the window was actually 92*2^6 = 5888. That is pretty close to 5840, so why bother with the scaling? Because towards the end of the connection we get windows like 16022*2^6 = 1025408, which doesn’t fit into the 16-bit window field of a TCP header.

So why aren’t things screaming along with this massive window? Well, something in the middle doesn’t like a window scaling factor of 6 and is resetting it to zero, which means the other end thinks the window size really is 92.

There are two quick fixes. First, you can simply turn off window scaling altogether by doing

echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

but that limits your window to 64k. Alternatively, you can limit the size of your TCP buffers back to pre-2.6.17 kernel values, which means a wscale value of about 2 is used, which is acceptable to most broken routers.

echo "4096 16384 131072" > /proc/sys/net/ipv4/tcp_wmem
echo "4096 87380 174760" > /proc/sys/net/ipv4/tcp_rmem

The original values would have had 4MB in the last column above which is what was allowing these massive windows.
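If you go down the buffer-limiting route you will probably want the settings to survive a reboot. One way to do that (assuming the stock sysctl setup) is to add them to /etc/sysctl.conf:

net.ipv4.tcp_wmem = 4096 16384 131072
net.ipv4.tcp_rmem = 4096 87380 174760

and then run sysctl -p to apply them without rebooting.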

In a thread somewhere, which I can’t find anymore, Dave Miller had a great quote along the lines of:

“I refuse to workaround it, window scaling has been part of the protocol since 1999, deal with it.”

VMware Consolidated Backup

The last few months have seen me working at an insane pace at Bulletproof in the lead-up to the launch of our latest and greatest product, Dedicated Virtual Machine Hosting, or DVMH for short. I’ll ramble on a bit more about it after it’s launched, but basically it is similar to our existing Managed Dedicated Hosting but running on VMware, and with a whole heap of cool features due to the benefits of virtualisation.

Today saw me working with one of these cool features, Consolidated Backup. Basically what this lets you do is have a Windows 2003 server plugged directly into the SAN, which can see all the VM images sitting in the VMFS LUNs. It then talks to the ESX servers, takes a snapshot and makes a copy of it to local disk. Hey presto, Disaster Recovery. Well, mostly anyway; the restoration aspect isn’t all that crash hot, as you’ll see below.

Documentation on performing the backups is a bit scarce. VMware provide some scripts that let you tie it in to some commercial backup products like Legato, Veritas and NetBackup but no real docs on how to do it yourself.

So here are some quick examples. (You can find all these commands in C:\Program Files\VMware\VMware Consolidated Backup Framework.)

Getting a list of VMs on your ESX farm.
[code]
vcbVmName.exe -h VC_HOST -u USERNAME -p PASSWORD -s any:
[/code]

Backing up a VM
[code]
vcbMounter.exe -h VC_HOST -u USERNAME -p PASSWORD -a moref:MOREF -r DESTINATION -t fullvm -m san
[/code]
where MOREF comes from the list you created above and DESTINATION is a local path on your VCB proxy.
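For example (the host name, credentials, moref and destination here are all made up for illustration):
[code]
vcbMounter.exe -h vcenter01 -u backupuser -p secret -a moref:vm-1234 -r C:\backups\MyVM -t fullvm -m san
[/code]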

Strictly speaking, you should then unmount it by doing
[code]
vcbMounter.exe -d DESTINATION
[/code]
but I don’t think this does any more than delete the files, since the snapshot on the ESX server has already been closed.

The above creates something like this
[code]
catalog
MyVM.nvram
MyVM.vmx
scsi0-0-0-MyVM-s001.vmdk
scsi0-0-0-MyVM-s002.vmdk
scsi0-0-0-MyVM-s003.vmdk
scsi0-0-0-MyVM-s004.vmdk
scsi0-0-0-MyVM-s005.vmdk
scsi0-0-0-MyVM.vmdk
unmount.dat
vmware-1.log
vmware-2.log
vmware-3.log
vmware-4.log
vmware-5.log
vmware.log
[/code]

Mounting a VM image locally
[code]
mountvm.exe -d VMDK -cycleId -sysdl LOCATION
[/code]
VMDK needs to be scsi0-0-0-MyVM.vmdk from above.
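For example (again, the paths here are made up):
[code]
mountvm.exe -d C:\backups\MyVM\scsi0-0-0-MyVM.vmdk -cycleId -sysdl C:\mnt\MyVM
[/code]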

You can then unmount it by doing
[code]
mountvm.exe -u LOCATION
[/code]

This is nice and easy and really useful: it means you can now easily back up everything to tape.

Recovery is another matter entirely. Apparently in the beta releases vcbRestore was distributed with Consolidated Backup, but in the final release it only exists on the ESX servers. So you need to move the directory above to one of your ESX boxes. You then do

[code]
vcbRestore -h VC_HOST -u USERNAME -p PASSWORD -s DIRECTORY
[/code]

This will totally replace your existing VM. If you want a copy instead, copy the catalog file elsewhere, edit it to change the paths, and run

[code]
vcbRestore -h VC_HOST -u USERNAME -p PASSWORD -s DIRECTORY -a CATALOG
[/code]

There are a couple more features I haven’t mentioned which you can work out for yourself by using -h, e.g. file-level backups for Windows VMs.

Now all of the above is great, but VMware have taken things a step further. If your VM is running VMware Tools, the equivalent of a sync is done before the snapshot is taken, which effectively gives you slightly better than a crash-consistent dump, though you could still lose some DB data.

So VMware have added some functionality to rectify this. Just before the snapshot is made, /usr/sbin/pre-freeze-script or C:\Windows\pre-freeze-script.bat is run, and /usr/sbin/post-thaw-script or C:\Windows\post-thaw-script.bat is run afterwards. Taking a snapshot only takes a few minutes, so you could use these scripts to stop your database, for example.
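For example, a minimal pre-freeze/post-thaw pair for a Linux guest running MySQL might look like this (just a sketch; the init script path and service name are assumptions about your particular guest):
[code]
#!/bin/sh
# /usr/sbin/pre-freeze-script
# Stop MySQL so the database files are quiescent when the snapshot is taken.
/etc/init.d/mysql stop
[/code]
[code]
#!/bin/sh
# /usr/sbin/post-thaw-script
# Start MySQL again once the snapshot has been taken.
/etc/init.d/mysql start
[/code]
Both scripts need to be executable and live inside the guest, not on the VCB proxy.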

I highly recommend reading the VMware Consolidated Backup manual for all the extra features I haven’t covered.