Elastix and VMware

Took the plunge today to update my Asterisk server. I've been using Asterisk for about 5 years now and am pretty adept at manipulating its cryptic configuration files, but I wanted to move to something more like an appliance. I decided to give Elastix a try.

These days I virtualise all my boxes in a VMware Server environment. I got Elastix installed with no problems, but then I wanted to get VMware Tools installed. This gives you better network drivers and makes sure your clock stays in sync.

Since this requires you to compile some kernel modules, you need to have the kernel-devel package installed so you can compile against your current kernel. This would normally be a simple matter of

yum install kernel-devel

However, this seemed to do nothing. After a fair bit of investigation I worked out that Elastix ships its own kernel and modules for some Asterisk-specific hardware, such as Zaptel and Rhino. To make sure you don't pull in the stock CentOS kernel, they exclude the kernel packages from the CentOS repositories.

If you don't particularly need the Elastix kernel (I don't, since this system will be pure VoIP), you can re-enable the CentOS kernel packages by editing /etc/yum.repos.d/CentOS-Base.repo and commenting out all the lines that look like

exclude=kernel*
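If you'd rather not hand-edit the file, a minimal sed sketch can comment those lines out for you. The function name comment_kernel_excludes is a hypothetical helper of my own, and the sketch assumes GNU sed (for `-i.bak`) and that the lines begin with `exclude=kernel*` as described above; check the `.bak` copy it leaves behind before running yum again.

```shell
# Hypothetical helper: comment out every line starting with "exclude=kernel*"
# in a yum repo file. Defaults to the CentOS-Base.repo path mentioned above;
# pass another path to try it on a copy first.
comment_kernel_excludes() {
    repo="${1:-/etc/yum.repos.d/CentOS-Base.repo}"
    # -i.bak edits in place and keeps the original as "$repo.bak" (GNU sed).
    # "&" re-inserts the matched text, so the line becomes "#exclude=kernel*...".
    # Lines that are already commented start with "#" and are left alone.
    sed -i.bak 's/^[[:space:]]*exclude=kernel\*/#&/' "$repo"
}
```

Run it as root with no arguments to edit the real file; after that, `yum install kernel-devel` should be able to see the CentOS package.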

Update: It seems this means I won't get the ztdummy module. This module uses the USB chipset to provide timing for some Asterisk features, such as the multi-user conference module. I don't really use these at the moment, so it's not a big deal, but I may have to roll my own kernel RPMs later down the track.

TCP Window Scaling and kernel 2.6.17+

So I was tearing my hair out today. I'd installed Ubuntu onto a new Sun X4200 so that I could migrate Bulletproof's monitoring system to it. (Note: you need to use Edgy Knot-1 for the SAS drives to be supported.) Anyway, as I was installing packages I was getting speeds of around 10kB/s, when normally I would expect 800-1000kB/s.

I did the usual sort of debugging: were there any errors on the switch, was it affecting other servers on the same network, etc. Everything looked fine. Our friend tcpdump showed a dump that looked something like this:


root@oldlace:~# tcpdump -ni bond0 port 80
tcpdump: listening on bond0
1.2.3.4.42501 > 203.16.234.85.80: S 0:0 win 5840 <mss 1460,sackOK,timestamp 94318 0,nop,wscale 6> (DF)
203.16.234.85.80 > 1.2.3.4.42501: S 0:0(0) ack 1 win 5840 <mss 1460,nop,wscale 2> (DF)
1.2.3.4.42501 > 203.16.234.85.80: . ack 1 win 92 (DF)
1.2.3.4.42501 > 203.16.234.85.80: P 1:352(351) ack 1 win 92 (DF)
203.16.234.85.80 > 1.2.3.4.42501: . ack 352 win 1608 (DF)

You'll notice that my server (1.2.3.4) initially advertises a window size of 5840, then suddenly in the first ACK it is advertising a size of 92. This means that the other side can only send 92 bytes before waiting for an ACK!!! Not very conducive to quick WAN transfer speeds.
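To see why, remember that a sender can have at most one window of unacknowledged data in flight per round trip, so a single connection tops out at roughly window / RTT. Here is a minimal shell sketch of that bound; the function name is hypothetical, and the 10 ms RTT is a made-up figure for illustration, not something I measured on this link.

```shell
# Rough upper bound on single-connection throughput: at most one window of
# unacknowledged data per round trip, so throughput <= window / RTT.
# Integer arithmetic only; the RTT argument is an assumed value in milliseconds.
max_throughput_bytes_per_sec() {
    window_bytes=$1
    rtt_ms=$2
    echo $(( window_bytes * 1000 / rtt_ms ))
}

max_throughput_bytes_per_sec 92 10     # prints 9200   (92-byte window)
max_throughput_bytes_per_sec 5888 10   # prints 588800 (the window actually intended)
```

Even with a generous RTT assumption, a 92-byte window keeps you in single-digit kB/s territory.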

After a lot of Google searching I discovered some threads about this on LKML (the Linux Kernel Mailing List).

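Of course, what I was missing was the wscale 6. Window scaling never applies to the SYN itself, so 5840 is the literal value there, but every later window is multiplied by 2^6, which means the window was actually 92*2^6 = 5888. That's pretty close to 5840, so why bother with the scaling? Because towards the end of the connection we get 16022*2^6 = 1025408, which doesn't fit into the 16-bit window field of a TCP header (maximum 65535).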
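The arithmetic is just a left shift of the raw window field by the negotiated scale factor (the window scale option from RFC 1323). A minimal sketch, with a hypothetical function name, that reproduces the numbers above and the zeroed-scale case:

```shell
# Window scaling (RFC 1323): the 16-bit "win" field is shifted left by the
# scale factor the peer announced in its SYN ("wscale N").
effective_window() {
    raw_window=$1   # value of the "win" field in the TCP header
    wscale=$2       # shift count announced in the SYN
    echo $(( raw_window << wscale ))
}

effective_window 92 6      # prints 5888: what my side meant
effective_window 16022 6   # prints 1025408: far beyond the 16-bit maximum of 65535
effective_window 92 0      # prints 92: what the peer assumes if the scale is zeroed
```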

So why aren't things screaming along with this massive window? Well, something in the middle doesn't like a window scaling factor of 6 and is resetting it to zero, which means the other end thinks the window size really is 92 bytes.

There are two quick fixes. First, you can simply turn off window scaling altogether by doing

echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

but that limits your window to 64KB. Alternatively, you can limit the size of your TCP buffers to pre-2.6.17 kernel values, which means a wscale value of about 2 is used; that is acceptable to most broken routers.

echo "4096 16384 131072" > /proc/sys/net/ipv4/tcp_wmem
echo "4096 87380 174760" > /proc/sys/net/ipv4/tcp_rmem

The original values had 4MB in the last column, which is what was allowing these massive windows.
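Writing to /proc only lasts until the next reboot. A minimal sketch of applying and persisting the same buffer limits through sysctl, assuming a distribution that reads /etc/sysctl.conf at boot (Ubuntu does); run it as root and treat it as a config fragment rather than a tested script.

```shell
# Apply the pre-2.6.17 buffer limits now (same effect as the echo commands above)
sysctl -w net.ipv4.tcp_wmem="4096 16384 131072"
sysctl -w net.ipv4.tcp_rmem="4096 87380 174760"

# Persist them across reboots
cat >> /etc/sysctl.conf <<'EOF'
net.ipv4.tcp_wmem = 4096 16384 131072
net.ipv4.tcp_rmem = 4096 87380 174760
# Alternative fix: disable window scaling entirely (caps the window at 64KB)
# net.ipv4.tcp_window_scaling = 0
EOF

# Reload /etc/sysctl.conf to confirm it parses cleanly
sysctl -p
```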

In a thread somewhere, which I can't find anymore, Dave Miller had a great quote along the lines of

“I refuse to workaround it, window scaling has been part of the protocol since 1999, deal with it.”