So I was tearing my hair out today. I’d installed Ubuntu onto a new Sun X4200 so that I could migrate Bulletproof’s monitoring system to it. (Note: you need to use edgy knot-1 for the SAS drives to be supported.) Anyway, as I was installing packages I was getting speeds of around 10kB/s. Normally I would expect 800-1000kB/s.
I did the usual sort of debugging: were there any errors on the switch, was it affecting other servers on the same network, etc. Everything looked fine. Our friend tcpdump showed a dump that looked something like this:
root@oldlace:~# tcpdump -ni bond0 port 80
tcpdump: listening on bond0
1.2.3.4.42501 > 203.16.234.85.80: S 0:0 win 5840 <mss 1460,sackOK,timestamp 94318 0,nop,wscale 6> (DF)
203.16.234.85.80 > 1.2.3.4.42501: S 0:0(0) ack 1 win 5840<mss 1460,nop,wscale 2> (DF)
1.2.3.4.42501 > 203.16.234.85.80: . ack 1 win 92 (DF)
1.2.3.4.42501 > 203.16.234.85.80: P 1:352(351) ack 1 win 92 (DF)
203.16.234.85.80 > 1.2.3.4.42501: . ack 352 win 1608 (DF)
You’ll notice that the server initially advertises a window size of 5840, then suddenly in the first ACK it is advertising a size of 92. This means that the other side can only send 92 bytes before waiting for an ACK!!! Not very conducive to quick WAN transfer speeds.
After a lot of Google searching I discovered these threads on LKML:
- http://www.gatago.com/linux/kernel/9440712.html
- http://lwn.net/Articles/92727/
- http://oss.sgi.com/archives/netdev/2004-07/msg00142.html
Of course what I was missing was the wscale 6, which means that the window was actually 92*2^6 = 5888. That is pretty close to 5840, so why bother with the scaling? Because towards the end of the connection we get 16022*2^6 = 1025408, which doesn’t fit into the 16-bit window field of a TCP header.
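To make the arithmetic concrete, here is a minimal sketch of how an advertised window and a scale factor combine. The function name effective_window is just something I made up for illustration; the numbers are the ones from the capture above.

```shell
#!/bin/sh
# effective_window ADVERTISED WSCALE
# Hypothetical helper: the real window is the 16-bit advertised value
# shifted left by the scale factor negotiated in the SYN packets.
effective_window() {
    echo $(( $1 << $2 ))
}

effective_window 92 6       # first ACK in the capture: 5888
effective_window 16022 6    # later in the connection: 1025408
effective_window 92 0       # same field if the scale factor is lost: 92
```

The last call shows what a receiver believes if the scale factor never makes it through intact.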
So why aren’t things screaming along with this massive window? Well, something in the middle doesn’t like a window scaling factor of 6 and is resetting it to zero, which means the other end thinks the window size really is 92.
There are 2 quick fixes. First, you can simply turn off window scaling altogether by doing
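A rough way to see why a 92 byte window hurts so much: a sender can have at most one window of data in flight per round trip, so throughput is capped at about window / RTT. The sketch below assumes a round trip time of 10 ms, which is purely a made-up figure (I didn’t record the real one), and max_throughput is a hypothetical helper name.

```shell
#!/bin/sh
# max_throughput WINDOW_BYTES RTT_MS
# Hypothetical helper: upper bound on throughput in bytes per second,
# assuming one full window is delivered per round trip.
max_throughput() {
    echo $(( $1 * 1000 / $2 ))
}

# RTT of 10 ms is an assumption for illustration only.
max_throughput 92 10      # window with the scale factor lost: 9200
max_throughput 5888 10    # window as actually intended: 588800
```

With that assumed RTT the broken case comes out at roughly 9kB/s, which is at least in the same ballpark as the speeds I was seeing.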
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
but that limits your window to 64k. Or you can limit the size of your TCP buffers back to pre-2.6.17 kernel values, which means a wscale value of about 2 is used, which is acceptable to most broken routers.
echo "4096 16384 131072" > /proc/sys/net/ipv4/tcp_wmem
echo "4096 87380 174760" > /proc/sys/net/ipv4/tcp_rmem
The original values would have had 4MB in the last column above, which is what was allowing these massive windows.
In a thread somewhere, which I can’t find anymore, Dave Miller had a great quote along the lines of
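Why does a smaller maximum buffer lead to a smaller wscale? A simplified model, and this is my assumption rather than the kernel’s exact code, is that the scale factor is the smallest shift that lets the maximum receive buffer fit into the 16-bit window field, capped at 14 as the protocol requires. The helper name wscale_for is hypothetical.

```shell
#!/bin/sh
# wscale_for BUFFER_BYTES
# Hypothetical helper, simplified model only: halve the buffer size until
# it fits in 16 bits (65535) and count the shifts, up to the maximum of 14.
wscale_for() {
    space=$1
    shift_count=0
    while [ "$space" -gt 65535 ] && [ "$shift_count" -lt 14 ]; do
        space=$(( space >> 1 ))
        shift_count=$(( shift_count + 1 ))
    done
    echo "$shift_count"
}

wscale_for 174760    # reduced tcp_rmem maximum from above: 2
wscale_for 65535     # already fits in 16 bits: 0
```

A real kernel may take other settings into account when picking the value, so treat the result as an estimate.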
“I refuse to workaround it, window scaling has been part of the protocol since 1999, deal with it.”
September 6th, 2006 at 11:18 pm
Your reference links are broken.
September 7th, 2006 at 8:16 am
Thanks, fixed now.
September 28th, 2006 at 2:11 pm
The argument that they shouldn’t work around the window scaling problem because there shouldn’t be broken boxes out there in the first place is just like saying that we shouldn’t have jails because people shouldn’t be doing illegal things in the first place. Yes, in an ideal world that’s correct. But we don’t live in an ideal world. People do illegal things, so we need jails. And there are broken boxes out there, so we need workarounds. The day no one does any illegal thing will be the day to get rid of jails, not before. And the day there are no broken boxes out there, we can use this new window scaling value. It’s simple. Now when most big distros move to kernel 2.6.17 (just in the next month or 2) we’ll have thousands of people complaining that their connection doesn’t work. And nobody will know or care about window scaling. They’ll just want their computer to work as it did before.
Oops, sorry for the rant. It’s not aimed at you, of course. I just thought I would post my thoughts about this issue here in case Linus Torvalds reads this blog.
November 23rd, 2006 at 3:53 pm
Another way…
vi /etc/sysctl.conf
add the following two lines
net.ipv4.tcp_wmem = 4096 16384 131072
net.ipv4.tcp_rmem = 4096 87380 174760
sysctl -p
March 5th, 2007 at 7:20 pm
[...] Googling found nothing, so I had to grin and bear it until the sysadmin found a fix. After a while, he came across this solution (from here). echo 0 > /proc/sys/net/ipv4/tcp_window_scaling [...]
June 29th, 2007 at 4:58 pm
[...] explanation / workaround Recently I found out what is going on with this bug. The problem has to do with larger default buffer sizes for TCP window scaling in recent Linux kernel versions. If there is an improperly configured router between you and the server you’re trying to reach, it will create a bottleneck and you’ll get no data from the server. The long and the short of it is that it is not a Linux bug. Here are some better, more detailed explanations: John’s Tidbits » Blog Archive » TCP Window Scaling and kernel 2.6.17+ Linux: Window Scaling on the Internet | KernelTrap The solution I have used (and I’ve had to do it again whenever I’ve updated my kernel) is to change the values in /proc/sys/net/ipv4/tcp_wmem and /proc/sys/net/ipv4/tcp_rmem. The new values should be "4096 16384 131072" and "4096 87380 174760" respectively. This sets the window scaling buffer back to its earlier/smaller size. [...]
August 13th, 2007 at 2:52 pm
How do I turn off window scaling on a Windows XP PC?
January 31st, 2008 at 6:47 am
Totally solved my issue with RHEL5. Big thanks.
March 26th, 2008 at 5:18 am
[...] Due to an issue that the 2.6.17+ kernel exemplified, all of the sites I host (annvix.org, linsec.ca, danen.ca, etc.) have been unavailable to anyone using a 2.6.17+ kernel due to a change in the way TCP window scaling was implemented, which has been noted in a number of places (here’s one: TCP Window Scaling and kernel 2.6.17+). [...]