AT&T internet access cable service leaks packets

In the following I plan to first describe some weird state that the AT&T routers seem to sometimes fall into (this is what I call the AT&T broadband bug), then show a workaround that users can use, and finally explain why windows users are not affected by that bug, while linux users are. I have no idea why their routers behave that way and what this is supposed to accomplish, but at least I have a workaround so I don't care too much anymore - I still think this is pretty funny though, and I would be curious to understand the reason for that behaviour.

The following traceroute illustrates what happens when the problem occurs. I'm trying to ping some machine in France. As you see, the first router (which I understand to be the head end of the cable) responds to pings just fine. But everything else gets a lot of packet loss.

9:10 [walken:pts/0] Angel:~% /usr/sbin/traceroute -n zoy.org
traceroute to zoy.org (80.65.228.129), 30 hops max, 40 byte packets
 1  10.152.96.1  12 ms  10 ms  9 ms
 2  12.244.98.129  10 ms *  19 ms
 3  * 12.244.67.26  24 ms *
 4  12.244.72.194  17 ms *  8 ms
 5  * 12.123.13.162  9 ms *
 6  12.122.5.150  13 ms *  10 ms
 7  * 12.123.13.65  11 ms *
 8  * 192.205.32.110  9 ms *
 9  * 206.24.194.103  80 ms *
10  * * *
11  * * *
12  * 208.172.249.166  161 ms *
13  * * *
14  * * *
15  * 62.4.16.35  160 ms *
16  * * 62.4.16.146  155 ms
17  * * *
18  * 80.65.228.129  161 ms *

My gateway address is 12.236.24.1 - I think this is the same machine as 10.152.96.1, i.e. there are several IP subnets on my physical cable, and that machine is configured as a gateway for each of these subnets and it also has the 10.152.96.1 address as some kind of canonical name, if that makes any sense. Anyway, I can ping either 10.152.96.1 or 12.236.24.1 just fine:

9:13 [walken:pts/0] Angel:~% ping 10.152.96.1
PING 10.152.96.1 (10.152.96.1): 56 data bytes
64 bytes from 10.152.96.1: icmp_seq=0 ttl=255 time=27.1 ms
64 bytes from 10.152.96.1: icmp_seq=1 ttl=255 time=63.6 ms
64 bytes from 10.152.96.1: icmp_seq=2 ttl=255 time=14.5 ms
64 bytes from 10.152.96.1: icmp_seq=3 ttl=255 time=22.2 ms
64 bytes from 10.152.96.1: icmp_seq=4 ttl=255 time=16.7 ms
64 bytes from 10.152.96.1: icmp_seq=5 ttl=255 time=83.6 ms
64 bytes from 10.152.96.1: icmp_seq=6 ttl=255 time=71.6 ms
64 bytes from 10.152.96.1: icmp_seq=7 ttl=255 time=11.7 ms
64 bytes from 10.152.96.1: icmp_seq=8 ttl=255 time=45.0 ms
64 bytes from 10.152.96.1: icmp_seq=9 ttl=255 time=18.1 ms

However any hop after that (even just the next one) gets 50% of packet loss. Note there is a pattern there, one packet passes, the next is dropped, and then it repeats. This very characteristic of that problem, I get that all the time:

9:13 [walken:pts/0] Angel:~% ping 12.244.98.129
PING 12.244.98.129 (12.244.98.129): 56 data bytes
64 bytes from 12.244.98.129: icmp_seq=3 ttl=254 time=9.7 ms
64 bytes from 12.244.98.129: icmp_seq=5 ttl=254 time=8.6 ms
64 bytes from 12.244.98.129: icmp_seq=7 ttl=254 time=8.8 ms
64 bytes from 12.244.98.129: icmp_seq=9 ttl=254 time=11.0 ms
64 bytes from 12.244.98.129: icmp_seq=11 ttl=254 time=10.9 ms
64 bytes from 12.244.98.129: icmp_seq=13 ttl=254 time=9.5 ms
64 bytes from 12.244.98.129: icmp_seq=15 ttl=254 time=8.7 ms
64 bytes from 12.244.98.129: icmp_seq=17 ttl=254 time=10.0 ms
64 bytes from 12.244.98.129: icmp_seq=19 ttl=254 time=12.1 ms
64 bytes from 12.244.98.129: icmp_seq=21 ttl=254 time=47.3 ms
64 bytes from 12.244.98.129: icmp_seq=24 ttl=254 time=9.3 ms
64 bytes from 12.244.98.129: icmp_seq=26 ttl=254 time=9.1 ms
64 bytes from 12.244.98.129: icmp_seq=27 ttl=254 time=10.2 ms
64 bytes from 12.244.98.129: icmp_seq=29 ttl=254 time=8.7 ms
64 bytes from 12.244.98.129: icmp_seq=31 ttl=254 time=14.0 ms

Next I'm trying to determine where packets get dropped - is it on the way from my machine to the internet, or is it on the way back ? Well, experiments seem to show that this is on the way back. The following two traces indicate that:

In the first trace, I'm running 'ping zoy.org' on Angel, which is my local machine connected to the AT&T cable. The following tcpdump is run on zoy.org. As you see, zoy.org is receiving (and replying to) one packet per second. The ping command on Angel, though, reports 50% packet loss, with the same pattern as above when I pinged 12.244.98.129 (one reply passed, one was cut, etc)

pings from angel, tcpdump on zoy:
18:55:57.862831 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
18:55:57.862869 80.65.228.129 > 12.236.25.222: icmp: echo reply
18:55:58.838871 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
18:55:58.838902 80.65.228.129 > 12.236.25.222: icmp: echo reply
18:55:59.878452 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
18:55:59.878491 80.65.228.129 > 12.236.25.222: icmp: echo reply
18:56:00.836935 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
18:56:00.836974 80.65.228.129 > 12.236.25.222: icmp: echo reply
18:56:01.843552 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
18:56:01.843588 80.65.228.129 > 12.236.25.222: icmp: echo reply

In the second trace, I'm running 'ping angel' on zoy.org, and I run tcpdump on Angel. Here I see Angel replying to all the ICMP requests that get to it - but, that's only one every two seconds. The ping command on zoy.org shows 50% packet loss, with the usual one-pass one-cut pattern.

pings from zoy, tcpdump from my end, filtered out arp who-has noise:
09:46:49.042106 80.65.228.129 > 12.236.25.222: icmp: echo request (DF)
09:46:49.042273 12.236.25.222 > 80.65.228.129: icmp: echo reply
09:46:51.041764 80.65.228.129 > 12.236.25.222: icmp: echo request (DF)
09:46:51.041894 12.236.25.222 > 80.65.228.129: icmp: echo reply
09:46:53.041394 80.65.228.129 > 12.236.25.222: icmp: echo request (DF)
09:46:53.041524 12.236.25.222 > 80.65.228.129: icmp: echo reply
09:46:55.041398 80.65.228.129 > 12.236.25.222: icmp: echo request (DF)
09:46:55.041533 12.236.25.222 > 80.65.228.129: icmp: echo reply
09:46:57.043024 80.65.228.129 > 12.236.25.222: icmp: echo request (DF)
09:46:57.043148 12.236.25.222 > 80.65.228.129: icmp: echo reply
09:46:59.041521 80.65.228.129 > 12.236.25.222: icmp: echo request (DF)
09:46:59.041654 12.236.25.222 > 80.65.228.129: icmp: echo reply

So at this point I think we have established exactly what the problem is: sometimes the AT&T network starts to drop 50% of the packets coming to the internet addressed to my machine. This droppage seems to occur somewhere between 12.244.98.129 and 10.152.96.1 - the first one gets 50% of its packets addressed to me dropped, the second one gets all its packets thru. The other direction is still fine, i.e. my machine is still able to send packets to the internet and these packets do not get dropped.

I will add in case you have a doubt, that I *am* using the IP address assigned to me by the DHCP server, and that I have tried to release it and get a different IP address, and this behaviour seems to occur occasionally to me whatever IP address I use.

I have had this problem on and off for like 6 monthes, occasionally, maybe 3-4 times a week on average, sometimes more though. It usualy seems to fix itself after a few hours.

Anyway. I think I have just recently understood a bit more about the problem, and at least enough to figure out a workaround. Whenever the problem occurs, I can fix it by sending one ARP request to my gateway. It just works. I'm using the following command:

9:55 [root:pts/0] Angel:~# arping -c1 10.152.96.1 
ARPING 10.152.96.1
60 bytes from 00:0a:8b:6d:34:70 (10.152.96.1): index=0 time=8.703 msec

--- 10.152.96.1 statistics ---
1 packets transmitted, 1 packets received,   0% unanswered

To demonstrate how this is effective, I started a ping zoy.org again, in one of these times where I had the packet loss problem. As usual, I'm getting 50% packet loss. Then around the 130th ping, I run the arping command - I dont get any packet loss anymore after that.

9:23 [walken:pts/5] Angel:~% ping zoy.org
PING zoy.org (80.65.228.129): 56 data bytes
64 bytes from 80.65.228.129: icmp_seq=3 ttl=238 time=156.3 ms
......
64 bytes from 80.65.228.129: icmp_seq=113 ttl=238 time=212.5 ms
64 bytes from 80.65.228.129: icmp_seq=115 ttl=238 time=162.6 ms
64 bytes from 80.65.228.129: icmp_seq=117 ttl=238 time=191.2 ms
64 bytes from 80.65.228.129: icmp_seq=120 ttl=238 time=157.0 ms
64 bytes from 80.65.228.129: icmp_seq=121 ttl=238 time=156.7 ms
64 bytes from 80.65.228.129: icmp_seq=124 ttl=238 time=158.4 ms
64 bytes from 80.65.228.129: icmp_seq=126 ttl=238 time=158.4 ms
64 bytes from 80.65.228.129: icmp_seq=127 ttl=238 time=158.6 ms
64 bytes from 80.65.228.129: icmp_seq=129 ttl=238 time=157.6 ms
64 bytes from 80.65.228.129: icmp_seq=131 ttl=238 time=157.1 ms <- in another console, typed arping -c1 10.152.96.1
64 bytes from 80.65.228.129: icmp_seq=132 ttl=238 time=158.8 ms
64 bytes from 80.65.228.129: icmp_seq=133 ttl=238 time=155.6 ms
64 bytes from 80.65.228.129: icmp_seq=134 ttl=238 time=158.7 ms
64 bytes from 80.65.228.129: icmp_seq=135 ttl=238 time=159.2 ms
64 bytes from 80.65.228.129: icmp_seq=136 ttl=238 time=156.5 ms
64 bytes from 80.65.228.129: icmp_seq=137 ttl=238 time=161.1 ms
64 bytes from 80.65.228.129: icmp_seq=138 ttl=238 time=202.8 ms
64 bytes from 80.65.228.129: icmp_seq=139 ttl=238 time=156.5 ms

The following tcpdump, running on Angel, illustrates that too. Initially, only half of the echo replies are received. After the arp request, all echo replies start getting received:

tcpdump from angel:
09:55:41.757182 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
09:55:42.757212 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
09:55:42.914612 80.65.228.129 > 12.236.25.222: icmp: echo reply
09:55:43.757239 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
09:55:44.757263 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
09:55:44.914181 80.65.228.129 > 12.236.25.222: icmp: echo reply
09:55:45.642056 arp who-has 10.152.96.1 tell 12.236.25.222        <- my arp request
09:55:45.651484 arp reply 10.152.96.1 is-at 0:a:8b:6d:34:70
09:55:45.757296 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
09:55:45.910322 arp who-has 12.236.18.157 tell 12.236.16.1
09:55:45.915891 80.65.228.129 > 12.236.25.222: icmp: echo reply
09:55:46.583171 arp who-has 12.236.24.129 tell 12.236.24.1
09:55:46.757325 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
09:55:46.912632 80.65.228.129 > 12.236.25.222: icmp: echo reply
09:55:47.757351 12.236.25.222 > 80.65.228.129: icmp: echo request (DF)
09:55:47.915904 80.65.228.129 > 12.236.25.222: icmp: echo reply

So that's it - I now have a working workaround. every time the upstream router starts to drop the packets addressed to me, I can fix it just by sending it an ARP request. I still have no idea why the router starts dropping my packets, and why the ARP request fixes it, but heh, it works. You can easily guess that the arping command has made it into my crontab file.

So the next question might be, why isnt anyone else complaining. Well, first I have a few other friends who like me tried to complain to technical support - we're only told to reboot our computers and shove it since we don't use a supported operating system. So that's probably the first reason why. The second reason being that windows users are probably NOT affected, because of some subtle difference in their IP stack.

As I will demonstrate below, the IP stack in windows (I tested this with windows 2000 Pro, but I suppose other versions behave the same) unconditionally expires its ARP cache entries every 10 minutes - even if these entries are active. The following tcpdump illustrates that: I've installed windows 2000 Pro in a virtual machine (vmware) - for simplification we can assume that it behaves just like a real machine. Then in the windows 2000 pro box, I installed cygwin and typed 'ssh angel ping localhost'. That way there is an active tcp connection between angel and the win2k box - the ping command on angel displays one line per second, and that line is displayed on the win2k box. What I'm looking for here is the arp traffic between angel and the win2k box, to show that it does expire every 10 minutes:

win2k: ssh angel ping localhost
angel: tcpdump -n -i vmnet1|grep -v '\. ack'
21:26:34.486845 12.236.25.222.22 > 192.168.100.144.1029: P 20928:21024(96) ack 1 win 8576 (DF) [tos 0x8] 
21:26:35.486859 12.236.25.222.22 > 192.168.100.144.1029: P 21024:21120(96) ack 1 win 8576 (DF) [tos 0x8] 
21:26:35.598756 arp who-has 192.168.100.129 tell 192.168.100.144
21:26:35.598789 arp reply 192.168.100.129 is-at 0:50:56:c0:0:1
21:26:36.486899 12.236.25.222.22 > 192.168.100.144.1029: P 21120:21216(96) ack 1 win 8576 (DF) [tos 0x8] 
21:26:37.486939 12.236.25.222.22 > 192.168.100.144.1029: P 21216:21312(96) ack 1 win 8576 (DF) [tos 0x8] 
...
21:36:35.503573 12.236.25.222.22 > 192.168.100.144.1029: P 78624:78720(96) ack 49 win 8576 (DF) [tos 0x8] 
21:36:35.700914 arp who-has 192.168.100.129 tell 192.168.100.144
21:36:35.700954 arp reply 192.168.100.129 is-at 0:50:56:c0:0:1
21:36:36.503614 12.236.25.222.22 > 192.168.100.144.1029: P 78720:78816(96) ack 49 win 8576 (DF) [tos 0x8]

The ARP cache in linux has a different design: as explained in the arp(7) man page, an entry's validity will be extended if it receives positive feedback from higher level protocols. In practice, that means that if the above tcpdump had been with a linux box instead of the win2k box, we would not have seen the additional arp exchanges every 10 minutes, as the active TCP connection would have prevented the ARP cache entry from expiring - ever (well, at least as long as the TCP connection stays active). While for windows users, they are always sending at least one ARP request every 10 minutes, unless they don't send any IP packets at all.

This also gives us more insight into how the problem, when it occurs, seems to magically solve itself after a few hours. Presumably, the existing TCP connections, such as fetchmail or others, get slowed down to a crawl when they start to see 50% packet loss. But, they do not die completely - they just get very slow - and the remaining few packets that go thru are probably enough to maintain the 'positive feedback' in the ARP cache, and prevent the cache entries from expiring. After a long while, when all TCP connections finally die, the ARP cache entry expires, and at that point the next attempt to create a new TCP connection will result in sending an ARP request, which fixes the connection.

I think this theory makes sense, and matches pretty well with everything that I can observe about the network behaviour. I still dont have the slightest idea though, about why the upstream router occasionally starts dropping half of the packets addressed to Angel. It might be that it sees I'm not sending enough ARP requests, and that throws it into an anomalous state ???? That seems pretty weird to me though. I have no idea what this mechanism is supposed to achieve. It seems pretty clear to me though, that the upstream router is doing SOMETHING unusual with my machine's ARP requests.

Anyone knows ?