Bug#666386: igb + bnx2 + ifenslave + brctl + vconfig = largely broken

March 30th, 2012 - 06:20 am ET by Josip Rodin
Package: linux-image-2.6.32-5-xen-amd64
Version: 2.6.32-41

Hi,

The machine is a new IBM x3550 M3, with this network hardware:

% lspci | grep Ethernet
0b:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
0b:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
1a:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)
1a:00.1 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01)

One port of each brand (eth0 and eth2) has a working cable plugged into a
working Ethernet switch that's set up to serve a native (untagged) VLAN
(ID 54) and VLAN ID 2 trunked (tagged), among others.

The devices are:

lrwxrwxrwx 1 root root 0 Mar 19 15:42 /sys/class/net/eth0 -> ../../devices/pci0000:00/0000:00:07.0/0000:1a:00.0/net/eth0/
lrwxrwxrwx 1 root root 0 Mar 19 15:42 /sys/class/net/eth2 -> ../../devices/pci0000:00/0000:00:01.0/0000:0b:00.0/net/eth2/

So, if I read that right, eth0 is Intel, and eth2 is Broadcom.
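
One way to double-check that mapping is to read the driver symlink that sysfs exposes for each interface. This is a minimal sketch, assuming the usual /sys/class/net/<iface>/device/driver layout. The nic_driver helper and the SYSFS_ROOT override are hypothetical names introduced here only for illustration and testing.

```shell
# Print the kernel driver bound to a network interface.
# SYSFS_ROOT is a hypothetical override (not a standard variable); it defaults to /sys.
nic_driver() {
    root="${SYSFS_ROOT:-/sys}"
    link="$root/class/net/$1/device/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/igb
        basename "$(readlink "$link")"
    else
        echo "unknown"
    fi
}

for nic in eth0 eth2; do
    printf '%s: %s\n' "$nic" "$(nic_driver "$nic")"
done
```

On a machine like the one described, igb would be expected for the Intel 82580 port and bnx2 for the Broadcom BCM5709 port. `ethtool -i eth0` reports the same information.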

The desired network setup is, in interfaces(5) format:

iface bond54 inet manual
    slaves eth0 eth2
    bond_mode active-backup
    bond_miimon 100

iface xenbr54 inet static
    bridge-ports bond54
    bridge-fd 0
    address 192.168.54.2
    netmask 255.255.255.0

iface vlan2 inet manual
    vlan-raw-device xenbr54

iface xenbr2 inet static
    bridge-ports vlan2
    bridge-fd 0
    address 213.202.97.156
    netmask 255.255.255.240
    gateway 213.202.97.145

This used to work for me elsewhere; on this machine, however, it's broken as
follows:

Everything starts up fine, and the machine is perfectly usable (albeit I
only used SSH) over the xenbr54 interface.

However, over the xenbr2 interface, all the small network packets pass, such
as ICMP, or the bringup and teardown of HTTP connections, but as soon as I
try to actually GET something non-trivial over a seemingly established HTTP
connection, the machine pretends it doesn't see that incoming traffic.

Like this:

% wget -O /dev/null http://ftp.hr.debian.org/debian/ls-lR.gz
Resolving ftp.hr.debian.org... 161.53.160.11, 2001:b68:ff:1::11
Connecting to ftp.hr.debian.org|161.53.160.11|:80... connected.
HTTP request sent, awaiting response...

In parallel, the trace shows:

% sudo tshark -n -i xenbr2
0.000000 213.202.97.156 -> 161.53.160.11 TCP 51657 > 80 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=232632046 TSER=0 WS=1
0.001797 161.53.160.11 -> 213.202.97.156 TCP 80 > 51657 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=643552423 TSER=232632046 WS=8
0.001816 213.202.97.156 -> 161.53.160.11 TCP 51657 > 80 [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=232632046 TSER=643552423
0.001906 213.202.97.156 -> 161.53.160.11 HTTP GET /debian/ls-lR.gz HTTP/1.0
0.003625 161.53.160.11 -> 213.202.97.156 TCP 80 > 51657 [ACK] Seq=1 Ack1 Win=6912 Len=0 TSV=643552423 TSER=232632046

And then it sits there. The server machine (which I happen to have control
over) says:

0.000000 213.202.97.156 -> 161.53.160.11 TCP 51660 > 80 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=232668023 TSER=0 WS=1
0.000023 161.53.160.11 -> 213.202.97.156 TCP 80 > 51660 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=643588400 TSER=232668023 WS=8
0.003117 213.202.97.156 -> 161.53.160.11 TCP 51660 > 80 [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=232668024 TSER=643588400
0.003125 213.202.97.156 -> 161.53.160.11 HTTP GET /debian/ls-lR.gz HTTP/1.0
0.003145 161.53.160.11 -> 213.202.97.156 TCP 80 > 51660 [ACK] Seq=1 Ack1 Win=6912 Len=0 TSV=643588401 TSER=232668024
0.003480 161.53.160.11 -> 213.202.97.156 TCP [TCP segment of a reassembled PDU]
0.003500 161.53.160.11 -> 213.202.97.156 TCP [TCP segment of a reassembled PDU]
0.204965 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
0.613959 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
1.428964 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
3.061959 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
6.329958 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
12.853960 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]

And then I Ctrl+C that wget, and the traces show:

(on the client)
8.017451 213.202.97.156 -> 161.53.160.11 TCP 51664 > 80 [FIN, ACK] Seq1 Ack=1 Win=5840 Len=0 TSV=232696067 TSER=643614440
8.057740 161.53.160.11 -> 213.202.97.156 TCP [TCP Previous segment lost] 80 > 51664 [ACK] Seq=4345 Ack2 Win=6912 Len=0 TSV=643616454 TSER=232696067

(on the server)
8.017218 213.202.97.156 -> 161.53.160.11 TCP 51664 > 80 [FIN, ACK] Seq1 Ack=1 Win=5840 Len=0 TSV=232696067 TSER=643614440
8.055647 161.53.160.11 -> 213.202.97.156 TCP 80 > 51664 [ACK] Seq=4345 Ack2 Win=6912 Len=0 TSV=643616454 TSER=232696067
10.778888 161.53.160.11 -> 213.202.97.156 TCP [TCP segment of a reassembled PDU]
12.850888 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]
25.906890 161.53.160.11 -> 213.202.97.156 TCP [TCP Retransmission] [TCP segment of a reassembled PDU]

That server isn't broken. The same thing happens when I initiate an SSH
connection to a random other machine - it works as far as getting a shell,
but if I run mutt -y, which increases the amount of data going through, it
dies just the same.

I also have another couple of much older machines plugged into the same
switch at the client side, using native VLAN 2, and they're working just fine.

Then I thought, maybe it's this switch that doesn't do VLANs properly, and
it's killing my traffic.

So I started disassembling this complex setup on the machine, and got this:

* I tried to remove xenbr2 and move the L3 setup onto vlan2 - it worked,
  but had the same failure symptoms as above

* I tried to remove xenbr54 and move the VLAN setup onto bond54 - and that
  made everything work just fine.

So it looks like the bridging component is the trigger for screwing things
up. But since I simply can't lose the bridging because of Xen, I went
further and tried to fiddle with things a bit more:

% sudo ifenslave -d bond54 eth0

% wget -O /dev/null http://ftp.hr.debian.org/debian/ls-lR.gz
Resolving ftp.hr.debian.org... 161.53.160.11, 2001:b68:ff:1::11
Connecting to ftp.hr.debian.org|161.53.160.11|:80... [hangs] ^C

Now it can't even connect. Let's put it back in:

% sudo ifenslave bond54 eth0

% wget -O /dev/null http://ftp.hr.debian.org/debian/ls-lR.gz
Resolving ftp.hr.debian.org... 161.53.160.11, 2001:b68:ff:1::11
Connecting to ftp.hr.debian.org|161.53.160.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6533601 (6.2M) [application/x-gzip]
[...all good...]

So a random enslaving back-and-forth makes it work?

Let's see if there's any difference in hardware:

% sudo ifenslave -d bond54 eth2

% wget -O /dev/null http://ftp.hr.debian.org/debian/ls-lR.gz
Resolving ftp.hr.debian.org... 161.53.160.11, 2001:b68:ff:1::11
Connecting to ftp.hr.debian.org|161.53.160.11|:80... connected.
HTTP request sent, awaiting response... [hangs] ^C

% sudo ifenslave bond54 eth2

% wget -O /dev/null http://ftp.hr.debian.org/debian/ls-lR.gz
Resolving ftp.hr.debian.org... 161.53.160.11, 2001:b68:ff:1::11
Connecting to ftp.hr.debian.org|161.53.160.11|:80... connected.
HTTP request sent, awaiting response... [hangs] ^C

There it's consistent at least.

Let's try the -c option while both are enslaved:

% sudo ifenslave -c bond54 eth0

% wget -O /dev/null http://ftp.hr.debian.org/debian/ls-lR.gz
Resolving ftp.hr.debian.org... 161.53.160.11, 2001:b68:ff:1::11
Connecting to ftp.hr.debian.org|161.53.160.11|:80... connected.
HTTP request sent, awaiting response... ^C

% sudo ifenslave -c bond54 eth2

% wget -O /dev/null http://ftp.hr.debian.org/debian/ls-lR.gz
Resolving ftp.hr.debian.org... 161.53.160.11, 2001:b68:ff:1::11
Connecting to ftp.hr.debian.org|161.53.160.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6533601 (6.2M) [application/x-gzip]
[...all good...]

% sudo ifenslave -c bond54 eth0

% wget -O /dev/null http://ftp.hr.debian.org/debian/ls-lR.gz
Resolving ftp.hr.debian.org... 161.53.160.11, 2001:b68:ff:1::11
Connecting to ftp.hr.debian.org|161.53.160.11|:80... connected.
HTTP request sent, awaiting response... ^C

% sudo ifenslave -c bond54 eth2

% wget -O /dev/null http://ftp.hr.debian.org/debian/ls-lR.gz
Resolving ftp.hr.debian.org... 161.53.160.11, 2001:b68:ff:1::11
Connecting to ftp.hr.debian.org|161.53.160.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6533601 (6.2M) [application/x-gzip]
[...all good...]

So I'm guessing there's something wrong with one of these device drivers,
or the enslaving infrastructure, or both.

Since all combinations with the Intel port as the active slave are broken,
and the only working combinations are ones where the Intel port is not the
active slave, it looks like the igb driver could be the one doing something
wrong, given these circumstances.
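
For anyone repeating the test matrix above, it may help to record which slave the bond is using before each wget run. The bonding driver reports this in /proc/net/bonding/<bond> on a line starting with "Currently Active Slave:". This is a minimal sketch; the active_slave helper is a hypothetical name, and bond54 is taken from the setup described above.

```shell
# Extract the active slave from bonding status text supplied on stdin.
active_slave() {
    sed -n 's/^Currently Active Slave: //p'
}

# Only query the bond if it exists on this machine.
bond_status=/proc/net/bonding/bond54
if [ -r "$bond_status" ]; then
    echo "active slave: $(active_slave < "$bond_status")"
fi
```

Running this right after each `ifenslave -c` call would confirm that the switch of active slave really took effect before the next download attempt.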

For the record, I also tried the same final test after having done:
for i in filter nat mangle; do sudo iptables -t $i -F; done
just to make sure nothing fishy was going on there - the results were
the same, netfilter isn't interfering.

Please fix this. TIA.


To UNSUBSCRIBE, email to debian-bugs-dist-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

Replies

#1 Ben Hutchings
March 31st, 2012 - 10:20 pm ET

I bet this is due to the combination of LRO plus bridging. We try to
turn off LRO in devices under a bridge, but that won't work if there's
an intermediate bonding device.

If you run:

# ethtool -K eth0 lro off
# ethtool -K eth2 lro off

does the bridge start working?

Ben.

Ben Hutchings
I'm always amazed by the number of people who take up solipsism because
they heard someone else explain it. - E*Borg on alt.fan.pratchett
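
Before and after running the ethtool -K commands suggested above, the current LRO setting can be read back with ethtool -k (lowercase), which lists a "large-receive-offload" line per device. This is a minimal sketch for checking both ports; the lro_state helper is a hypothetical name, and the parsing assumes the usual "large-receive-offload: on|off" output format.

```shell
# Read `ethtool -k <dev>` output on stdin and print "on", "off" or "unknown".
lro_state() {
    state=$(sed -n 's/^[[:space:]]*large-receive-offload:[[:space:]]*\([a-z]*\).*/\1/p')
    echo "${state:-unknown}"
}

# Report the LRO state of both bond slaves, if ethtool is installed.
if command -v ethtool >/dev/null 2>&1; then
    for nic in eth0 eth2; do
        printf '%s LRO: %s\n' "$nic" "$(ethtool -k "$nic" 2>/dev/null | lro_state)"
    done
fi
```

If a port reports "on" while it sits under bond54 and xenbr54, that would be consistent with the explanation given in the reply above, though it does not prove it on its own.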