Packet loss between Broadcom NetXtreme NICs on different hosts

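I've run into an interesting problem: packets are being dropped between multiple servers on the same network. It happens on about 15 hosts, but I'll condense it to just 3 below.

First, some topology. The setup is the same on all machines.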

hosta - 10.20.30.1; Debian Lenny 5.0.5 2.6.26-2-686 #1 SMP, firmware-bnx2 0.14+lenny2
hostb - 10.20.30.2; Debian Lenny 5.0.5 2.6.26-2-686 #1 SMP, firmware-bnx2 0.14+lenny2
hostc - 10.20.30.3; Debian Lenny 5.0.5 2.6.26-2-686 #1 SMP, firmware-bnx2 0.14+lenny2

lspci gives me...

Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)

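All servers were plugged into a Cisco 2900XL. I have since swapped that for a TeloSystems switch we had on site, to make sure the Cisco isn't the cause.

The servers are all IBM x3550 and x3560 (pre-M1/M2).

Now for some tests... I'll only paste one side of each test to save space, but the results are 100% the same if I run them from the other hosts.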

root@hosta:~# ping -i 0.5 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 49542ms
rtt min/avg/max/mdev = 0.097/0.157/5.533/0.540 ms

root@hosta:~# ping -i 0.1 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 9941ms
rtt min/avg/max/mdev = 0.089/0.105/0.170/0.017 ms

root@hosta:~# ping -i 0.05 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 5167ms
rtt min/avg/max/mdev = 0.088/0.096/0.170/0.016 ms

root@hosta:~# ping -i 0.01 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 79 received, 21% packet loss, time 960ms
rtt min/avg/max/mdev = 0.088/0.095/0.126/0.009 ms

root@hosta:~# ping -i 0.025 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 2800ms
rtt min/avg/max/mdev = 0.087/0.097/0.120/0.006 ms

root@hosta:~# ping -i 0.02 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.085/0.096/0.164/0.017 ms

root@hosta:~# ping -i 0.019 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 99 received, 1% packet loss, time 1995ms
rtt min/avg/max/mdev = 0.085/0.092/0.112/0.014 ms

root@hosta:~# ping -i 0.015 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 92 received, 8% packet loss, time 1614ms
rtt min/avg/max/mdev = 0.086/0.099/0.161/0.016 ms


root@hosta:~# ping -i 0.0125 -c 100 10.20.30.2 -q
PING 10.20.30.2 (10.20.30.2) 56(84) bytes of data.
--- 10.20.30.2 ping statistics ---
100 packets transmitted, 84 received, 16% packet loss, time 1198ms
rtt min/avg/max/mdev = 0.083/0.093/0.136/0.012 ms
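
The summary lines above are easier to compare once they are sorted by interval. The following is only a sketch and not part of the original tests: it re-parses the output already pasted, using a regular expression that assumes the iputils summary format shown above. The names `parse_summary`, `loss_table` and `RUNS` are made up for illustration.

```python
import re

# Summary line as printed by ping in the sessions above, e.g.
# "100 packets transmitted, 79 received, 21% packet loss, time 960ms"
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) received, "
    r"([\d.]+)% packet loss, time (\d+)ms"
)


def parse_summary(line):
    """Return (sent, received, loss_percent, elapsed_ms), or None if no match."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    sent, received, loss, elapsed = m.groups()
    return int(sent), int(received), float(loss), int(elapsed)


# (interval in seconds, summary line) pairs copied from the runs above.
RUNS = [
    (0.5, "100 packets transmitted, 100 received, 0% packet loss, time 49542ms"),
    (0.1, "100 packets transmitted, 100 received, 0% packet loss, time 9941ms"),
    (0.05, "100 packets transmitted, 100 received, 0% packet loss, time 5167ms"),
    (0.01, "100 packets transmitted, 79 received, 21% packet loss, time 960ms"),
    (0.025, "100 packets transmitted, 100 received, 0% packet loss, time 2800ms"),
    (0.02, "100 packets transmitted, 100 received, 0% packet loss, time 2002ms"),
    (0.019, "100 packets transmitted, 99 received, 1% packet loss, time 1995ms"),
    (0.015, "100 packets transmitted, 92 received, 8% packet loss, time 1614ms"),
    (0.0125, "100 packets transmitted, 84 received, 16% packet loss, time 1198ms"),
]


def loss_table(runs):
    """Return (interval, packets_lost, loss_percent) rows, longest interval first."""
    rows = []
    for interval, line in sorted(runs, reverse=True):
        parsed = parse_summary(line)
        if parsed is None:
            continue  # skip anything that is not a summary line
        sent, received, loss, _elapsed = parsed
        rows.append((interval, sent - received, loss))
    return rows


if __name__ == "__main__":
    for interval, lost, loss in loss_table(RUNS):
        print(f"{interval * 1000:6.1f} ms  lost={lost:3d}  loss={loss:4.1f}%")
```

Sorted this way, the pasted runs show no loss at intervals of 20 ms and above, and loss that grows from 1% at 19 ms to 21% at 10 ms.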

If I connect MBPs to the switch (both), the tests above show no packet loss.
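
One way to check whether the receiving server itself is counting the missing packets could be to snapshot the kernel's per-interface counters before and after a ping run. This is a rough sketch, not something done in the original tests. It assumes Python 3 is available on the host, that the interface is called `eth0`, and that the standard Linux sysfs files under `/sys/class/net/<iface>/statistics/` are present. The helper names `read_counters` and `diff_counters` are hypothetical.

```python
import os

# Counters of interest; each is a file under /sys/class/net/<iface>/statistics/.
# Which of them (if any) move during a lossy run is exactly what is unknown.
COUNTERS = (
    "rx_packets",
    "rx_dropped",
    "rx_errors",
    "rx_missed_errors",
    "tx_dropped",
    "tx_errors",
)


def read_counters(iface, base="/sys/class/net"):
    """Return {counter_name: value} for the counters that can be read."""
    stats_dir = os.path.join(base, iface, "statistics")
    values = {}
    for name in COUNTERS:
        try:
            with open(os.path.join(stats_dir, name)) as fh:
                values[name] = int(fh.read().strip())
        except (OSError, ValueError):
            continue  # counter missing or unreadable on this system
    return values


def diff_counters(before, after):
    """Return how much each counter changed between two snapshots."""
    return {name: after[name] - before[name] for name in after if name in before}
```

A possible use: call `read_counters("eth0")` on hostb, run the `ping -i 0.01` test from hosta, call it again, and pass both snapshots to `diff_counters`. If none of the drop or error counters move, the packets are probably being lost somewhere the kernel does not see.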

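This only seems to have started since we upgraded from Etch to Lenny about 9 months ago.

My next step is to burn an Ubuntu Live CD and run some tests from a different, newer kernel.

Any help/ideas/pointers would be much appreciated.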

Answer 1

Here is Server Fault's official answer on the matter: http://blog.serverfault.com/post/broadcom-die-mutha/
