I am trying to connect a Linux server that has two 1 Gbps NICs to a Netgear ProSafe GSM7248V2 switch using bonding (specifically 802.3ad mode). The results are quite confusing, and I would greatly appreciate any hints on what to try next.
On the server side, this is my /etc/network/interfaces:
auto bond0
iface bond0 inet static
address 192.168.1.15/24
gateway 192.168.1.254
dns-nameservers 8.8.8.8
dns-search my-domain.org
bond-slaves eno1 eno2
bond-mode 4
bond-miimon 100
bond-lacp-rate 1
bond-xmit_hash_policy layer3+4
hwaddress aa:bb:cc:dd:ee:ff
The switch is configured as follows:
(GSM7248V2) #show port-channel 3/2
Local Interface................................ 3/2
Channel Name................................... fubarlg
Link State..................................... Up
Admin Mode..................................... Enabled
Type........................................... Dynamic
Load Balance Option............................ 6
(Src/Dest IP and TCP/UDP Port fields)
Mbr Device/ Port Port
Ports Timeout Speed Active
------ ------------- --------- -------
0/7 actor/long Auto True
partner/long
0/8 actor/long Auto True
partner/long
(GSM7248V2) #show lacp actor 0/7
Sys Admin Port Admin
Intf Priority Key Priority State
------ -------- ----- -------- -----------
0/7 1 55 128 ACT|AGG|LTO
(GSM7248V2) #show lacp actor 0/8
Sys Admin Port Admin
Intf Priority Key Priority State
------ -------- ----- -------- -----------
0/8 1 55 128 ACT|AGG|LTO
(GSM7248V2) #show lacp partner 0/7
Sys System Admin Prt Prt Admin
Intf Pri ID Key Pri Id State
------ --- ----------------- ----- --- ----- -----------
0/7 0 00:00:00:00:00:00 0 0 0 ACT|AGG|LTO
(GSM7248V2) #show lacp partner 0/8
Sys System Admin Prt Prt Admin
Intf Pri ID Key Pri Id State
------ --- ----------------- ----- --- ----- -----------
0/8 0 00:00:00:00:00:00 0 0 0 ACT|AGG|LTO
I believe the xmit policy "layer3+4" is the one most compatible with the switch's load-balance type 6. The first surprising thing is that the switch does not see the MAC address of its LACP partner.
On the server side, this is the content of /proc/net/bonding/bond0:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: ac:1f:6b:dc:2e:88
Active Aggregator Info:
Aggregator ID: 15
Number of ports: 2
Actor Key: 9
Partner Key: 55
Partner Mac Address: a0:21:b7:9d:83:6a
Slave Interface: eno1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:dc:2e:88
Slave queue ID: 0
Aggregator ID: 15
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: ac:1f:6b:dc:2e:88
port key: 9
port priority: 255
port number: 1
port state: 63
details partner lacp pdu:
system priority: 1
system mac address: a0:21:b7:9d:83:6a
oper key: 55
port priority: 128
port number: 8
port state: 61
Slave Interface: eno2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:dc:2e:89
Slave queue ID: 0
Aggregator ID: 15
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: ac:1f:6b:dc:2e:88
port key: 9
port priority: 255
port number: 2
port state: 63
details partner lacp pdu:
system priority: 1
system mac address: a0:21:b7:9d:83:6a
oper key: 55
port priority: 128
port number: 7
port state: 61
If I understand this correctly, it means the Linux bonding driver has correctly determined all the aggregator details (keys, port numbers, system priorities, port priorities, etc.). Nevertheless, after restarting the networking service I get this in dmesg:
[Dec14 20:40] bond0: Releasing backup interface eno1
[ +0.000004] bond0: first active interface up!
[ +0.090621] bond0: Removing an active aggregator
[ +0.000004] bond0: Releasing backup interface eno2
[ +0.118446] bond0: Enslaving eno1 as a backup interface with a down link
[ +0.027888] bond0: Enslaving eno2 as a backup interface with a down link
[ +0.008805] IPv6: ADDRCONF(NETDEV_UP): bond0: link is not ready
[ +3.546823] igb 0000:04:00.0 eno1: igb: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ +0.160003] igb 0000:05:00.0 eno2: igb: eno2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ +0.035608] bond0: link status definitely up for interface eno1, 1000 Mbps full duplex
[ +0.000004] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[ +0.000008] bond0: first active interface up!
[ +0.000166] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ +0.103821] bond0: link status definitely up for interface eno2, 1000 Mbps full duplex
Both interfaces are active and network connectivity seems fine; I just get the strange warning that there is no 802.3ad-compliant partner.
Moreover, when I try to copy two large binary files (10 GB each) simultaneously from two different machines connected to the same switch, each with a 1 Gbps link, the aggregate throughput on the server's bond0 interface stays well below 1 Gbps, although I expected something close to 2 Gbps (read speed etc. is not the limiting factor here; everything is on SSDs, well cached, and so on). When I copy the same files sequentially from a single machine, I easily reach close to 1 Gbps.
Do you have any idea what might be wrong here? Regarding diagnostics: there are the confusing warnings in dmesg (no 802.3ad-compliant partner) and in the switch's "sh lacp" output (no partner MAC, although the regular port records show the correct MAC addresses of the connected NICs). Regarding network performance: I see no aggregation at all when using two different connections. I would be very grateful for any hints.
Answer 1
The switch is configured with the "long" LACP timeout: one LACPDU every 30 seconds.
The Linux system is configured with bond-lacp-rate 1.
I could not find what this actually does in Debian, but if it passes the lacp_rate=1 module option to the bonding driver (reference), then that is the fast timeout: one LACPDU every second.
A mismatch between the slow and fast LACP rates is a misconfiguration.
All the example documentation I can find suggests that Debian accepts bond-lacp-rate slow, which will hopefully correct this for you. Alternatively, you can remove the bond-lacp-rate line from the config file entirely, since the default rate is slow, and then unload the bonding module or reboot to apply the change.
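A corrected stanza might look like this (a sketch based on the interfaces file from the question; the bond-lacp-rate line is the only change, and it can equally be dropped since slow is the driver default):

```
auto bond0
iface bond0 inet static
    address 192.168.1.15/24
    gateway 192.168.1.254
    dns-nameservers 8.8.8.8
    dns-search my-domain.org
    bond-slaves eno1 eno2
    bond-mode 4
    bond-miimon 100
    # match the switch's "long" timeout; slow is also the driver
    # default, so this line can simply be removed instead
    bond-lacp-rate slow
    bond-xmit_hash_policy layer3+4
    hwaddress aa:bb:cc:dd:ee:ff
```

After the bonding module is reloaded (or after a reboot), /proc/net/bonding/bond0 should report "LACP rate: slow" instead of "fast".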
Do not test throughput with only two flows. The layer3+4 policy does not guarantee that any two given flows land on separate NICs, only that with enough flows the traffic should balance out fairly evenly.
Test with 16 or 32 concurrent iperf3 TCP streams. The aggregate throughput across all streams should approach 2 Gbps.
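Why two flows can easily collide on one NIC can be sketched with a toy model of the hash. This is a simplified XOR-of-fields illustration, not the kernel's exact layer3+4 implementation, and the addresses and ports are hypothetical:

```python
from collections import Counter

def nic_for_flow(src_ip_last, dst_ip_last, sport, dport, slaves=2):
    """Toy layer3+4-style hash: XOR of L3/L4 fields, modulo slave count.
    (Simplified model, not the kernel's actual formula.)"""
    return ((sport ^ dport) ^ (src_ip_last ^ dst_ip_last)) % slaves

# Two clients (...1.20 and ...1.21) copying to the server (...1.15),
# both to the same destination port (445 here, e.g. SMB):
flow_a = nic_for_flow(20, 15, 50000, 445)
flow_b = nic_for_flow(21, 15, 50001, 445)
print(flow_a, flow_b)  # → 0 0 — both flows hash to the same NIC

# With many concurrent flows the distribution evens out:
counts = Counter(nic_for_flow(20, 15, sport, 445)
                 for sport in range(50000, 50032))
print(counts)  # 16 flows on each NIC
```

With only two flows there is a real chance both land on the same slave, which matches the observed sub-1 Gbps aggregate; many parallel streams average out to roughly 2 Gbps.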