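The system runs Ubuntu 16.04.3 LTS and has three 1 Gbit NICs: one onboard NIC and two Intel PCIe NICs. The two Intel NICs are bonded in mode 4 (LACP) as bond0, and the switch is configured for LACP on both of those ports. Here is the network configuration: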
cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
source /etc/network/interfaces.d/*
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto enp0s31f6
#iface enp0s31f6 inet dhcp
iface enp0s31f6 inet static
mtu 9000
address 192.168.x.x
netmask 255.255.x.0
network 192.168.x.0
gateway 192.168.x.1
dns-nameservers 192.168.x.x
auto enp3s0
iface enp3s0 inet manual
bond-master bond0
auto enp4s0
iface enp4s0 inet manual
bond-master bond0
auto bond0
iface bond0 inet static
mtu 9000
address 192.168.x.x
netmask 255.255.x.0
network 192.168.x.0
bond-mode 802.3ad
bond-miimon 100
bond-lacp-rate 1
bond-slaves none
This configuration works fine, without errors. Under high network load, however (for example, when copying 100-200 GB), the following errors appear in /var/log/syslog:
Feb 14 17:20:02 ubuntu1 kernel: [29601.287684] e1000e: enp3s0 NIC Link is Down
Feb 14 17:20:02 ubuntu1 kernel: [29601.287993] e1000e 0000:03:00.0 enp3s0: speed changed to 0 for port enp3s0
Feb 14 17:20:02 ubuntu1 kernel: [29601.379193] bond0: link status definitely down for interface enp3s0, disabling it
Feb 14 17:20:02 ubuntu1 kernel: [29601.379199] bond0: first active interface up!
Feb 14 17:20:04 ubuntu1 kernel: [29603.064712] e1000e: enp3s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Feb 14 17:20:04 ubuntu1 kernel: [29603.079162] bond0: link status definitely up for interface enp3s0, 1000 Mbps full duplex
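To measure how long each outage lasts, the kernel timestamps (seconds since boot, in brackets) of the "Link is Down" and "Link is Up" lines can be paired up. This is a minimal sketch, assuming syslog lines formatted exactly like the excerpt above; `link_outages` and the two regexes are hypothetical helpers, not part of any existing tool.

```python
import re

# Kernel timestamp in brackets, e.g. "[29601.287684]" (seconds since boot).
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")
# The e1000e link-state messages, e.g. "e1000e: enp3s0 NIC Link is Down".
# Assumption: the driver always prints this exact wording.
LINK_RE = re.compile(r"e1000e: (\S+) NIC Link is (Down|Up)")


def link_outages(lines):
    """Return (interface, down_ts, up_ts, seconds) for each Down-to-Up pair."""
    down_at = {}  # interface -> timestamp of its most recent "Link is Down"
    outages = []
    for line in lines:
        ts_match = TS_RE.search(line)
        link_match = LINK_RE.search(line)
        if not ts_match or not link_match:
            continue  # not an e1000e link-state line
        ts = float(ts_match.group(1))
        iface, state = link_match.groups()
        if state == "Down":
            down_at[iface] = ts
        elif iface in down_at:
            # "Up" closes the outage opened by the last "Down" on this interface.
            start = down_at.pop(iface)
            outages.append((iface, start, ts, round(ts - start, 3)))
    return outages
```

Because iterating over an open file yields its lines, a file object for /var/log/syslog can be passed directly. On the excerpt above this reports a single enp3s0 outage of about 1.777 s (from 29601.287684 to 29603.064712).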
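Is this a known issue? The affected interface evidently comes back up after a few seconds, and the problem does not occur often.
In /proc/net/bonding/bond0 I can see that mode 4 is recognized correctly: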
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
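Since the flaps are short and infrequent, the per-slave "Link Failure Count" in this same file is a convenient counter to watch over time. The excerpt above stops before the per-slave sections; the sketch below assumes, as the bonding driver prints them, that each slave section starts with a `Slave Interface:` line and contains a `Link Failure Count:` line. `parse_bonding` and `link_failure_counts` are hypothetical helper names.

```python
def parse_bonding(text):
    """Split /proc/net/bonding/<bond> text into global settings and per-slave dicts."""
    settings = {}
    slaves = {}
    current = settings  # "key: value" lines go here until the first slave section
    for raw in text.splitlines():
        # Split on the first colon only, so values such as MAC addresses stay intact.
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank lines and titles such as "802.3ad info" have no colon
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif key not in current:
            # Nested sub-sections can reuse key names; keep the first occurrence.
            current[key] = value
    return settings, slaves


def link_failure_counts(slaves):
    """Map each slave name to its "Link Failure Count" as an int (0 if absent)."""
    return {name: int(info.get("Link Failure Count", "0")) for name, info in slaves.items()}
```

Reading /proc/net/bonding/bond0 periodically and comparing the counts would show whether only enp3s0 flaps under load or whether enp4s0 is affected too.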
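I tried setting bond-slaves to the interface names instead of none, but then ifenslave hangs when the networking service is restarted. I found a suggestion to use "none" instead, and with that bond0 comes up without hanging.
Any ideas?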