Bonding problem on 16.04 LTS

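The system runs Ubuntu 16.04.3 LTS and has three 1 Gbit NICs: one onboard NIC and two Intel PCIe NICs. The two Intel NICs are bonded as bond0 in mode 4 (LACP). The switch is configured to support LACP on those two ports. Here is the network configuration: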

cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto enp0s31f6
#iface enp0s31f6 inet dhcp
iface enp0s31f6 inet static
    mtu 9000
    address 192.168.x.x
    netmask 255.255.x.0
    network 192.168.x.0
    gateway 192.168.x.1
    dns-nameservers 192.168.x.x

auto enp3s0
iface enp3s0 inet manual
bond-master bond0

auto enp4s0
iface enp4s0 inet manual
bond-master bond0

auto bond0
iface bond0 inet static
    mtu 9000
    address 192.168.x.x
    netmask 255.255.x.0
    network 192.168.x.0
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate 1
    bond-slaves none

This configuration works well, without errors. Under high network load, however (for example, while copying 100-200 GB), the following errors appear in /var/log/syslog:

Feb 14 17:20:02 ubuntu1 kernel: [29601.287684] e1000e: enp3s0 NIC Link is Down
Feb 14 17:20:02 ubuntu1 kernel: [29601.287993] e1000e 0000:03:00.0 enp3s0: speed changed to 0 for port enp3s0
Feb 14 17:20:02 ubuntu1 kernel: [29601.379193] bond0: link status definitely down for interface enp3s0, disabling it
Feb 14 17:20:02 ubuntu1 kernel: [29601.379199] bond0: first active interface up!
Feb 14 17:20:04 ubuntu1 kernel: [29603.064712] e1000e: enp3s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Feb 14 17:20:04 ubuntu1 kernel: [29603.079162] bond0: link status definitely up for interface enp3s0, 1000 Mbps full duplex

Is this a known issue? The failed interface evidently comes back up after a few seconds. The problem does not occur often.
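To put a number on "not often", the link-down events could be counted per interface. This is only a rough sketch: the function name count_link_down is made up for illustration, and it assumes the driver keeps logging lines in the same "<iface> NIC Link is Down" form as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count "NIC Link is Down" events per interface.
# Assumes log lines shaped like the e1000e messages quoted above, e.g.
#   ... e1000e: enp3s0 NIC Link is Down
count_link_down() {
    awk '/NIC Link is Down/ {
             # The interface name is the field just before "NIC".
             for (i = 2; i <= NF; i++)
                 if ($i == "NIC") { count[$(i - 1)]++; break }
         }
         END { for (ifc in count) print ifc, count[ifc] }' "$1"
}
```

It would be called as count_link_down /var/log/syslog and prints one "interface count" pair per line.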

In /proc/net/bonding/bond0 I can see that mode 4 is recognized correctly:

cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
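The output above is cut off before the per-slave sections. As a sketch for comparing the two NICs, the per-slave failure counters could be pulled out like this. The function name slave_failures is hypothetical, and it assumes the file also contains the bonding driver's usual "Slave Interface:" and "Link Failure Count:" lines, which are not shown in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper: print "<slave> <link failure count>" for each slave.
# Assumes the per-slave sections contain lines such as
#   Slave Interface: enp3s0
#   Link Failure Count: 2
slave_failures() {
    awk -F': ' '/^Slave Interface:/    { slave = $2 }
                /^Link Failure Count:/ { print slave, $2 }' "$1"
}
```

It would be called as slave_failures /proc/net/bonding/bond0.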

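I tried setting bond-slaves to the interface names instead of none, but in that case ifenslave hung when the networking service was restarted. I then found a suggestion to use "none", with which bond0 comes up without hanging.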

Any ideas?