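I have a SUSE 12 system with an Intel 82599ES NIC (two 10-Gigabit SFI/SFP+ ports). The two ports are bonded using LACP. Recently the network on this system became unreachable for about 3 minutes.
Looking at the messages log, I noticed that when the interface went down we got the following: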
2019-03-03T09:23:10.491731+08:00 oradb12 kernel: [9519285.192448] ixgbe 0000:02:00.1 eth5: initiating reset due to tx timeout
2019-03-03T09:23:10.491754+08:00 oradb12 kernel: [9519285.192464] ixgbe 0000:02:00.1 eth5: Reset adapter
2019-03-03T09:23:16.995739+08:00 oradb12 kernel: [9519291.696952] ixgbe 0000:02:00.1 eth5: speed changed to 0 for port eth5
2019-03-03T09:23:16.995763+08:00 oradb12 kernel: [9519291.697438] bond1: link status definitely down for interface eth5, disabling it
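To make the timing of these events easier to read, the kernel timestamps in square brackets can be turned into offsets from the first message. The following is only a minimal sketch: the regular expression assumes the syslog layout shown above, and the names `LOG_LINES`, `EVENT_RE` and `parse_events` are made up for illustration.

```python
import re

# The four log lines quoted above, copied verbatim.
LOG_LINES = """\
2019-03-03T09:23:10.491731+08:00 oradb12 kernel: [9519285.192448] ixgbe 0000:02:00.1 eth5: initiating reset due to tx timeout
2019-03-03T09:23:10.491754+08:00 oradb12 kernel: [9519285.192464] ixgbe 0000:02:00.1 eth5: Reset adapter
2019-03-03T09:23:16.995739+08:00 oradb12 kernel: [9519291.696952] ixgbe 0000:02:00.1 eth5: speed changed to 0 for port eth5
2019-03-03T09:23:16.995763+08:00 oradb12 kernel: [9519291.697438] bond1: link status definitely down for interface eth5, disabling it
"""

# Assumed layout: "... kernel: [<seconds since boot>] <message>"
EVENT_RE = re.compile(r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] (?P<msg>.*)$")


def parse_events(text):
    """Return a list of (seconds_since_boot, message) tuples."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append((float(match.group("uptime")), match.group("msg")))
    return events


events = parse_events(LOG_LINES)
first_time = events[0][0]
for uptime, msg in events:
    # Print each message with its offset from the first event.
    print(f"+{uptime - first_time:6.3f}s  {msg}")
```

On the lines above, this puts the bond's "link status definitely down" message roughly 6.5 seconds after the tx timeout reset on eth5.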
The kernel version is:
Linux oradb12 4.4.74-92.35-default #1 SMP Mon Aug 7 18:24:48 UTC 2017 (c0fdc47) x86_64 x86_64 x86_64 GNU/Linux
oradb12:/etc/sysconfig/network # cat /etc/SuSE-release
SUSE Linux Enterprise Server 12 (x86_64)
VERSION = 12
PATCHLEVEL = 2
The bonding interface configuration is:
oradb12:/etc/sysconfig/network # cat ifcfg-bond1
BOOTPROTO='static'
STARTMODE='onboot'
BONDING_MASTER='yes'
BONDING_SLAVE0='eth3'
BONDING_SLAVE1='eth5'
IPADDR=10.252.128.2
GATEWAY=10.252.128.1
NETMASK=255.255.255.0
USERCONTROL='no'
BONDING_MODULE_OPTS='mode=4 miimon=100 use_carrier=1'
oradb12:/etc/sysconfig/network # cat ifcfg-eth3
NAME='bond1-slave-eth3'
TYPE='Ethernet'
BOOTPROTO='none'
STARTMODE='onboot'
MASTER='bond1'
SLAVE='yes'
USERCONTROL='no'
oradb12:/etc/sysconfig/network # cat ifcfg-eth5
NAME='bond1-slave-eth5'
TYPE='Ethernet'
BOOTPROTO='none'
STARTMODE='onboot'
MASTER='bond1'
SLAVE='yes'
USERCONTROL='no'
The bond status is:
oradb12:/etc/sysconfig/network # cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 48:fd:8e:c9:21:64
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 13
Partner Key: 10273
Partner Mac Address: 74:4a:a4:08:ea:14
Slave Interface: eth3
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 48:fd:8e:c9:21:64
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 48:fd:8e:c9:21:64
port key: 13
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 32768
system mac address: 74:4a:a4:08:ea:14
oper key: 10273
port priority: 32768
port number: 33
port state: 61
Slave Interface: eth5
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 24
Permanent HW addr: 48:fd:8e:c9:21:65
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 48:fd:8e:c9:21:64
port key: 13
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 32768
system mac address: 74:4a:a4:08:ea:14
oper key: 10273
port priority: 32768
port number: 87
port state: 61
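Two fields in this output are easy to overlook: the per-slave "Link Failure Count" and the numeric "port state". The sketch below pulls out the failure counts and decodes a port state value into the LACP state flags defined by IEEE 802.3ad (bit 0 is LACP_Activity, up to bit 7, Expired). It assumes the value is printed in decimal. The helper names and the shortened `SAMPLE` excerpt are illustrative only.

```python
# LACP actor/partner state flags, lowest bit first (IEEE 802.3ad).
LACP_STATE_BITS = [
    "LACP_Activity",    # bit 0: set = active LACP
    "LACP_Timeout",     # bit 1: set = short timeout, clear = long timeout
    "Aggregation",      # bit 2
    "Synchronization",  # bit 3
    "Collecting",       # bit 4
    "Distributing",     # bit 5
    "Defaulted",        # bit 6
    "Expired",          # bit 7
]

# Shortened excerpt of the /proc/net/bonding/bond1 output above.
SAMPLE = """\
Slave Interface: eth3
MII Status: up
Link Failure Count: 1
Slave Interface: eth5
MII Status: up
Link Failure Count: 24
"""


def decode_port_state(state):
    """Return the names of the flags set in a numeric port state."""
    return [name for bit, name in enumerate(LACP_STATE_BITS) if state & (1 << bit)]


def link_failure_counts(text):
    """Map each slave interface name to its Link Failure Count."""
    counts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Link Failure Count:") and current is not None:
            counts[current] = int(line.split(":", 1)[1])
    return counts


print(link_failure_counts(SAMPLE))
print(decode_port_state(61))
```

Under these assumptions, state 61 has the Activity, Aggregation, Synchronization, Collecting and Distributing flags set and the Timeout flag clear, which matches the "LACP rate: slow" line. The counts also show that eth5 has recorded 24 link failures against 1 on eth3.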
The NIC driver information is:
oradb12:/etc/sysconfig/network # ethtool -i eth3
driver: ixgbe
version: 4.2.1-k
firmware-version: 0x800003df
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
oradb12:/etc/sysconfig/network # ethtool -i eth5
driver: ixgbe
version: 4.2.1-k
firmware-version: 0x800003df
expansion-rom-version:
bus-info: 0000:02:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
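When the network interface fails, restarting the network service on the server by running service network restart seems to resolve the problem.
Has anyone run into a similar problem before, or does anyone have suggestions for debugging the cause of this kind of issue?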