kernel BUG at drivers/net/ethernet/intel/e1000e/netdev.c:3804!

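Our server has started crashing almost every day when traffic is high at peak hours. Each time, the syslog is spammed with several eth0 resets, then the network goes down completely, and the machine has to be rebooted before we can regain remote access to it.

Does this error mean the NIC is broken, or is it just a software problem?

Running kernel: 4.19.0-10-amd64, OS: Debian 10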

Jan 25 18:00:41 Debian-83-jessie-64-minimal kernel: [161879.702795] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:00:45 Debian-83-jessie-64-minimal kernel: [161883.545928] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:04:41 Debian-83-jessie-64-minimal kernel: [162119.835193] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:04:45 Debian-83-jessie-64-minimal kernel: [162123.214074] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:05:50 Debian-83-jessie-64-minimal kernel: [162188.695254] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:05:54 Debian-83-jessie-64-minimal kernel: [162192.610229] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:06:14 Debian-83-jessie-64-minimal kernel: [162212.759251] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:06:18 Debian-83-jessie-64-minimal kernel: [162216.990139] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:07:27 Debian-83-jessie-64-minimal kernel: [162285.975361] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:07:31 Debian-83-jessie-64-minimal kernel: [162289.814340] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:07:47 Debian-83-jessie-64-minimal kernel: [162305.687558] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:07:51 Debian-83-jessie-64-minimal kernel: [162309.506389] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:07:59 Debian-83-jessie-64-minimal systemd[1]: session-247.scope: Succeeded.
    Jan 25 18:08:48 Debian-83-jessie-64-minimal kernel: [162366.871583] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:08:52 Debian-83-jessie-64-minimal kernel: [162370.734613] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:09:01 Debian-83-jessie-64-minimal CRON[27975]: (root) CMD (  [ -x /usr/lib/php5/sessionclean ] && /usr/lib/php5/sessionclean)
    Jan 25 18:09:01 Debian-83-jessie-64-minimal CRON[27974]: (root) CMD (  [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
    Jan 25 18:09:01 Debian-83-jessie-64-minimal systemd[1]: Starting Clean php session files...
    Jan 25 18:09:01 Debian-83-jessie-64-minimal systemd[1]: phpsessionclean.service: Succeeded.
    Jan 25 18:09:01 Debian-83-jessie-64-minimal systemd[1]: Started Clean php session files.
    Jan 25 18:09:42 Debian-83-jessie-64-minimal kernel: [162420.891568] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:09:46 Debian-83-jessie-64-minimal kernel: [162424.734698] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:10:57 Debian-83-jessie-64-minimal kernel: [162495.895693] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:11:01 Debian-83-jessie-64-minimal kernel: [162499.750608] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.895786] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.915877] ------------[ cut here ]------------
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.915964] kernel BUG at drivers/net/ethernet/intel/e1000e/netdev.c:3804!
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916048] invalid opcode: 0000 [#1] SMP PTI
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916126] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G        W         4.19.0-10-amd64 #1 Debian 4.19.132-1
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916222] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.7.0.SR.2 for D3401-H1x                11/25/2015
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916328] Workqueue: events e1000_reset_task [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916410] RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916486] Code: ff ff 31 c0 31 ed 66 41 89 45 20 e9 a8 fe ff ff 4c 89 e7 e8 89 f3 ff ff e9 af fe ff ff 4c 89 e7 e8 7c f3 ff ff e9 30 fe ff ff <0f> 0b 4c 89 e7 e8 6d f3 ff ff eb ac 4c 89 e7 e8 63 f3 ff ff e9 68
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916615] RSP: 0018:ffffaf708629fde0 EFLAGS: 00010202
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916689] RAX: 0000000000000067 RBX: ffff9043211f48c0 RCX: 000000000000007d
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916780] RDX: 0000000000000067 RSI: 0000000000000246 RDI: 0000000000000246
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916872] RBP: 000000003103f0fa R08: 0000000000000002 R09: ffffaf708629fdc4
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916963] R10: 00000000000000fe R11: 0000000000000000 R12: ffff9043211f4e38
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917055] R13: ffff90432a33f800 R14: 0000000004008000 R15: ffff9043211f4940
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917147] FS:  0000000000000000(0000) GS:ffff904331200000(0000) knlGS:0000000000000000
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917240] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917315] CR2: 00007f31849487f8 CR3: 00000005a660a003 CR4: 00000000003606f0
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917497] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917588] Call Trace:
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917663]  e1000e_reset+0x574/0x790 [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917743]  e1000e_down+0x1cf/0x200 [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917819]  e1000e_reinit_locked+0x46/0x60 [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917898]  process_one_work+0x1a7/0x3a0
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917974]  worker_thread+0x30/0x390
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918046]  ? create_worker+0x1a0/0x1a0
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918118]  kthread+0x112/0x130
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918188]  ? kthread_bind+0x30/0x30
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918260]  ret_from_fork+0x35/0x40
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918331] Modules linked in: unix_diag ip6t_rpfilter ipt_rpfilter binfmt_misc veth ip6t_MASQUERADE ipt_MASQUERADE xt_CHECKSUM xt_comment xt_tcpudp bridge stp llc dm_mod ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat nf_nat_ipv6 ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter nf_tables nfnetlink cpufreq_conservative cpufreq_userspace cpufreq_powersave fuse intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul evdev crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore squashfs iTCO_wdt pcc_cpufreq sg iTCO_vendor_support intel_pch_thermal intel_rapl_perf fujitsu_laptop wmi loop sparse_keymap video acpi_pad button ip_tables x_tables autofs4
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918698]  ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 md_mod sd_mod crc32c_intel ahci xhci_pci libahci xhci_hcd libata aesni_intel e1000e usbcore scsi_mod aes_x86_64 crypto_simd cryptd glue_helper i2c_i801 usb_common thermal fan
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918920] ---[ end trace fc8f12793b39335d ]---
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918998] RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919078] Code: ff ff 31 c0 31 ed 66 41 89 45 20 e9 a8 fe ff ff 4c 89 e7 e8 89 f3 ff ff e9 af fe ff ff 4c 89 e7 e8 7c f3 ff ff e9 30 fe ff ff <0f> 0b 4c 89 e7 e8 6d f3 ff ff eb ac 4c 89 e7 e8 63 f3 ff ff e9 68
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919206] RSP: 0018:ffffaf708629fde0 EFLAGS: 00010202
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919281] RAX: 0000000000000067 RBX: ffff9043211f48c0 RCX: 000000000000007d
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919372] RDX: 0000000000000067 RSI: 0000000000000246 RDI: 0000000000000246
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919464] RBP: 000000003103f0fa R08: 0000000000000002 R09: ffffaf708629fdc4
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919555] R10: 00000000000000fe R11: 0000000000000000 R12: ffff9043211f4e38
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919647] R13: ffff90432a33f800 R14: 0000000004008000 R15: ffff9043211f4940
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919739] FS:  0000000000000000(0000) GS:ffff904331200000(0000) knlGS:0000000000000000
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919937] CR2: 00007f31849487f8 CR3: 00000005a660a003 CR4: 00000000003606f0
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.920030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.920123] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Answer 1

kernel BUG at drivers/net/ethernet/intel/e1000e/netdev.c:3804!
kernel: [162559.916048] invalid opcode: 0000 [#1] SMP PTI
kernel: [162559.916126] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G        W         4.19.0-10-amd64 #1 Debian 4.19.132-1
kernel: [162559.916222] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.7.0.SR.2 for D3401-H1x                11/25/2015
kernel: [162559.916328] Workqueue: events e1000_reset_task [e1000e]
kernel: [162559.916410] RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]

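You are running Debian's distribution kernel, which applies some patches on top of the upstream source, so my quick analysis may not be entirely accurate. But line 3804 of drivers/net/ethernet/intel/e1000e/netdev.c in the upstream 4.19.170 source brings us to this line: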

BUG_ON(tdt != tx_ring->next_to_use);

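If the specified condition is true, this triggers the "kernel BUG at..." message, complete with a stack trace and everything else.

That line is in the function e1000_flush_tx_ring(), which is called by the function e1000_flush_desc_rings(), which in turn is named in the error message as the instruction pointer location: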

RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]

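Perhaps the compiler has inlined or otherwise optimized the e1000_flush_tx_ring() function, so that it does not show up as a recognizable symbol on the RIP: line. But it seems to match: the call trace strongly suggests that the driver is resetting the NIC, and flushing the TX ring is apparently part of that process.

But what made the reset necessary? It turns out that Intel has published a specification update for the I218/I219 NICs: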

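5. Buffer overrun while the I219 is processing DMA transactions

Problem: Intel® 100/200 Series Chipset platforms reduced the round-trip latency for LAN controller DMA accesses, which in some high-performance cases causes a buffer overrun while the I219 LAN Connected Device is processing DMA transactions.

Implication: Under very heavy UDP traffic combined with multiple reconnections of the Ethernet cable, the I219LM and I219V devices may get into an unrecoverable Tx hang state. The LAN controller's Tx hang recovers only when the system is rebooted.

Workaround: Slightly slow down DMA access by reducing the number of outstanding requests. This workaround could have an impact on TCP traffic performance and could reduce performance by as much as 5% to 15%, depending on the platform. Disabling TSO eliminates the performance loss for TCP traffic without a noticeable impact on CPU performance.

Status: Intel® 100/200 Series Chipset - NoFix

Intel® 300 Series Chipset - Fixed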

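So the root cause appears to be a hardware (or possibly NIC firmware) bug. The driver finds that the structure of the TX ring buffer is corrupted and assumes that the cause is a fault in the driver. But in this case, the fault appears to lie in the NIC itself.

The suggested workaround is to disable the NIC's TCP segmentation offload (tso) feature: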

ethtool -K eth0 tso off
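
To confirm that the setting actually took effect, the current offload state can be read back with ethtool -k (lowercase k). The helper below is only a sketch: tso_state is a hypothetical name, and it assumes the usual "tcp-segmentation-offload: on|off" line in the ethtool -k output.

```shell
# Hypothetical helper: print "on" or "off" for TSO, reading
# `ethtool -k <iface>` output on stdin. Assumes the line format
# "tcp-segmentation-offload: on" (optionally followed by "[fixed]").
tso_state() {
    awk '$1 == "tcp-segmentation-offload:" { print $2; exit }'
}

# Usage on the affected host (as root), with eth0 as in the question:
#   ethtool -k eth0 | tso_state
```

Note that settings made with ethtool -K do not survive a reboot. On a Debian system that configures eth0 through /etc/network/interfaces (an assumption about this server), one common way to reapply the setting is a "post-up ethtool -K eth0 tso off" line in the eth0 stanza.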

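The Fujitsu D3401-H1 appears to ship with an Intel Core i7-6700 processor, which belongs to the Skylake generation... so I would expect it to have an Intel 100 Series chipset. It looks like no fix is available for that chipset, so you will probably need to apply the workaround.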
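
One rough way to judge whether the workaround helps is to compare how often the "Reset adapter unexpectedly" message appears before and after applying it. The following is a minimal sketch, not a definitive tool: resets_per_hour is a hypothetical name, and it assumes syslog lines in the same "Mon DD HH:MM:SS host kernel: ..." format as the log in the question.

```shell
# Hypothetical helper: count e1000e adapter resets per hour,
# reading syslog-formatted lines on stdin.
resets_per_hour() {
    awk '/e1000e .*Reset adapter unexpectedly/ {
        split($3, t, ":")                      # $3 is HH:MM:SS
        count[$1 " " $2 " " t[1] ":00"]++      # key: "Mon DD HH:00"
    }
    END { for (k in count) print k, count[k] }' | sort
}

# Usage, assuming the messages are written to /var/log/syslog:
#   resets_per_hour < /var/log/syslog
```

The output has one line per hour in which at least one reset occurred, for example "Jan 25 18:00" followed by the number of resets. The final sort only makes the order stable; it sorts as text, so it is not chronological across months.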
