e1000e driver and PREEMPT_RT kernel: Reset adapter unexpectedly; Detected Hardware Unit Hang

I am trying to use 2 Ethernet controllers mounted on an add-on board, in addition to the 2 controllers already present.

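With the stock kernel provided by Ubuntu, everything works. With Ubuntu's PREEMPT_RT-patched kernel, the two add-on controllers reset in a loop as soon as the add-on card is connected.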

$ dmesg
[66058.590276] e1000e 0000:06:00.0 enp6s0: Detected Hardware Unit Hang:
[66058.590276] TDH <0>
[66058.590276] TDT <9>
[66058.590276] next_to_use <9>
[66058.590276] next_to_clean <0>
[66058.590276] buffer_info[next_to_clean]:
[66058.590276] time_stamp <100facc70>
[66058.590276] next_to_watch <0>
[66058.590276] jiffies <100fad6b0>
[66058.590276] next_to_watch.status <0>
[66058.590276] MAC Status <80783>
[66058.590276] PHY Status <796d>
[66058.590276] PHY 1000BASE-T Status <7c00>
[66058.590276] PHY Extended Status <3000>
[66058.590276] PCI Status <10>
[66058.663912] e1000e 0000:06:00.0 enp6s0: Reset adapter unexpectedly
$ lspci -PP -nn
00:00.0 Host bridge [0600]: Intel Corporation 3rd Gen Core processor DRAM Controller [8086:0154] (rev 09)
00:02.0 VGA compatible controller [0300]: Intel Corporation 3rd Gen Core processor Graphics Controller [8086:0156] (rev 09)
00:16.0 Communication controller [0780]: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 [8086:1e3a] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 [8086:1e2d] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller [8086:1e20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 [8086:1e10] (rev c4)
00:1c.1 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 [8086:1e12] (rev c4)
00:1c.2 PCI bridge [0604]: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 [8086:1e14] (rev c4)
00:1d.0 USB controller [0c03]: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 [8086:1e26] (rev 04)
00:1f.0 ISA bridge [0601]: Intel Corporation NM70 Express Chipset LPC Controller [8086:1e5f] (rev 04)
00:1f.2 IDE interface [0101]: Intel Corporation 7 Series Chipset Family 4-port SATA Controller [IDE mode] [8086:1e01] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller [8086:1e22] (rev 04)
00:1f.5 IDE interface [0101]: Intel Corporation 7 Series Chipset Family 2-port SATA Controller [IDE mode] [8086:1e09] (rev 04)
00:1c.0/01:00.0 Ethernet controller [0200]: Intel Corporation 82583V Gigabit Network Connection [8086:150c]
00:1c.1/02:00.0 Ethernet controller [0200]: Intel Corporation 82583V Gigabit Network Connection [8086:150c]
00:1c.2/03:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05)
00:1c.2/03:00.0/04:01.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05)
00:1c.2/03:00.0/04:02.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05)
00:1c.2/03:00.0/04:01.0/05:00.0 Ethernet controller [0200]: Intel Corporation 82583V Gigabit Network Connection [8086:150c]
00:1c.2/03:00.0/04:02.0/06:00.0 Ethernet controller [0200]: Intel Corporation 82583V Gigabit Network Connection [8086:150c]
$ uname -srvm
Linux 5.15.0-1050-realtime #56-Ubuntu SMP PREEMPT_RT Fri Oct 6 17:11:41 UTC 2023 x86_64
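
As a rough aid to reading the hang dump above, here is a minimal Python sketch that extracts the `<...>` fields from the dmesg lines and derives two numbers: how many TX descriptors sit between the hardware head (TDH) and the software tail (TDT), and how many jiffies passed between the buffer's time_stamp and the hang report. The function name `parse_hang_dump` is hypothetical. The sketch assumes all values are hexadecimal and ignores ring wraparound. Converting jiffies to seconds would need the kernel's CONFIG_HZ, which is not shown in this post.

```python
import re

# Sample lines copied from the dmesg output above.
DUMP = """\
[66058.590276] TDH <0>
[66058.590276] TDT <9>
[66058.590276] next_to_use <9>
[66058.590276] next_to_clean <0>
[66058.590276] time_stamp <100facc70>
[66058.590276] jiffies <100fad6b0>
"""

# "[timestamp] field name <hexvalue>"
FIELD = re.compile(r"^\[\s*[\d.]+\]\s+(.+?)\s+<([0-9a-fA-F]+)>\s*$")


def parse_hang_dump(text):
    """Return {field name: integer value} for every '<hex>' line (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


fields = parse_hang_dump(DUMP)
# Descriptors queued by the driver that the hardware has not fetched yet
# (assumption: no ring wraparound, true here because TDT > TDH).
pending = fields["TDT"] - fields["TDH"]
# Age of the oldest pending buffer, in jiffies.
stalled_jiffies = fields["jiffies"] - fields["time_stamp"]
print(pending, stalled_jiffies)
```

For this dump, the head pointer has not moved at all while 9 descriptors are queued, which is consistent with the transmitter never starting rather than stalling partway.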

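All 4 Ethernet controllers are identical (Intel 82583V), yet only the ones on the add-on card reset. I tried pcie_aspm=off e1000e.SmartPowerDownEnable=0 e1000e.EEE=0 as kernel parameters, but that did not help.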

# ethtool --show-eee enp6s0
netlink error: Operation not supported
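
Before ruling the kernel parameters out, it may be worth confirming that they actually reached the running kernel. The sketch below is a minimal, hedged example: `missing_params` is a hypothetical helper, and the sample command line is made up for illustration. On the affected machine the real string would come from /proc/cmdline. The check only tells whether a parameter is present on the command line, not whether the driver recognizes or honors it.

```python
# Parameters tried in this post.
WANTED = ["pcie_aspm=off", "e1000e.SmartPowerDownEnable=0", "e1000e.EEE=0"]


def missing_params(cmdline, wanted):
    """Return the wanted parameters that do not appear on the kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


# Hypothetical sample; on the real system use:
#   cmdline = open("/proc/cmdline").read()
sample_cmdline = "ro quiet pcie_aspm=off e1000e.SmartPowerDownEnable=0"
print(missing_params(sample_cmdline, WANTED))
```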

I also tried this fix from the e1000 driver project on SourceForge.

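For now, I have no clear idea where the problem lies. I only have a guess involving the internal PCI wiring, latency, and real-time behavior. (The 2 failing controllers sit behind 2 cascaded PCI bridges.)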

$ lspci -t -nn -v
-[0000:00]-+-00.0  Intel Corporation 3rd Gen Core processor DRAM Controller [8086:0154]
           +-02.0  Intel Corporation 3rd Gen Core processor Graphics Controller [8086:0156]
           +-16.0  Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 [8086:1e3a]
           +-1a.0  Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 [8086:1e2d]
           +-1b.0  Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller [8086:1e20]
           +-1c.0-[01]----00.0  Intel Corporation 82583V Gigabit Network Connection [8086:150c]
           +-1c.1-[02]----00.0  Intel Corporation 82583V Gigabit Network Connection [8086:150c]
           +-1c.2-[03-06]----00.0-[04-06]--+-01.0-[05]----00.0  Intel Corporation 82583V Gigabit Network Connection [8086:150c]
           |                               \-02.0-[06]----00.0  Intel Corporation 82583V Gigabit Network Connection [8086:150c]
           +-1d.0  Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 [8086:1e26]
           +-1f.0  Intel Corporation NM70 Express Chipset LPC Controller [8086:1e5f]
           +-1f.2  Intel Corporation 7 Series Chipset Family 4-port SATA Controller [IDE mode] [8086:1e01]
           +-1f.3  Intel Corporation 7 Series/C216 Chipset Family SMBus Controller [8086:1e22]
           \-1f.5  Intel Corporation 7 Series Chipset Family 2-port SATA Controller [IDE mode] [8086:1e09]
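
The topology difference between the working and the failing controllers can be made explicit with a short Python sketch that counts the bridges in each path printed by `lspci -PP`. The helper name `bridges_upstream` is hypothetical, and the paths are copied from the listing earlier in this post. The sketch only counts hops. It does not show that the extra hops cause the hang.

```python
# Paths copied from the `lspci -PP -nn` output (device descriptions trimmed).
PATHS = [
    "00:1c.0/01:00.0",
    "00:1c.1/02:00.0",
    "00:1c.2/03:00.0/04:01.0/05:00.0",
    "00:1c.2/03:00.0/04:02.0/06:00.0",
]


def bridges_upstream(path):
    """Number of bridges between the root bus and the endpoint (hypothetical helper)."""
    return len(path.split("/")) - 1


# Map each endpoint address to the number of bridges above it.
hops = {path.rsplit("/", 1)[-1]: bridges_upstream(path) for path in PATHS}
print(hops)
```

The two onboard controllers sit directly behind a root port (1 bridge), while 05:00.0 and 06:00.0 sit behind a root port plus the upstream and downstream ports of the Pericom switch (3 bridges).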

Can anyone offer an explanation, a clue, or a pointer to any documentation that would help stop this endless reset loop?
