APEI Generic Hardware Error on PCI OCP on a Linux server

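This error appears while running a stress test on the server. A hardware fault has been ruled out as far as possible: the OCP and its entire connection path (OCP cable, board, and so on) have been replaced. The CPU, RAM, and SSD were not replaced, because they are unlikely to be the cause.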

Device ID: 0000:64:02.0

    Dmesg check............................[FAIL]
    [  250.275668] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
    [  250.275670] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
    [  250.275671] {1}[Hardware Error]: event severity: corrected
    [  250.275672] {1}[Hardware Error]:  Error 0, type: corrected
    [  250.275673] {1}[Hardware Error]:   section_type: PCIe error
    [  250.275673] {1}[Hardware Error]:   port_type: 4, root port
    [  250.275674] {1}[Hardware Error]:   version: 3.0
    [  250.275674] {1}[Hardware Error]:   command: 0x0547, status: 0x0010
    [  250.275675] {1}[Hardware Error]:   device_id: 0000:64:02.0
    [  250.275675] {1}[Hardware Error]:   slot: 6
    [  250.275676] {1}[Hardware Error]:   secondary_bus: 0x65
    [  250.275676] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
    [  250.275677] {1}[Hardware Error]:   class_code: 060400
    [  250.275677] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0013

Answer 1

This may be related to the CPU.

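The error was reported on vendor_id: 0x8086, device_id: 0x347a, which is listed as pci:8086-347a | Intel | Core i9/Xeon PCIe Port A.

The port type is also reported as root port, so the reporting device is a PCIe port that belongs to the CPU itself rather than to the OCP.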
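The register values in the log can be decoded to cross-check this reading. The following is a minimal Python sketch, assuming the standard PCI configuration-space bit layouts for the Command, Status, and bridge Secondary Status registers. The helper names `decode_bits` and `split_bdf` and the lookup tables are illustrative only and do not come from the kernel or from any existing tool.

```python
# Bit names assume the standard PCI configuration-space layout.
# Only the commonly defined bits are listed; others print as "bit N".
COMMAND_BITS = {
    0: "I/O Space Enable",
    1: "Memory Space Enable",
    2: "Bus Master Enable",
    6: "Parity Error Response",
    8: "SERR# Enable",
    10: "Interrupt Disable",
}
STATUS_BITS = {
    3: "Interrupt Status",
    4: "Capabilities List",
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}
# The bridge Secondary Status register mirrors the error bits of Status,
# except that bit 14 means "Received System Error".
SECONDARY_STATUS_BITS = {
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Received System Error",
    15: "Detected Parity Error",
}


def decode_bits(value, names):
    """Return the names of all set bits in a 16-bit register, lowest bit first."""
    return [names.get(bit, f"bit {bit}") for bit in range(16) if value & (1 << bit)]


def split_bdf(address):
    """Split 'domain:bus:device.function' into four integers."""
    domain, bus, devfn = address.split(":")
    device, function = devfn.split(".")
    return int(domain, 16), int(bus, 16), int(device, 16), int(function, 16)


if __name__ == "__main__":
    # Values copied from the log in the question.
    print("command:         ", decode_bits(0x0547, COMMAND_BITS))
    print("status:          ", decode_bits(0x0010, STATUS_BITS))
    print("secondary_status:", decode_bits(0x2000, SECONDARY_STATUS_BITS))
    print("device address:  ", split_bdf("0000:64:02.0"))
```

For the values in the log, `secondary_status: 0x2000` decodes to Received Master Abort, while `status: 0x0010` only indicates that a capabilities list is present. Received Master Abort is a sticky bit that may also be left set by ordinary bus enumeration, so it is a hint about the secondary side (bus 0x65) rather than proof of a fault there.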

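However, the error was corrected by the hardware. If nothing else is wrong and it does not happen often, you can ignore it. Alternatively, try replacing the CPU (or test on different hardware).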
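To judge whether the error "does not happen often", it can help to count the records per device and severity over a longer test run. The following is a rough Python sketch that parses text in the format shown in the question. It assumes that the `{N}` prefix is a per-record sequence number and that each record carries one `event severity:` line and one `device_id:` line, as in the log above. The function name `summarize_hardware_errors` is hypothetical.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   [  250.275671] {1}[Hardware Error]: event severity: corrected
# Group 1 is the record number, group 2 is the message without leading spaces.
LINE_RE = re.compile(r"\{(\d+)\}\[Hardware Error\]:\s*(.*)")


def summarize_hardware_errors(dmesg_text):
    """Count error records per (device address, severity) pair."""
    records = {}
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        record = records.setdefault(match.group(1), {})
        message = match.group(2)
        if message.startswith("event severity:"):
            record["severity"] = message.split(":", 1)[1].strip()
        elif message.startswith("device_id:"):
            # The "vendor_id: ..., device_id: ..." line starts with
            # "vendor_id:", so only the bus address line lands here.
            record["device"] = message.split(":", 1)[1].strip()
    return Counter(
        (rec.get("device", "unknown"), rec.get("severity", "unknown"))
        for rec in records.values()
    )


# Shortened copy of the log from the question.
SAMPLE = """\
[  250.275668] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  250.275671] {1}[Hardware Error]: event severity: corrected
[  250.275675] {1}[Hardware Error]:   device_id: 0000:64:02.0
[  250.275676] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x347a
"""

if __name__ == "__main__":
    for (device, severity), count in summarize_hardware_errors(SAMPLE).items():
        print(f"{device}  {severity}  x{count}")
```

In practice the input text would be the saved `dmesg` output of a full stress-test run instead of `SAMPLE`. A count that keeps growing for the same device, or any severity other than corrected, would be a reason not to ignore the error.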
