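This error appears when I run a stress test on the server. I have already ruled out a hardware problem on the OCP side (I replaced the OCP card and the entire OCP connection: cables, board, and so on). I did not replace the CPU, RAM, or SSD, because those seemed unlikely to be the cause.
Device ID: 0000:64:02.0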
Dmesg check............................[FAIL]
[ 250.275668] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 250.275670] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[ 250.275671] {1}[Hardware Error]: event severity: corrected
[ 250.275672] {1}[Hardware Error]: Error 0, type: corrected
[ 250.275673] {1}[Hardware Error]: section_type: PCIe error
[ 250.275673] {1}[Hardware Error]: port_type: 4, root port
[ 250.275674] {1}[Hardware Error]: version: 3.0
[ 250.275674] {1}[Hardware Error]: command: 0x0547, status: 0x0010
[ 250.275675] {1}[Hardware Error]: device_id: 0000:64:02.0
[ 250.275675] {1}[Hardware Error]: slot: 6
[ 250.275676] {1}[Hardware Error]: secondary_bus: 0x65
[ 250.275676] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x347a
[ 250.275677] {1}[Hardware Error]: class_code: 060400
[ 250.275677] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0013
Answer 1
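This may be related to the CPU.
The error is reported on vendor_id: 0x8086, device_id: 0x347a, i.e. pci:8086-347a | Intel | Core i9/Xeon PCIe Port A
The port type is also reported as root port, so the error is logged at the CPU end of the link rather than on the add-in card.
However, the error was corrected by the hardware. If nothing else is wrong and it does not happen often, you can ignore it. Otherwise, try replacing the CPU (or test with different hardware).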
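To judge whether this "does not happen often", it can help to count how many such records show up per device. The following is only a minimal sketch in Python, assuming the dmesg text has the same layout as the log in the question; the function name `count_hardware_errors` and the two regular expressions are made up for illustration and are not part of any kernel tool.

```python
import re
from collections import Counter

# Matches the PCI address line, e.g. "device_id: 0000:64:02.0".
# The "vendor_id: 0x8086, device_id: 0x347a" line is not matched, because
# there "device_id" does not directly follow the "[Hardware Error]:" tag.
ADDRESS_RE = re.compile(
    r"\[Hardware Error\]:\s+device_id: "
    r"([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)
# Matches e.g. "event severity: corrected".
SEVERITY_RE = re.compile(r"\[Hardware Error\]:\s+event severity: (\w+)")


def count_hardware_errors(dmesg_text):
    """Count APEI error records per (PCI address, severity) pair.

    Assumes each record prints its "event severity" line before its
    PCI address line, as in the log above.
    """
    counts = Counter()
    severity = None
    for line in dmesg_text.splitlines():
        match = SEVERITY_RE.search(line)
        if match:
            severity = match.group(1)
            continue
        match = ADDRESS_RE.search(line)
        if match and severity is not None:
            counts[(match.group(1), severity)] += 1
            severity = None  # wait for the next record's severity line
    return counts
```

Feeding it the output of `dmesg` (reading it may require root) after each stress run gives a rough rate: a single corrected record for 0000:64:02.0 is probably harmless, while a count that keeps growing, or any severity other than `corrected`, is a reason to test with different hardware.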