Are these dmesg errors (related to a PCI/NVMe drive) anything to worry about?

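Does anyone know what the error below (dmesg output) means? I see it periodically when writing to an Intel NVMe drive (attached through a PCI card) under Linux. I'm not sure whether "requires no further action" means I should ignore it, or whether the PCI card is simply junk.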

[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]: It has been corrected by h/w and requires no further action
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]: event severity: corrected
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:  Error 0, type: corrected
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   section_type: PCIe error
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   port_type: 0, PCIe end point
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   version: 3.0
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   command: 0x0506, status: 0x0010
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   device_id: 0000:17:00.0
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   slot: 0
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   secondary_bus: 0x00
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   vendor_id: 0x8086, device_id: 0xf1a6
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   class_code: 020801
[Mon Oct  1 13:46:53 2018] nvme 0000:17:00.0: aer_status: 0x000010c0, aer_mask: 0x00002000
[Mon Oct  1 13:46:53 2018] Bad TLP, Bad DLLP, Replay Timer Timeout
[Mon Oct  1 13:46:53 2018] nvme 0000:17:00.0: aer_layer=Data Link Layer, aer_agent=Transmitter ID
[Mon Oct  1 14:21:56 2018] perf: interrupt took too long (3147 > 3135), lowering kernel.perf_event_max_sample_rate to 63500

Answer 1

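This is a RAS feature reporting that an error occurred but was corrected by the hardware. No further action is needed for this particular fault. That said, a high rate of corrected errors is sometimes an early indicator of failure.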
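To judge whether the rate of corrected errors is climbing, one option is to count the reports per device per hour. The following is only a minimal sketch, assuming the log was captured with human-readable timestamps (as in the question) and that every report follows the same APEI layout; the function name `corrected_errors_per_hour` and the `SAMPLE` excerpt are illustrative, not part of any tool.

```python
import re
from collections import Counter

# One "[timestamp] {record}[Hardware Error]: body" line of an APEI report.
APEI_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+\{(?P<rec>\d+)\}\[Hardware Error\]:\s+(?P<body>.*)$"
)
# The PCI address line, e.g. "device_id: 0000:17:00.0" (not the vendor/device ID line).
BDF = re.compile(r"device_id:\s+([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])")

# Short excerpt of the log from the question, used as sample input.
SAMPLE = """\
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]: event severity: corrected
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   device_id: 0000:17:00.0
[Mon Oct  1 13:46:53 2018] {24}[Hardware Error]:   vendor_id: 0x8086, device_id: 0xf1a6
"""

def corrected_errors_per_hour(dmesg_text):
    """Count corrected APEI reports per (PCI address, hour) bucket."""
    severity = {}          # record number -> severity string
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = APEI_LINE.match(line)
        if not m:
            continue
        rec, body = m.group("rec"), m.group("body").strip()
        if body.startswith("event severity:"):
            severity[rec] = body.split(":", 1)[1].strip()
            continue
        dev = BDF.fullmatch(body)
        if dev and severity.get(rec) == "corrected":
            # Timestamp looks like "Mon Oct  1 13:46:53 2018".
            _dow, month, day, clock, year = m.group("ts").split()
            hour = clock.split(":")[0]
            counts[(dev.group(1), f"{year} {month} {day} {hour}:00")] += 1
    return counts

if __name__ == "__main__":
    for (device, hour), n in sorted(corrected_errors_per_hour(SAMPLE).items()):
        print(device, hour, n)
```

A handful of reports per day under heavy writes is a different picture from hundreds per hour; what counts as "too many" is a judgment call for your own hardware, not something this sketch decides.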

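A reasonable response lies somewhere between ignoring it and discarding the disk. Have a spare disk ready, verify your backups, and check whether the disk has redundancy as part of an array.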
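When deciding how much weight to give a report, it can help to confirm what the `aer_status` value in the log actually encodes. Below is a small sketch that decodes the value as a PCIe AER Correctable Error Status register; the bit names follow the PCIe specification wording as I recall it, and the helper name `decode_correctable` is made up for illustration.

```python
# Bit positions in the PCIe AER Correctable Error Status register.
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}

def decode_correctable(status):
    """Return the names of all correctable-error bits set in `status`."""
    return [
        name
        for bit, name in sorted(CORRECTABLE_BITS.items())
        if status & (1 << bit)
    ]

if __name__ == "__main__":
    # Values taken from the "aer_status" / "aer_mask" line in the question.
    print(decode_correctable(0x000010C0))
    print(decode_correctable(0x00002000))
```

For the values in the question, the status decodes to Bad TLP, Bad DLLP, and Replay Timer Timeout, matching the text the kernel printed, and the mask covers only Advisory Non-Fatal Error. All three are data link layer events on the PCIe link itself, which the link recovers from by retransmitting.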
