Identifying the RAM module responsible for ECC errors reported in dmesg

One of my servers is logging the following ECC errors:

    [lun set 14 00:14:16 2020] {33}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
    [lun set 14 00:14:16 2020] {33}[Hardware Error]: It has been corrected by h/w and requires no further action
    [lun set 14 00:14:16 2020] {33}[Hardware Error]: event severity: corrected
    [lun set 14 00:14:16 2020] {33}[Hardware Error]:  Error 0, type: corrected
    [lun set 14 00:14:16 2020] {33}[Hardware Error]:  fru_text: CorrectedErr
    [lun set 14 00:14:16 2020] {33}[Hardware Error]:   section_type: memory error
    [lun set 14 00:14:16 2020] {33}[Hardware Error]:   node: 0 device: 1
    [lun set 14 00:14:16 2020] {33}[Hardware Error]:   error_type: 2, single-bit ECC
    [lun set 14 00:14:16 2020] ghes_edac: Internal error: Can't find EDAC structure

The server has the following RAM configuration:

Handle 0x0029, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: Single-bit ECC
        Maximum Capacity: 64 GB
        Error Information Handle: Not Provided
        Number Of Devices: 4

Handle 0x002A, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0029
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 16384 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM CHA3
        Bank Locator: BANK 0
        Type: DDR4
        Type Detail: Synchronous
        Speed: 2133 MHz
        Manufacturer: SK Hynix
        Serial Number: 71929DA0
        Asset Tag: 1651
        Part Number: HMA82GU7MFR8N-TF
        Rank: 2
        Configured Clock Speed: 2133 MHz
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: 1.2 V

Handle 0x002B, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0029
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 16384 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM CHA1
        Bank Locator: BANK 1
        Type: DDR4
        Type Detail: Synchronous
        Speed: 2133 MHz
        Manufacturer: SK Hynix
        Serial Number: 71929CFF
        Asset Tag: 1651
        Part Number: HMA82GU7MFR8N-TF
        Rank: 2
        Configured Clock Speed: 2133 MHz
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: 1.2 V

Handle 0x002C, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0029
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 16384 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM CHB4
        Bank Locator: BANK 2
        Type: DDR4
        Type Detail: Synchronous
        Speed: 2133 MHz
        Manufacturer: SK Hynix
        Serial Number: 71929BB8
        Asset Tag: 1651
        Part Number: HMA82GU7MFR8N-TF
        Rank: 2
        Configured Clock Speed: 2133 MHz
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: 1.2 V

Handle 0x002D, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0029
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 16384 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM CHB2
        Bank Locator: BANK 3
        Type: DDR4
        Type Detail: Synchronous
        Speed: 2133 MHz
        Manufacturer: Samsung
        Serial Number: 33BB5E37
        Asset Tag: 1641
        Part Number: M391A2K43BB1-CPB
        Rank: 2
        Configured Clock Speed: 2133 MHz
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: 1.2 V

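How can I identify the faulty module so that I can replace it? I believe the following log line contains the information I need, but I don't know how to decode it.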

[lun set 14 00:14:16 2020] {33}[Hardware Error]:   node: 0 device: 1
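To make the cross-referencing easier, here is a minimal Python sketch that pulls the numeric fields out of that log line and lists the `Locator`, `Bank Locator`, and `Serial Number` of each `Memory Device` entry from the dmidecode-style output above. The function names and the shortened sample text are only illustrative. The sketch does not establish how `device: 1` maps to a slot: whether it corresponds to `BANK 1`, to the second entry, or to something else depends on the firmware, and that is exactly what remains open here.

```python
import re


def parse_ghes_location(line):
    """Return the integer fields (e.g. node, device) found in a GHES log line."""
    # Only scan the text after the "[Hardware Error]:" tag, so the
    # timestamp digits are never mistaken for a field.
    _, _, payload = line.partition("[Hardware Error]:")
    return {key: int(value) for key, value in re.findall(r"(\w+):\s*(\d+)", payload)}


def parse_memory_devices(dmi_text):
    """Return one dict per 'Memory Device' block in dmidecode-style text."""
    devices = []
    current = None
    for raw in dmi_text.splitlines():
        line = raw.strip()
        if line == "Memory Device":
            # Start collecting key/value pairs for a new DIMM entry.
            current = {}
            devices.append(current)
        elif not line or line.startswith("Handle "):
            # A blank line or a new handle ends the current block.
            current = None
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return devices


# Shortened sample data copied from the output above.
sample_log = "[lun set 14 00:14:16 2020] {33}[Hardware Error]:   node: 0 device: 1"
sample_dmi = """
Handle 0x0029, DMI type 16, 23 bytes
Physical Memory Array
        Number Of Devices: 4

Handle 0x002A, DMI type 17, 40 bytes
Memory Device
        Locator: DIMM CHA3
        Bank Locator: BANK 0
        Serial Number: 71929DA0

Handle 0x002B, DMI type 17, 40 bytes
Memory Device
        Locator: DIMM CHA1
        Bank Locator: BANK 1
        Serial Number: 71929CFF
"""

location = parse_ghes_location(sample_log)
devices = parse_memory_devices(sample_dmi)

print("Reported location:", location)
for index, dev in enumerate(devices):
    # The index is only the order of appearance, not a confirmed device number.
    print(index, dev["Locator"], dev["Bank Locator"], dev["Serial Number"])
```

Running it against the full dmidecode output gives a compact table of slots and serial numbers to compare against the reported `node` and `device` values, for example when swapping modules between slots to see whether the reported device number follows the module.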
