One of my servers is logging the following ECC errors:
[lun set 14 00:14:16 2020] {33}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[lun set 14 00:14:16 2020] {33}[Hardware Error]: It has been corrected by h/w and requires no further action
[lun set 14 00:14:16 2020] {33}[Hardware Error]: event severity: corrected
[lun set 14 00:14:16 2020] {33}[Hardware Error]: Error 0, type: corrected
[lun set 14 00:14:16 2020] {33}[Hardware Error]: fru_text: CorrectedErr
[lun set 14 00:14:16 2020] {33}[Hardware Error]: section_type: memory error
[lun set 14 00:14:16 2020] {33}[Hardware Error]: node: 0 device: 1
[lun set 14 00:14:16 2020] {33}[Hardware Error]: error_type: 2, single-bit ECC
[lun set 14 00:14:16 2020] ghes_edac: Internal error: Can't find EDAC structure
The server has the following RAM configuration:
Handle 0x0029, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Single-bit ECC
Maximum Capacity: 64 GB
Error Information Handle: Not Provided
Number Of Devices: 4
Handle 0x002A, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0029
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: DIMM CHA3
Bank Locator: BANK 0
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MHz
Manufacturer: SK Hynix
Serial Number: 71929DA0
Asset Tag: 1651
Part Number: HMA82GU7MFR8N-TF
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: 1.2 V
Handle 0x002B, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0029
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: DIMM CHA1
Bank Locator: BANK 1
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MHz
Manufacturer: SK Hynix
Serial Number: 71929CFF
Asset Tag: 1651
Part Number: HMA82GU7MFR8N-TF
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: 1.2 V
Handle 0x002C, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0029
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: DIMM CHB4
Bank Locator: BANK 2
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MHz
Manufacturer: SK Hynix
Serial Number: 71929BB8
Asset Tag: 1651
Part Number: HMA82GU7MFR8N-TF
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: 1.2 V
Handle 0x002D, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0029
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: None
Locator: DIMM CHB2
Bank Locator: BANK 3
Type: DDR4
Type Detail: Synchronous
Speed: 2133 MHz
Manufacturer: Samsung
Serial Number: 33BB5E37
Asset Tag: 1641
Part Number: M391A2K43BB1-CPB
Rank: 2
Configured Clock Speed: 2133 MHz
Minimum Voltage: Unknown
Maximum Voltage: Unknown
Configured Voltage: 1.2 V
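How can I identify the faulty module so that I can replace it? I believe the following log line contains the information I need, but I don't know how to decode it.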
[lun set 14 00:14:16 2020] {33}[Hardware Error]: node: 0 device: 1
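To check whether the corrected errors always point at the same node/device pair, the log can be tallied with a short script. This is only a minimal sketch: it assumes every record has the same format as the excerpt above, the function name `tally_node_device` is hypothetical, and it does not by itself map a device number to one of the DIMM locators listed by dmidecode.

```python
import re
from collections import Counter

# Matches the "node: N device: M" field of a memory error record,
# in the format shown in the log excerpt above (assumed to be stable).
NODE_DEVICE = re.compile(r"\[Hardware Error\]: node: (\d+) device: (\d+)")


def tally_node_device(log_text):
    """Count error records per (node, device) pair found in log_text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NODE_DEVICE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# Sample input copied from the log excerpt in this question.
SAMPLE = """\
[lun set 14 00:14:16 2020] {33}[Hardware Error]: section_type: memory error
[lun set 14 00:14:16 2020] {33}[Hardware Error]: node: 0 device: 1
[lun set 14 00:14:16 2020] {33}[Hardware Error]: error_type: 2, single-bit ECC
"""

if __name__ == "__main__":
    for (node, device), count in sorted(tally_node_device(SAMPLE).items()):
        print(f"node {node} device {device}: {count} record(s)")
```

In practice the saved kernel log text would be passed in place of `SAMPLE`. If all records report the same pair, that at least suggests a single module is involved, though which physical slot it corresponds to is still the open question.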