I'm getting the following ECC error multiple times per day on my Linux machine:
May 24 18:21:04 staton-nas kernel: mce: [Hardware Error]: Machine check events logged
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 11: 8c000040000800c2
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: TSC 1c35588953416
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: ADDR 117d228000
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: MISC 122100200020008c
May 24 18:21:04 staton-nas kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1590358864 SOCKET 0 APIC 0
May 24 18:21:04 staton-nas kernel: EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#1 (channel:0 slot:1 page:0x117d228 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0008:00c2 socket:0 ha:0 channel_mask:1 rank:4)
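The ADDR is always the same, so I tried to map around it using the "memmap=5M$0x117CFA8001" kernel parameter.
The parameter appears to be applied, since I see the following in the syslog: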
May 24 16:03:09 staton-nas kernel: user: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
May 24 16:03:09 staton-nas kernel: user: [mem 0x0000000100000000-0x000000117cfa8000] usable
May 24 16:03:09 staton-nas kernel: user: [mem 0x000000117cfa8001-0x000000117d4a8000] reserved
May 24 16:03:09 staton-nas kernel: user: [mem 0x000000117d4a8001-0x000000407fffffff] usable
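But I am still getting the ECC error.
Am I missing something?
Is the "ADDR 117d228000" in the EDAC syslog error not the actual address I need to map around? Do I need to convert it to a physical address somehow?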
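To double-check my arithmetic, here is a small Python sketch. It assumes that the ADDR in the EDAC log is already a physical address (which is exactly what I'm unsure about) and that pages are 4 KiB. The variable and function names are my own, just for illustration. It only checks whether the reported address falls inside the range I reserved and whether it matches the "page:0x117d228" field in the EDAC line.

```python
# Sanity check: does the reported ADDR fall inside the memmap= reservation?
# Assumption: ADDR from the EDAC log is a physical address, pages are 4 KiB.

ERROR_ADDR = 0x117D228000        # "ADDR 117d228000" from the EDAC log
RESERVE_START = 0x117CFA8001     # start used in memmap=5M$0x117CFA8001
RESERVE_SIZE = 5 * 1024 * 1024   # the "5M" part of the parameter
PAGE_SHIFT = 12                  # assumed 4 KiB pages


def reserved_range(start, size):
    """Return the inclusive (first, last) byte addresses of the reservation."""
    return start, start + size - 1


first, last = reserved_range(RESERVE_START, RESERVE_SIZE)
inside = first <= ERROR_ADDR <= last
page = ERROR_ADDR >> PAGE_SHIFT

print(f"reserved range : {first:#x}-{last:#x}")
print(f"error address  : {ERROR_ADDR:#x} (page {page:#x})")
print(f"inside reserved: {inside}")
```

Under those assumptions the computed range is the same one the kernel reports as reserved, and the error address sits inside it.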
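I'm too cheap to replace an entire DIMM for one bad spot.
The more research I do, the more I'm convinced that the "memory scrubbing error" message indicates that the error comes from the memory scrubbing the hardware is doing. Since I have mapped around it, I believe I can safely ignore it. The OS will never actually use this memory region because I have reserved it.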
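For what it's worth, this is how I tried to decode the status word from the "Bank 11" line. It is only a sketch: the bit positions are my reading of the IA32_MCi_STATUS layout in the Intel SDM, and the operation-type names are what I understand the sb_edac driver prints, so treat both as assumptions. The `bits` helper and the dictionary names are mine.

```python
# Decode the machine check status word from the log.
# Assumption: bit layout follows IA32_MCi_STATUS as I understand it from the
# Intel SDM; the optype names are my understanding of what sb_edac reports.

STATUS = 0x8C000040000800C2  # "Machine Check Event: 0 Bank 11: 8c000040000800c2"

OPTYPES = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "addr/cmd error",
    4: "memory scrubbing error",
}


def bits(value, lo, hi):
    """Extract the inclusive bit field [lo, hi] from value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


decoded = {
    "valid": bits(STATUS, 63, 63),            # VAL
    "uncorrected": bits(STATUS, 61, 61),      # UC
    "addr_valid": bits(STATUS, 58, 58),       # ADDRV
    "corrected_count": bits(STATUS, 38, 52),  # corrected error count
    "mscod": bits(STATUS, 16, 31),            # model-specific error code
    "errcode": bits(STATUS, 0, 15),           # MCA error code
}
optype = OPTYPES.get(bits(STATUS, 4, 6), "reserved")

for name, value in decoded.items():
    print(f"{name:16s}: {value:#x}")
print(f"{'optype':16s}: {optype}")
```

If my reading of the layout is right, this agrees with the EDAC line: a single corrected error (UC clear, count 1), err_code 0008:00c2, with the operation type decoding to a scrubbing error. It does not by itself tell me whether the error is safe to ignore.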
Can anyone confirm this?