Finding the source of (memory read) hardware errors

When I log in to my server, I see many errors like the following:

Message from syslogd@****** at May 31 20:06:59 ...
 kernel:[500570.908383] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1622484419 SOCKET 0 APIC 0 microcode 71a

Message from syslogd@****** at May 31 20:10:11 ...
 kernel:[500762.908155] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: c01d8a8000010091

Message from syslogd@****** at May 31 20:10:11 ...
 kernel:[500762.908278] mce: [Hardware Error]: TSC 0 

Message from syslogd@****** at May 31 20:10:11 ...
 kernel:[500762.908299] mce: [Hardware Error]: PROCESSOR 0:206d7 TIME 1622484611 SOCKET 0 APIC 0 microcode 71a

Message from syslogd@****** at May 31 20:11:10 ...
 kernel:[500821.884806] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: c01ec00000010091

Message from syslogd@****** at May 31 20:11:10 ...
 kernel:[500821.885130] mce: [Hardware Error]: TSC 0 

The system log also shows memory read errors:

May 31 20:35:18 ****** kernel: [502269.884160] EDAC sbridge MC0: MISC 20403aba86 
May 31 20:35:18 ****** kernel: [502269.884166] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1622486118 SOCKET 0 APIC 0
May 31 20:35:18 ****** kernel: [502269.884228] EDAC MC0: 16682 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x170c7a offset:0xa00 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:1)
May 31 20:35:19 ****** kernel: [502270.908292] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
May 31 20:35:19 ****** kernel: [502270.908349] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 5: cc12b44000010091
May 31 20:35:19 ****** kernel: [502270.908356] EDAC sbridge MC0: TSC 0 
May 31 20:35:19 ****** kernel: [502270.908359] EDAC sbridge MC0: ADDR 3ef245d00 
May 31 20:35:19 ****** kernel: [502270.908363] EDAC sbridge MC0: MISC 20404c4c86 
May 31 20:35:19 ****** kernel: [502270.908366] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1622486119 SOCKET 0 APIC 0
May 31 20:35:19 ****** kernel: [502270.908567] EDAC MC0: 19153 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#1 (channel:1 slot:1 page:0x3ef245 offset:0xd00 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:4)

It looks like my RAM modules may be faulty, but memtest86 reports that everything is fine. Could this be a problem with my CPU?

Answer 1

but memtest86 reports that everything is fine. Could this be a problem with my CPU?

It could be, but a likelier explanation is that you have ECC memory and it is doing its job.

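Essentially, ECC transparently corrects single-bit errors. The hardware signals each correction, and the operating system is smart enough to intercept and log those signals. That is what the "CE" (corrected error) entries in your EDAC lines are.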
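The status words in the "Machine Check" lines can be unpacked to check this. Below is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS bit layout from Intel's Software Developer's Manual applies to this CPU. The names `decode_mci_status` and `MEM_OPS` are made up for illustration and are not part of any kernel tool.

```python
# Sketch: decode a machine-check status word as printed in the log.
# Assumption: architectural IA32_MCi_STATUS layout (Intel SDM).

# MMM field of a memory-controller error code (pattern 000F 0000 1MMM CCCC)
MEM_OPS = {
    0: "generic undefined request",
    1: "memory read",
    2: "memory write",
    3: "address/command",
    4: "memory scrubbing",
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit status value into the fields relevant here."""
    mcacod = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER
        "uncorrected": bool((status >> 61) & 1),  # UC
        "misc_valid": bool((status >> 59) & 1),   # MISCV
        "addr_valid": bool((status >> 58) & 1),   # ADDRV
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "mscod": (status >> 16) & 0xFFFF,         # model-specific code
        "mcacod": mcacod,                         # architectural code
    }
    # Ignore the filter bit (bit 12) when matching the memory-controller pattern.
    if (mcacod & 0xEF80) == 0x0080:
        fields["memory_op"] = MEM_OPS.get((mcacod >> 4) & 0x7, "reserved")
        fields["channel"] = mcacod & 0xF  # 0xF would mean "not specified"
    return fields


if __name__ == "__main__":
    for key, value in decode_mci_status(0xCC12B44000010091).items():
        print(f"{key}: {value}")
```

For `cc12b44000010091` from the log, the UC bit is clear and the corrected-error count field is 19153. That matches the "19153 CE memory read error" in the EDAC line that follows it, and is consistent with errors that were corrected rather than passed on to software.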

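Memtest is too primitive for this. It does not intercept those notifications, so all it sees is a passing test, because ECC has already corrected the error.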
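Since the operating system does log the corrections, the log itself can point at the affected modules. The following is a rough Python sketch that tallies the "CE" lines per DIMM label. It assumes the line format shown in the question; the regular expression and the name `tally_corrected_errors` are illustrative only. Whether the per-line counts overlap between reports depends on the driver, so treat the totals as a rough indicator rather than an exact figure.

```python
import re

# Matches lines such as:
#   EDAC MC0: 16682 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 (channel:1 ...
# Assumption: the format is the one shown in the question's syslog excerpt.
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE (?P<kind>.+?) on (?P<label>\S+) \("
)


def tally_corrected_errors(lines):
    """Return {dimm_label: {"events": n, "reported": total}} for CE lines."""
    tally = {}
    for line in lines:
        match = EDAC_CE.search(line)
        if match is None:
            continue  # not a corrected-error summary line
        entry = tally.setdefault(match.group("label"), {"events": 0, "reported": 0})
        entry["events"] += 1
        entry["reported"] += int(match.group("count"))
    return tally


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < /var/log/syslog
    for label, entry in sorted(tally_corrected_errors(sys.stdin).items()):
        print(f"{label}: {entry['events']} events, {entry['reported']} errors reported")
```

In the excerpt above, both `CPU_SrcID#0_Ha#0_Chan#1_DIMM#0` and `CPU_SrcID#0_Ha#0_Chan#1_DIMM#1` appear, so the modules on channel 1 of socket 0 are the ones to look at first.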
