不幸的是,运行 Fedora Server 34 的 HP ProLiant DL380e G8 服务器出现了问题。我怀疑是内存错误或 DIMM 出现故障,但我不确定。
欢迎反馈!
我已经运行了journalctl -r
,它在 PasteBin 链接中返回以下输出(一段看起来不寻常的片段):https://pastebin.com/KPUZHceD
感谢所有帮助和想法!
亲切的问候
编辑:回应@Michael Hampton 的评论:输出发布在这里:
<27>Sep 7 17:03:51 mcelog: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1304]: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1303]: <27>Sep 7 17:03:51 mcelog: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1303]: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1304]: <27>Sep 7 17:03:51 mcelog: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1304]: Location: SOCKET:0 CHANNEL:3 DIMM:1 []
Sep 07 17:03:51 turbo mcelog[1303]: <27>Sep 7 17:03:51 mcelog: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1303]: corrected DIMM memory error count exceeded threshold: 10 in 24h
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 2 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 1 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 7
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 3 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 13 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 6
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 0 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 0 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 5
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: Running trigger `dimm-error-trigger' (reporter: memdb)
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 6 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c80000c400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d22131295c834800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 3 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 4
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID a SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801c00400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d2213fa689118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 5 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 3
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 5 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801bd8400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d2213f0649118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 14 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 2
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 1 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c801bec400800093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: MemCtrl:
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: MCi_MISC register valid
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: MISC d221196e09118800
Sep 07 17:03:51 turbo mcelog[1067]: CPU 12 BANK 11
Sep 07 17:03:51 turbo mcelog[1067]: MCE 1
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: CPUID Vendor Intel Family 6 Model 45 Step 7
Sep 07 17:03:51 turbo mcelog[1067]: MICROCODE 71a
Sep 07 17:03:51 turbo mcelog[1067]: MCGCAP 1000812 APICID 0 SOCKETID 0
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c0107b4000010093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: STATUS c0107b4000010093 MCGSTATUS 0
Sep 07 17:03:51 turbo mcelog[1067]: Transaction: Memory read error
Sep 07 17:03:51 turbo mcelog[1067]: MCA: MEMORY CONTROLLER RD_CHANNEL3_ERR
Sep 07 17:03:51 turbo mcelog[1067]: Corrected error
Sep 07 17:03:51 turbo mcelog[1067]: Error overflow
Sep 07 17:03:51 turbo mcelog[1067]: MCi status:
Sep 07 17:03:51 turbo mcelog[1067]: MCG status:
Sep 07 17:03:51 turbo mcelog[1067]: TIME 1631027031 Tue Sep 7 17:03:51 2021
Sep 07 17:03:51 turbo mcelog[1067]: CPU 0 BANK 5
Sep 07 17:03:51 turbo mcelog[1067]: MCE 0
Sep 07 17:03:51 turbo mcelog[1067]: Hardware event. This is not a software error.
Sep 07 17:03:51 turbo mcelog[1067]: mcelog: mcelog read: Input/output error
Sep 07 17:03:51 turbo kernel: ERST: [Firmware Warn]: Firmware does not respond in time.
Sep 07 17:03:51 turbo kernel: mce: [Hardware Error]: Machine check events logged
Sep 07 17:03:51 turbo kernel: mce: [Hardware Error]: Machine check events logged
Sep 07 17:03:51 turbo kernel: mce_notify_irq: 6 callbacks suppressed
答案1
已通过从服务器上移除 2 条有故障的 RAM 条并重新安装 CPU 来修复此帖子,因为这也没有良好接触。
谢谢大家的帮助!