Understanding Machine Check Exceptions (MCE)

While trying to debug frequent freezes on a new laptop (Kaby Lake architecture) running Ubuntu 16.04, I came across these entries in kern.log:

kernel: [    0.041634] mce: [Hardware Error]: Machine check events logged

Since then I have installed mcelog, but I don't know what to do with the logs. The contents of /var/log/mcelog are:

mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479298799 Wed Nov 16 13:19:59 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479298799 Wed Nov 16 13:19:59 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479321645 Wed Nov 16 19:40:45 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479321645 Wed Nov 16 19:40:45 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 43880000086 ADDR fef1db80 
TIME 1479328438 Wed Nov 16 21:33:58 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 13880000086 ADDR fef1dc00 
TIME 1479328438 Wed Nov 16 21:33:58 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 43880000086 ADDR fef1db80 
TIME 1479333991 Wed Nov 16 23:06:31 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 13880000086 ADDR fef1dc00 
TIME 1479333991 Wed Nov 16 23:06:31 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 43880000086 ADDR fef1db80 
TIME 1479373350 Thu Nov 17 10:02:30 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 13880000086 ADDR fef1dc00 
TIME 1479373350 Thu Nov 17 10:02:30 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479373810 Thu Nov 17 10:10:10 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479373810 Thu Nov 17 10:10:10 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479375712 Thu Nov 17 10:41:52 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479375712 Thu Nov 17 10:41:52 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479385932 Thu Nov 17 13:32:12 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479385932 Thu Nov 17 13:32:12 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 3880018086 ADDR fef1cf00 
TIME 1479387666 Thu Nov 17 14:01:06 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 43880018086 ADDR fef1ff00 
TIME 1479387666 Thu Nov 17 14:01:06 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 43880000086 ADDR fef1db80 
TIME 1479456710 Fri Nov 18 09:11:50 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 13880000086 ADDR fef1dc00 
TIME 1479456710 Fri Nov 18 09:11:50 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6 
MISC 43880000086 ADDR fef1db80 
TIME 1479459374 Fri Nov 18 09:56:14 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7 
MISC 13880000086 ADDR fef1dc00 
TIME 1479459374 Fri Nov 18 09:56:14 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 142

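Some observations (please correct me if I am wrong):

  • Almost all errors seem to occur on the same page (ADDR fef1xxx).
  • Only banks 6 and 7 seem to be affected.
  • All entries contain "Error overflow" and "Uncorrected error".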
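
Observations like these can be checked against the raw log with a short script. The following is only a sketch: it assumes the field layout seen in the log above (CPU/BANK, MISC/ADDR, TIME and STATUS lines), the names parse_mcelog and SAMPLE are made up for illustration, and a 4 KiB page size is assumed when grouping addresses.

```python
import re
from collections import Counter

# Two records copied from the log above, shortened to the parsed fields.
SAMPLE = """\
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 3880018086 ADDR fef1cf00
TIME 1479298799 Wed Nov 16 13:19:59 2016
STATUS ee2000000040110a MCGSTATUS 0
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 43880018086 ADDR fef1ff00
TIME 1479298799 Wed Nov 16 13:19:59 2016
STATUS ee2000000040110a MCGSTATUS 0
"""

def parse_mcelog(text):
    """Return one dict per record; a record starts at its 'CPU n BANK m' line."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CPU (\d+) BANK (\d+)", line)
        if m:
            current = {"cpu": int(m.group(1)), "bank": int(m.group(2))}
            records.append(current)
            continue
        if current is None:
            continue  # header lines before the first record
        m = re.match(r"MISC ([0-9a-f]+) ADDR ([0-9a-f]+)", line)
        if m:
            current["misc"] = int(m.group(1), 16)
            current["addr"] = int(m.group(2), 16)
            continue
        m = re.match(r"TIME (\d+)", line)
        if m:
            current["time"] = int(m.group(1))
            continue
        m = re.match(r"STATUS ([0-9a-f]+)", line)
        if m:
            current["status"] = int(m.group(1), 16)
    return records

records = parse_mcelog(SAMPLE)
# Tally by bank and by 4 KiB page (assumed page size) to test the observations.
by_bank = Counter(r["bank"] for r in records)
by_page = Counter(hex(r["addr"] >> 12) for r in records if "addr" in r)
print("banks:", dict(by_bank))
print("pages:", dict(by_page))
```

To run it on the real file, replace SAMPLE with the contents of /var/log/mcelog, for example open("/var/log/mcelog").read(). Grouping by addr >> 12 shows whether the addresses really share one 4 KiB page or only a common prefix.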

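The mcelog FAQ mentions that "a low rate of corrected memory errors is expected and does not require replacing hardware or other action". The log entries, however, contain the phrase "Uncorrected error", which suggests that I actually should take some action.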
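
The text lines that mcelog prints come from the bits of the STATUS value, so decoding it by hand may help when reading entries like these. The sketch below assumes the architectural IA32_MCi_STATUS layout from Intel's documentation (flags in bits 63 to 57, MCA error code in bits 15:0) and the compound cache-hierarchy error code format. The name decode_status is made up for illustration, and only the fields visible in this log are handled.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from Intel's SDM).
FLAG_BITS = {
    "VAL": 63,    # register contents valid
    "OVER": 62,   # error overflow
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting was enabled for this error
    "MISCV": 59,  # MCi_MISC register valid
    "ADDRV": 58,  # MCi_ADDR register valid
    "PCC": 57,    # processor context corrupt
}

CACHE_LEVEL = {0: "Level-0", 1: "Level-1", 2: "Level-2", 3: "Generic"}
CACHE_TYPE = {0: "Instruction", 1: "Data", 2: "Generic"}

def decode_status(status):
    """Split a 64-bit STATUS value into flags and error-code fields."""
    code = status & 0xFFFF
    result = {
        "flags": {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()},
        "mca_code": code,
        "model_code": (status >> 16) & 0xFFFF,
        "filtered": bool(code & 0x1000),  # F bit: correction report filtering
    }
    # Cache hierarchy errors have the form 000F 0001 RRRR TTLL.
    if (code & 0xEF00) == 0x0100:
        result["cache"] = {
            "level": CACHE_LEVEL[code & 0x3],
            "type": CACHE_TYPE.get((code >> 2) & 0x3, "Reserved"),
            "request": (code >> 4) & 0xF,
        }
    return result

decoded = decode_status(0xEE2000000040110A)  # STATUS value from the log
for name, is_set in decoded["flags"].items():
    print(f"{name:5} {'set' if is_set else 'clear'}")
print("MCA code:", hex(decoded["mca_code"]), decoded.get("cache"))
```

For the value in the log, OVER, UC, MISCV, ADDRV and PCC are set, matching the text mcelog printed, while EN is clear. What a clear EN bit means for these particular entries is not established here.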

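My questions are:

  1. What do these errors mean? Should I be worried about them?
  2. Could these hardware errors cause the whole system to freeze?
  3. Should I ask the manufacturer to replace the laptop (or parts of it)?
  4. Are there any other actions I should take?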

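Answer 1

First of all, I'm afraid I can't really answer your questions well. I also own a Dell XPS 13 (9360) and see the same MCE messages. I contacted Dell support about this. They replaced the motherboard, but it did not help: the same messages appear in the log. At some point they concluded that it might be a false positive, but they don't know what causes it (an mcelog, kernel, or Intel issue?). The correspondence with support is still ongoing.

<rant> By the way, talking to Dell support was a very unpleasant experience. They seem to suggest only "standard" solutions, such as resetting the firmware or running the self-test. I did not get the impression that I was talking to someone with any technical insight. </rant>

To add some more detail: I see the same problem on Fedora 24, so it does not seem to be specific to Ubuntu.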

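Regarding your questions:

What do these errors mean? Should I be worried about them?

I don't know. Dell support believes they are false positives.

Could these hardware errors cause the whole system to freeze?

Apart from the messages, my system runs fine. I guess the freezes are a separate problem.

Should I ask the manufacturer to replace the laptop (or parts of it)?

Replacing the motherboard did not fix the MCE messages. It might fix the freezes, although those seem to have been fixed by a kernel update.

Are there any other actions I should take?

If you have not contacted support yet, please do. Once they see that this affects more customers, maybe they will come up with a real solution.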

Answer 2

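I hit the same MCE errors. They started popping up at boot after the last few kernel updates (Fedora 25), but I don't know exactly which update introduced them. The laptop is a DELL Inspiron 5567 (Intel i5 7200U). However, the system works fine after boot, so I am 100% sure that this is a false positive that shows up for some reason.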
