While trying to debug frequent freezes on a new laptop (Kaby Lake architecture) running Ubuntu 16.04, I stumbled upon this entry in kern.log:
kernel: [ 0.041634] mce: [Hardware Error]: Machine check events logged
Since then I have installed mcelog, but I don't know what to make of its log. The contents of /var/log/mcelog are:
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 3880018086 ADDR fef1cf00
TIME 1479298799 Wed Nov 16 13:19:59 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 43880018086 ADDR fef1ff00
TIME 1479298799 Wed Nov 16 13:19:59 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 3880018086 ADDR fef1cf00
TIME 1479321645 Wed Nov 16 19:40:45 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 43880018086 ADDR fef1ff00
TIME 1479321645 Wed Nov 16 19:40:45 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 43880000086 ADDR fef1db80
TIME 1479328438 Wed Nov 16 21:33:58 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 13880000086 ADDR fef1dc00
TIME 1479328438 Wed Nov 16 21:33:58 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 43880000086 ADDR fef1db80
TIME 1479333991 Wed Nov 16 23:06:31 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 13880000086 ADDR fef1dc00
TIME 1479333991 Wed Nov 16 23:06:31 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 43880000086 ADDR fef1db80
TIME 1479373350 Thu Nov 17 10:02:30 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 13880000086 ADDR fef1dc00
TIME 1479373350 Thu Nov 17 10:02:30 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 3880018086 ADDR fef1cf00
TIME 1479373810 Thu Nov 17 10:10:10 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 43880018086 ADDR fef1ff00
TIME 1479373810 Thu Nov 17 10:10:10 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 3880018086 ADDR fef1cf00
TIME 1479375712 Thu Nov 17 10:41:52 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 43880018086 ADDR fef1ff00
TIME 1479375712 Thu Nov 17 10:41:52 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 3880018086 ADDR fef1cf00
TIME 1479385932 Thu Nov 17 13:32:12 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 43880018086 ADDR fef1ff00
TIME 1479385932 Thu Nov 17 13:32:12 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 3880018086 ADDR fef1cf00
TIME 1479387666 Thu Nov 17 14:01:06 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 43880018086 ADDR fef1ff00
TIME 1479387666 Thu Nov 17 14:01:06 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 43880000086 ADDR fef1db80
TIME 1479456710 Fri Nov 18 09:11:50 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 13880000086 ADDR fef1dc00
TIME 1479456710 Fri Nov 18 09:11:50 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 43880000086 ADDR fef1db80
TIME 1479459374 Fri Nov 18 09:56:14 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 13880000086 ADDR fef1dc00
TIME 1479459374 Fri Nov 18 09:56:14 2016
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee2000000040110a MCGSTATUS 0
MCGCAP c08 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 142
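Some observations (please correct me if I am wrong):
- Almost all errors seem to occur on the same page (ADDR fef1xxx).
- Only banks 6 and 7 seem to be affected.
- All entries contain "Error overflow" and "Uncorrected error".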
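To check the first two observations without reading every record by eye, the log can be grouped by bank and page. This is only a rough sketch: the `summarize` helper and the `SAMPLE` string are made up for illustration, the sample holds just the four distinct BANK/ADDR pairs copied from the log above, and the page size is assumed to be 4 KiB.

```python
import re
from collections import Counter

# The four distinct BANK/ADDR pairs from the log above, trimmed to the
# two lines of each record that this sketch actually reads.
SAMPLE = """\
CPU 0 BANK 6
MISC 3880018086 ADDR fef1cf00
CPU 0 BANK 7
MISC 43880018086 ADDR fef1ff00
CPU 0 BANK 6
MISC 43880000086 ADDR fef1db80
CPU 0 BANK 7
MISC 13880000086 ADDR fef1dc00
"""

BANK_RE = re.compile(r"^CPU (\d+) BANK (\d+)$")
ADDR_RE = re.compile(r"^MISC ([0-9a-f]+) ADDR ([0-9a-f]+)$")


def summarize(text, page_shift=12):
    """Count records per (bank, page number).

    page_shift=12 is an assumption (4 KiB pages).
    """
    counts = Counter()
    bank = None
    for line in text.splitlines():
        line = line.strip()
        m = BANK_RE.match(line)
        if m:
            bank = int(m.group(2))
            continue
        m = ADDR_RE.match(line)
        if m and bank is not None:
            page = int(m.group(2), 16) >> page_shift
            counts[(bank, page)] += 1
            bank = None  # wait for the next "CPU ... BANK ..." line
    return counts


if __name__ == "__main__":
    for (bank, page), n in sorted(summarize(SAMPLE).items()):
        print(f"bank {bank} page {page:#x}: {n}")
```

Under the 4 KiB assumption, the four addresses fall into three nearby pages (0xfef1c, 0xfef1d and 0xfef1f) rather than exactly one, all within the same fef1xxxx range. To run it on the real file, pass the contents of /var/log/mcelog to `summarize` instead of `SAMPLE`.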
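The mcelog FAQ mentions that a low rate of corrected memory errors is expected and does not require hardware replacement or any other action. My log entries, however, contain the phrase "Uncorrected error", which suggests that I actually should do something about them.
My questions are:
- What do these errors mean? Should I be worried about them?
- Could these hardware errors be causing the system-wide freezes?
- Should I ask the manufacturer to replace the laptop (or parts of it)?
- Are there any other actions I should take?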
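The "Uncorrected error" wording can be cross-checked against the raw STATUS value that mcelog prints. The following is a minimal sketch, assuming the architectural IA32_MCi_STATUS layout described in Intel's Software Developer's Manual (bits 63 down to 57 are VAL, OVER, UC, EN, MISCV, ADDRV, PCC; bits 15:0 hold the MCA error code; bits 31:16 hold the model-specific code). The `decode_status` name is made up for this example.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM).
FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_status(status):
    """Split a raw 64-bit STATUS value into flag names and error codes."""
    set_flags = [
        name
        for bit, name in sorted(FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": set_flags,
        "mca_code": status & 0xFFFF,             # bits 15:0
        "model_code": (status >> 16) & 0xFFFF,   # bits 31:16
    }


if __name__ == "__main__":
    decoded = decode_status(0xEE2000000040110A)  # STATUS value from the log
    for name in decoded["flags"]:
        print(name)
    print(f"MCA error code:      {decoded['mca_code']:#06x}")
    print(f"model-specific code: {decoded['model_code']:#06x}")
```

For the value in the log, the OVER, UC, MISCV, ADDRV and PCC bits are set, which matches the five flag lines mcelog prints under "MCi status"; VAL is also set, and the EN bit is clear. The low 16 bits come out as 0x110a, the code mcelog reports as a generic level-2 cache error with corrected filtering.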
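Answer 1
First of all, I'm afraid I can't really answer your questions well. I also own a Dell XPS 13 (9360) and see the same MCE messages. I contacted Dell support about this. They replaced the motherboard, but that did not help: the same messages still appear in the log. At some point they concluded that these are probably false positives, but they do not know what causes them (an mcelog, kernel, or Intel issue?). The correspondence with support is still ongoing.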
<rant>
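By the way, talking to Dell support was a very unpleasant experience. They only seem to suggest "standard" solutions such as resetting the firmware, running the self-tests, and so on. I did not get the impression that I was talking to anyone with technical insight.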
</rant>
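To add some more detail: I see the same issue on Fedora 24, so it does not seem to be specific to Ubuntu.
Regarding your questions:
What do these errors mean? Should I be worried about them?
I don't know. Dell support believes they are false positives.
Could these hardware errors be causing the system-wide freezes?
Apart from the messages, my system runs fine. I guess the freezes are a separate issue.
Should I ask the manufacturer to replace the laptop (or parts of it)?
Replacing the motherboard did not fix the MCE issue. It might fix the freezes, although those seem to have been fixed by a kernel update.
Are there any other actions I should take?
If you have not contacted support yet, please do. Once they see that this affects more customers, maybe they will come up with a real solution.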