I am getting MCE hardware errors on Linux. I used rasdaemon
to log all the errors. Here is what ras-mc-ctl --errors
prints:
41 2023-01-03 10:50:51 +0100 error: Corrected error, no action required., CPU 2, bank Load Store Unit (bank=0), mcg mcgstatus=0, mci Error_overflow, mcgcap=0x00000117, status=0xd820000000100015, misc=0xd01b0fff00000000, walltime=0x63b3fa7b, cpu=0x00000001, cpuid=0x00800f11, apicid=0x00000002
42 2023-01-03 10:50:51 +0100 error: Corrected error, no action required., CPU 2, bank Load Store Unit (bank=0), mcg mcgstatus=0, mci Error_overflow, mcgcap=0x00000117, status=0xd820000000100015, misc=0xd01b0fff00000000, walltime=0x63b3fa7b, cpu=0x00000007, cpuid=0x00800f11, apicid=0x00000003
43 2023-01-03 10:56:02 +0100 error: Corrected error, no action required., CPU 2, bank Load Store Unit (bank=0), mcg mcgstatus=0, mci Error_overflow, mcgcap=0x00000117, status=0xd820000000100015, misc=0xd01b0fff00000000, walltime=0x63b3fbb2, cpu=0x00000007, cpuid=0x00800f11, apicid=0x00000003
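As a small aid for reading these records, the walltime field appears to be a Unix timestamp in hexadecimal, since it lines up with the timestamp printed at the start of each line. The sketch below assumes that interpretation and the +0100 offset shown in the output; the helper name walltime_to_local is made up for illustration and is not part of rasdaemon.

```python
from datetime import datetime, timedelta, timezone


def walltime_to_local(walltime_hex: str, utc_offset_hours: int = 1) -> str:
    """Convert a hex walltime value to a readable local timestamp.

    Assumption: walltime is seconds since the Unix epoch, and the log
    uses a fixed UTC offset (+0100 in the output above).
    """
    seconds = int(walltime_hex, 16)  # int() accepts the "0x" prefix with base 16
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%d %H:%M:%S %z")


# Records 41/42 and record 43 from the output above.
print(walltime_to_local("0x63b3fa7b"))  # 2023-01-03 10:50:51 +0100
print(walltime_to_local("0x63b3fbb2"))  # 2023-01-03 10:56:02 +0100
```

Both values reproduce the timestamps rasdaemon printed, which suggests the interpretation holds for these records.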
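More errors like these appear every 5 minutes. I don't know how to read and decode them. Also, the log says "no action required",
yet my computer is rebooting at random. I have an AMD Ryzen processor and run the latest version of Ubuntu.
It is worth mentioning that I have not changed anything in the BIOS, the CPU and RAM are not overclocked, and the RAM passed a memory test.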
syslog:
jakub-comp kernel: [ 7476.357023] mce: [Hardware Error]: Machine check events logged
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357029] [Hardware Error]: Corrected error, no action required.
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357035] [Hardware Error]: CPU:1 (17:1:1) MC0_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd820000000100015
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357049] [Hardware Error]: IPID: 0x000000b000000000, Syndrome: 0x000000003a034102
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357056] [Hardware Error]: Load Store Unit Ext. Error Code: 16, Level 2 TLB parity error.
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357061] [Hardware Error]: cache level: L1, tx: DATA
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357075] mce: [Hardware Error]: Machine check events logged
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357077] [Hardware Error]: Corrected error, no action required.
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357081] [Hardware Error]: CPU:7 (17:1:1) MC0_STATUS[Over|CE|MiscV|-|-|-|SyndV|-|-|-]: 0xd820000000100015
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357110] [Hardware Error]: IPID: 0x000000b000000000, Syndrome: 0x000000003a034b02
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357118] [Hardware Error]: Load Store Unit Ext. Error Code: 16, Level 2 TLB parity error.
Jan 3 12:24:14 jakub-comp kernel: [ 7476.357124] [Hardware Error]: cache level: L1, tx: DATA
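To get a feel for how the status value maps onto the decoded lines above, here is a rough sketch that splits 0xd820000000100015 into its fields by hand. The bit positions (Val 63, Over 62, UC 61, En 60, MiscV 59, AddrV 58, PCC 57, SyndV 53, extended error code in bits 21:16, error code in bits 15:0) and the TLB error-code layout are my reading of how the Linux kernel's AMD MCE decoder treats this register, so treat them as assumptions. The function name and the dictionary keys are made up for illustration.

```python
# Assumed bit positions of the MCA status flags (see note above).
STATUS_FLAGS = {
    63: "Val", 62: "Over", 61: "UC", 60: "En",
    59: "MiscV", 58: "AddrV", 57: "PCC", 53: "SyndV",
}
# Transaction type and cache level names for TLB error codes.
TX_NAMES = ("INSN", "DATA", "GEN", "RESV")
LEVEL_NAMES = ("RESV", "L1", "L2", "L3/GEN")


def decode_status(status: int) -> dict:
    """Split an MCA status value into flags and error-code fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    error_code = status & 0xFFFF
    decoded = {
        "flags": flags,
        # "CE" in the kernel output corresponds to the UC bit being clear.
        "corrected": "UC" not in flags,
        "ext_error_code": (status >> 16) & 0x3F,
        "error_code": error_code,
    }
    # TLB errors have the form 0000 0000 0001 TTLL in the low 16 bits.
    if (error_code & 0xFFF0) == 0x0010:
        decoded["tx"] = TX_NAMES[(error_code >> 2) & 0x3]
        decoded["cache_level"] = LEVEL_NAMES[error_code & 0x3]
    return decoded


print(decode_status(0xD820000000100015))
```

For this status value the sketch reports Val, Over, En, MiscV and SyndV set, UC clear (a corrected error), extended error code 16, and a TLB error with tx DATA at level L1, which agrees with the kernel's own decoding in the syslog lines. The Over flag means at least one further error arrived before the previous one was read out of the bank.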