Ubuntu 20.04 crashes randomly at different times. I can't tie the crashes to any specific event.
uname -a
Linux ubuntu 5.11.0-051100-generic #202102142330 SMP Sun Feb 14 23:33:21 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
It crashes with the following messages:
kernel:[19849.215258] [Hardware Error]: Uncorrected, software restartable error.
kernel:[19849.215259] [Hardware Error]: CPU:22 (19:21:0) MC0_STATUS[-|UE|MiscV|AddrV|-|-|-|-|Poison|-]: 0xbc00080001010135
kernel:[19849.215263] [Hardware Error]: Error Addr: 0x000000076bed1c00
kernel:[19849.215264] [Hardware Error]: IPID: 0x001000b000000000
kernel:[19849.215266] [Hardware Error]: Load Store Unit Ext. Error Code: 1, An ECC error or L2 poison was detected on a data cache read by a load.
kernel:[19849.215269] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
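The kernel has already decoded the MC0_STATUS value above. To see where that summary comes from, here is a minimal POSIX shell sketch that pulls the same fields out of the raw status value. It assumes the standard x86 MCA bit layout (VAL 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57) plus AMD's Poison bit 43, as defined in the Linux kernel's arch/x86/include/asm/mce.h. The function name `decode_mca_status` is hypothetical.

```sh
#!/bin/sh
# Sketch: decode selected fields of an x86 MCA bank status register (MCi_STATUS)
# as printed by the kernel, e.g. "MC0_STATUS[...]: 0xbc00080001010135".
# The value is split into two 32-bit halves so plain shell arithmetic never has
# to hold a number above 2^63-1.
decode_mca_status() {
  hex=${1#0x}                 # drop the 0x prefix; expects all 16 hex digits
  hi_hex=${hex%????????}      # bits 63..32
  lo_hex=${hex#????????}      # bits 31..0
  hi=$((0x$hi_hex))
  lo=$((0x$lo_hex))
  # Flag name:bit position within the high half (architectural bit number - 32).
  for f in VAL:31 OVER:30 UC:29 EN:28 MISCV:27 ADDRV:26 PCC:25 POISON:11; do
    printf '%s=%d\n' "${f%%:*}" $(( (hi >> ${f#*:}) & 1 ))
  done
  printf 'EXT_ERR_CODE=%d\n' $(( (lo >> 16) & 0x3f ))  # bits 21..16 (AMD extended error code)
  printf 'MCA_ERR_CODE=0x%04x\n' $(( lo & 0xffff ))     # bits 15..0
}

decode_mca_status 0xbc00080001010135
```

For the value above it reports VAL, UC, EN, MISCV, ADDRV and POISON set and OVER and PCC clear, which matches the kernel's [-|UE|MiscV|AddrV|-|-|-|-|Poison|-] summary. The low bits give extended error code 1 and error code 0x0135. That is the memory-hierarchy error code the kernel expanded into "cache level: L1, tx: DATA, mem-tx: DRD".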
Hardware information:
### CPU
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 33
Model name: AMD Ryzen 9 5900X 12-Core Processor
Stepping: 0
Frequency boost: enabled
CPU MHz: 2200.000
CPU max MHz: 6442.4800
CPU min MHz: 2200.0000
### Base Board Information
Manufacturer: ASRock
Product Name: X570 Taichi
### Memory:
G Skill Trident Z Neo DDR4 - 3600MHz 32GB (2 x 16GB)
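What are the recommended ways to find the root cause? How can I enable more logging, or, if logs already exist, where can I find them? Any guidance would be appreciated. Thanks!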
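For the "where are the logs" part, here is a minimal sketch. It assumes Ubuntu's default rsyslog layout, where kernel messages go to /var/log/kern.log and are rotated into kern.log.1 and kern.log.N.gz. The [Hardware Error] lines only appear there if they reached the disk before the crash. The function name `find_mce_logs` is hypothetical.

```sh
#!/bin/sh
# Sketch: print "[Hardware Error]" lines that already reached the kernel log on
# disk. Assumes Ubuntu's default rsyslog layout (kern.log, kern.log.1 and
# rotated kern.log.N.gz); the directory is a parameter (default /var/log) so
# the function can also be pointed at a copy of the logs.
find_mce_logs() {
  dir=${1:-/var/log}
  for f in "$dir/kern.log" "$dir/kern.log.1"; do
    [ -r "$f" ] && grep -h '\[Hardware Error\]' "$f"
  done
  for f in "$dir"/kern.log.*.gz; do
    # An unmatched glob stays literal, so the -r test also covers "no rotated logs".
    [ -r "$f" ] && gzip -dc "$f" | grep '\[Hardware Error\]'
  done
  return 0
}

# usage: find_mce_logs              (reading /var/log/kern.log may need sudo or the adm group)
#        find_mce_logs ./log-copy
```

If the systemd journal is persistent, `journalctl -k -b -1` shows the kernel messages from the previous boot, which is useful right after a crash. For more logging in the future, the `rasdaemon` package records machine-check events in a database that `sudo ras-mc-ctl --errors` can list.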
Answer 1
This isn't technically an answer, but...
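The message "ECC error or L2 poison was detected on a data cache read by a load" indicates a memory problem, either in the RAM itself or in a cache on the CPU. Neither is good news, but you can test the system RAM as follows:
- Reboot the system
- Hold down the Shift key to bring up the GRUB menu
- Select "Ubuntu, memtest86+" and press Enter
The memory test runs until the end of time or until you press Esc. Let the machine complete at least one full pass before exiting.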
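Based on reports around the web, this problem seems to occur only with high-end AMD Ryzen processors. This long thread on the AMD Community site contains an interesting piece of information:
> I replaced the RAM and the PC has now been running stably for a few days. Hopefully this helps you too, as it helped me. The previous RAM was Gskill 3600MHz... the new RAM is Corsair 3200.
Your question doesn't say which RAM you have installed, but if it is a set of higher-frequency modules, something in the interaction between the RAM and the CPU may be causing the instability. If the memory test fails and you happen to have some compatible 3200MHz RAM on hand (even a single DIMM), consider swapping it in and running the memory test again.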
Answer 2
BIOS
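ASRock X570 Taichi
The current BIOS version is P4.30.
Memory
G Skill Trident Z Neo DDR4 - 3600MHz 32GB (2 x 16GB), product: F4-3600C16-16GTZNC
CPU
AMD Ryzen 9 5900X 12-Core Processor
Ryzen processors are very picky about RAM.
These DIMMs do not appear on the board's memory support list.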
memtest
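All tests passed.
Looking at the output of sudo lshw -C memory, the DIMMs may be installed in the wrong slots. When using 2 DIMMs of the same size, they should go in slots A2 and B2. Check the motherboard layout and memory slot diagram in the user manual to verify this.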
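To cross-check slot placement without opening the case, you can list the SMBIOS memory-device entries. Here is a minimal sketch that assumes the usual `dmidecode -t memory` output format, with tab-indented `Size:`, `Locator:`, `Bank Locator:` and `Part Number:` fields. The function name `populated_slots` is hypothetical. How the firmware's locator strings correspond to the A1/A2/B1/B2 labels in the manual is an assumption to confirm against the board layout.

```sh
#!/bin/sh
# Sketch: summarise populated memory slots from "dmidecode -t memory" output
# (read on stdin). Field names are those dmidecode prints for DMI type 17
# "Memory Device" entries; empty slots report "No Module Installed".
populated_slots() {
  awk -F': ' '
    function flush() {
      if (size != "" && size !~ /No Module Installed/)
        print bank " / " loc ": " size " " part
      size = ""; loc = ""; bank = ""; part = ""
    }
    /^Memory Device/   { flush() }
    /^\tSize:/         { size = $2 }
    /^\tLocator:/      { loc = $2 }
    /^\tBank Locator:/ { bank = $2 }
    /^\tPart Number:/  { part = $2 }
    END                { flush() }
  '
}

# usage: sudo dmidecode -t memory | populated_slots
```

Each output line shows one populated slot with its bank and locator names. Compare these names with the slot labels in the manual rather than relying on lshw's "DIMM 0"/"DIMM 1" labels alone.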
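Answer 3
Following @heynnema's suggestion, I found that the DIMM model installed in my PC is not on the compatibility list. Here are the steps:
- Visit the CPU support list on the ASRock X570 Taichi website and find the core type. In my case it is Vermeer.
- Find the model of the DIMMs installed on the system by running sudo lshw -C memory (it is F4-3600C16-16GTZNC).
- Navigate to the memory support list for Vermeer and check whether the model is supported. Unfortunately, it is not on the list! Perhaps this is what causes the intermittent crashes. I will try a supported DIMM model, see whether the crashes recur, and update this answer accordingly.
Output of sudo lshw -C memory: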
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: P4.30
date: 04/14/2021
size: 64KiB
capacity: 16MiB
capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
*-memory
description: System Memory
physical id: e
slot: System board or motherboard
size: 32GiB
*-bank:0
description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
product: F4-3600C16-16GTZNC
vendor: Unknown
physical id: 0
serial: 00000000
slot: DIMM 0
size: 16GiB
width: 64 bits
clock: 2133MHz (0.5ns)
*-bank:1
description: [empty]
product: Unknown
vendor: Unknown
physical id: 1
serial: Unknown
slot: DIMM 1
*-bank:2
description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2133 MHz (0.5 ns)
product: F4-3600C16-16GTZNC
vendor: Unknown
physical id: 2
serial: 00000000
slot: DIMM 0
size: 16GiB
width: 64 bits
clock: 2133MHz (0.5ns)
*-bank:3
description: [empty]
product: Unknown
vendor: Unknown
physical id: 3
serial: Unknown
slot: DIMM 1
*-cache:0
description: L1 cache
physical id: 11
slot: L1 - Cache
size: 768KiB
capacity: 768KiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=1
*-cache:1
description: L2 cache
physical id: 12
slot: L2 - Cache
size: 6MiB
capacity: 6MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=2
*-cache:2
description: L3 cache
physical id: 13
slot: L3 - Cache
size: 64MiB
capacity: 64MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=3
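The comparison in the last step of Answer 3 can be partly automated. Here is a minimal sketch that assumes you have copied the part numbers from the memory support list into a plain-text file yourself. The file name `qvl.txt` and the function name `check_dimms_against_qvl` are hypothetical.

```sh
#!/bin/sh
# Sketch: compare the DIMM part numbers reported by "lshw -C memory" (read on
# stdin) with a memory support list saved as a local text file. The file is
# hypothetical: one you build yourself by copying part numbers from the
# board vendor's memory support page.
check_dimms_against_qvl() {
  qvl=$1
  # Each populated bank prints "product: <part number>"; empty banks say "Unknown".
  awk '$1 == "product:" && $2 != "Unknown" { sub(/^[ \t]*product:[ \t]*/, ""); print }' |
    sort -u |
    while read -r part; do
      # -F: fixed string, -i: ignore case, -w: do not match inside a longer part number
      if grep -qiwF -- "$part" "$qvl"; then
        echo "$part: listed"
      else
        echo "$part: NOT listed"
      fi
    done
}

# usage: sudo lshw -C memory | check_dimms_against_qvl qvl.txt
```

A "NOT listed" result only means the exact part number is missing from your copy of the list. Note that a two-module kit may appear on the list under its kit part number rather than the per-module number that lshw reports.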