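For a while now I have been having problems with a new computer that I built myself. I chose Pop!_OS (I did not see a dedicated Pop!_OS forum, so here I am) because it is supposed to support a lot of engineering and data science software. The machine has an AMD Ryzen 9 7900X 12-core processor (24 threads) and an NVIDIA GeForce RTX 3060 graphics card.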
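So far I have been watching /var/log/kern.log in the background, and I have noticed that whenever my browser or another program crashes, the crash can be tied to a segmentation fault.
Here is a non-exhaustive list of the segfaults that brought programs down: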
chrome[9700]: segfault at 2d9b0303031a ip 000056081a8321db sp 00007ffcb1ad3580 error 4 in chrome[56081916f000+a492000] likely on CPU 17 (core 5, socket 0)
ThreadPoolForeg[12433]: segfault at 43168001 ip 000056081ff5485d sp 00007f73631fbba0 error 4 in chrome[56081916f000+a492000] likely on CPU 20 (core 10, socket 0)
gnome-shell[3019]: segfault at e641f8bf ip 00007f297b1f66d8 sp 00007ffe52033c60 error 6 in libmutter-clutter-10.so.0.0.0[7f297b1e0000+91000] likely on CPU 11 (core 13, socket 0)
Isolated Web Co[5349]: segfault at 8 ip 00007f79f7e1e861 sp 00007ffce6a71a10 error 4 in libxul.so[7f79f40be000+5e78000] likely on CPU 22 (core 12, socket 0)
VirtualBoxVM[5863]: segfault at 10 ip 00007f8cfe6a3b41 sp 00007ffc0591a670 error 6 in libc.so.6[7f8cfe692000+12b000] likely on CPU 19 (core 9, socket 0)
ibus-daemon[3165]: segfault at 20c4 ip 00007f7b14f6269d sp 00007ffc3272dee0 error 4 in libgobject-2.0.so.0.7200.4[7f7b14f48000+33000] likely on CPU 19 (core 9, socket 0)
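The Isolated Web Co segfault is the one I hit most often, while browsing the web and on video calls.
I really do not know where to start or how to narrow the problem down. I think I should first focus on finding the source of the segfaults, since they keep happening. Any advice or direction would help.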
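One way to start narrowing this down would be to tally the segfault lines by faulting module and by CPU core. Below is a minimal sketch that assumes the kern.log line format shown above; the names `parse_segfault`, `tally`, and `SAMPLE` are my own illustration, not part of any tool. As a rough rule of thumb only, faults that pile up on one core might hint at the CPU, while faults spread across unrelated programs and cores would be more consistent with memory.

```python
import re
from collections import Counter

# Matches the kernel's user-space segfault report as it appears in the list above.
SEGFAULT_RE = re.compile(
    r"(?P<proc>[^\[\]:]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<module>[^\[\s]+)\[[0-9a-f]+\+[0-9a-f]+\])?"
    r"(?: likely on CPU (?P<cpu>\d+) \(core (?P<core>\d+), socket (?P<socket>\d+)\))?"
)

# The six lines from the post, used here as sample input.
SAMPLE = [
    "chrome[9700]: segfault at 2d9b0303031a ip 000056081a8321db sp 00007ffcb1ad3580 error 4 in chrome[56081916f000+a492000] likely on CPU 17 (core 5, socket 0)",
    "ThreadPoolForeg[12433]: segfault at 43168001 ip 000056081ff5485d sp 00007f73631fbba0 error 4 in chrome[56081916f000+a492000] likely on CPU 20 (core 10, socket 0)",
    "gnome-shell[3019]: segfault at e641f8bf ip 00007f297b1f66d8 sp 00007ffe52033c60 error 6 in libmutter-clutter-10.so.0.0.0[7f297b1e0000+91000] likely on CPU 11 (core 13, socket 0)",
    "Isolated Web Co[5349]: segfault at 8 ip 00007f79f7e1e861 sp 00007ffce6a71a10 error 4 in libxul.so[7f79f40be000+5e78000] likely on CPU 22 (core 12, socket 0)",
    "VirtualBoxVM[5863]: segfault at 10 ip 00007f8cfe6a3b41 sp 00007ffc0591a670 error 6 in libc.so.6[7f8cfe692000+12b000] likely on CPU 19 (core 9, socket 0)",
    "ibus-daemon[3165]: segfault at 20c4 ip 00007f7b14f6269d sp 00007ffc3272dee0 error 4 in libgobject-2.0.so.0.7200.4[7f7b14f48000+33000] likely on CPU 19 (core 9, socket 0)",
]


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if the line is not one."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "process": m.group("proc").strip(),
        "module": m.group("module"),
        "cpu": int(m.group("cpu")) if m.group("cpu") else None,
        "core": int(m.group("core")) if m.group("core") else None,
        # x86 page-fault error code: bit 0 set = protection violation
        # (page was present), bit 1 set = write access, bit 2 set = user mode.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
    }


def tally(lines):
    """Count segfaults per faulting module and per CPU core."""
    by_module, by_core = Counter(), Counter()
    for line in lines:
        rec = parse_segfault(line)
        if rec is None:
            continue
        by_module[rec["module"]] += 1
        by_core[rec["core"]] += 1
    return by_module, by_core


if __name__ == "__main__":
    modules, cores = tally(SAMPLE)
    print("by module:", modules.most_common())
    print("by core:", cores.most_common())
```

To run it against the real log instead of the sample, pass an open file to `tally`, for example `tally(open("/var/log/kern.log", errors="replace"))`.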
Update
I have looked through sudo journalctl -b 0
for the most recent boot and pulled out everything I believe is an error.
Oct 05 08:31:32 pop-os kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.GPP7.UP00.DP40.UP00.DP68], AE_NOT_FOUND (20230331/dswload2-162)
Oct 05 08:31:32 pop-os kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20230331/psobject-220)
Oct 05 08:31:32 pop-os kernel: hub 8-0:1.0: config failed, hub doesn't have any ports! (err -19)
Oct 05 08:31:33 pop-os /usr/bin/nvidia-powerd[1202]: No matching GPU found
Oct 05 08:31:33 pop-os /usr/bin/nvidia-powerd[1202]: Failed to initialize RM Client
Oct 05 08:31:33 pop-os systemd[1]: nvidia-powerd.service: Main process exited, code=exited, status=1/FAILURE
Oct 05 08:31:33 pop-os systemd[1]: nvidia-powerd.service: Failed with result 'exit-code'.
Oct 05 08:31:33 pop-os systemd[1]: Failed to start nvidia-powerd service.
Oct 05 08:31:34 pop-os vboxdrv.sh[1979]: failed: Look at /var/log/vbox-setup.log to find out what went wrong.
Oct 05 08:31:34 pop-os systemd[1]: vboxdrv.service: Control process exited, code=exited, status=1/FAILURE
Oct 05 08:31:34 pop-os systemd[1]: vboxdrv.service: Failed with result 'exit-code'.
Oct 05 08:31:34 pop-os systemd[1]: Failed to start VirtualBox Linux kernel module.
Oct 05 08:31:35 pop-os gnome-session[2074]: gnome-session-binary[2074]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed
Oct 05 08:31:35 pop-os gnome-session[2074]: gnome-session-binary[2074]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed
Oct 05 08:31:35 pop-os gnome-session-binary[2074]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed
Oct 05 08:31:37 pop-os wpa_supplicant[1247]: bgscan simple: Failed to enable signal strength monitoring
Oct 05 08:40:40 pop-os systemd[3353]: app-gnome-gnome\x2dkeyring\x2dssh-3591.scope: Failed to add PIDs to scope's control group: No such process
Oct 05 08:40:40 pop-os systemd[3353]: app-gnome-gnome\x2dkeyring\x2dssh-3591.scope: Failed with result 'resources'.
Oct 05 08:40:40 pop-os systemd[3353]: Failed to start Application launched by gnome-session-binary.
Oct 05 08:40:42 pop-os gnome-shell[3601]: GNOME Shell started at Thu Oct 05 2023 08:40:41 GMT-0400 (EDT)
Oct 05 08:40:42 pop-os gnome-shell[3601]: Registering session with GDM
Oct 05 08:40:42 pop-os gsd-sharing[2148]: Error releasing name org.gnome.SettingsDaemon.Sharing: The connection is closed
Oct 05 08:40:42 pop-os gsd-rfkill[2159]: Error releasing name org.gnome.SettingsDaemon.Rfkill: The connection is closed
Oct 05 08:40:42 pop-os gnome-session-binary[2074]: GLib-CRITICAL: g_hash_table_foreach_remove_or_steal: assertion 'version == hash_table->version' failed
I noticed a few more after a fresh boot.
Oct 05 22:22:49 pop-os kernel: FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Oct 05 22:22:49 pop-os kernel: FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Oct 05 22:22:49 pop-os kernel: nvidia: module license 'NVIDIA' taints kernel.
Oct 05 22:22:49 pop-os kernel: Disabling lock debugging due to kernel taint
Oct 05 22:22:49 pop-os kernel: nvidia: module license taints kernel.
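Rather than picking errors out of the full journal by eye, the boot log could be filtered by priority and grouped by the program that logged each message. This is a rough sketch, assuming `journalctl` is on the PATH and that JSON output carries a `SYSLOG_IDENTIFIER` field for most entries; the helper names `read_boot_errors` and `count_by_identifier` are hypothetical.

```python
import json
import subprocess
from collections import Counter


def read_boot_errors(boot="0"):
    """Return journal entries of priority err or worse for one boot, one JSON string per line."""
    cmd = ["journalctl", "-b", boot, "-p", "err", "-o", "json", "--no-pager"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def count_by_identifier(json_lines):
    """Count journal entries per SYSLOG_IDENTIFIER; entries without one go under 'unknown'."""
    counts = Counter()
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        ident = entry.get("SYSLOG_IDENTIFIER")
        if not isinstance(ident, str):
            ident = "unknown"
        counts[ident] += 1
    return counts


if __name__ == "__main__":
    # May need sudo or membership in a journal-reading group to see system entries.
    for ident, n in count_by_identifier(read_boot_errors()).most_common():
        print(f"{n:5d}  {ident}")
```

Note that filtering with `-p err` would leave out informational lines such as the "GNOME Shell started" message above, which is the point: it keeps the list to what the system itself classified as an error.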
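Update - Memtest86 failed
Well, I managed to run the Memtest86 test suite, and it found failures. It sounds like this really is a hardware problem.
As I understand it, Memtest86 can tell that there is a hardware problem but cannot pinpoint which component is failing. It looks like one or both of my RAM sticks are bad.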
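MemTest86 detected errors in my memory. Is my memory bad?
Please be aware that not all errors reported by MemTest86 are due to bad memory. The test implicitly tests the CPU, the L1 and L2 caches, and the motherboard. It is impossible for the test to determine what causes a failure to occur. However, most failures are due to a problem with a memory module. When that is not the case, the only option is to replace parts until the failure is corrected.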
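I appreciate everyone's comments. At this point I am wondering whether I should rerun Memtest with one RAM stick installed at a time. Also, is there a similar tool I can use to test my CPU?
Finally, I bought this computer only a few months ago. If I can identify the faulty hardware, can I get the OEM to replace it, or am I just out of luck?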
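To make the one-stick-at-a-time idea concrete, here is a small sketch of how the results of such runs could be read. It is only my assumption about how to interpret them, not something Memtest86 reports: the stick labels, slot labels, and the `diagnose` function are hypothetical, and each entry records whether a full Memtest86 pass with that single stick in that slot came back clean.

```python
def diagnose(results):
    """Interpret single-stick Memtest86 runs.

    results maps (stick, slot) -> True if that run finished with no errors.
    Returns (kind, names), where kind is one of:
      "stick"        - the named sticks failed everywhere they were tested
      "slot"         - the named slots failed with every stick tested in them
      "platform"     - every run failed (CPU, motherboard, or memory settings)
      "pairing"      - every single-stick run passed (errors only appear with both installed)
      "inconclusive" - the runs so far do not separate stick from slot
    """
    if not results:
        raise ValueError("need at least one test run")
    if all(results.values()):
        return ("pairing", [])
    if not any(results.values()):
        return ("platform", [])

    sticks = sorted({stick for stick, _ in results})
    slots = sorted({slot for _, slot in results})
    # A stick is suspect if it never passed; a slot is suspect if nothing passed in it.
    bad_sticks = [s for s in sticks
                  if not any(ok for (st, _), ok in results.items() if st == s)]
    bad_slots = [s for s in slots
                 if not any(ok for (_, sl), ok in results.items() if sl == s)]

    if bad_sticks and not bad_slots:
        return ("stick", bad_sticks)
    if bad_slots and not bad_sticks:
        return ("slot", bad_slots)
    return ("inconclusive", [])


if __name__ == "__main__":
    # Hypothetical example: stick "A" fails in both slots, stick "B" passes in both.
    runs = {("A", "1"): False, ("A", "2"): False,
            ("B", "1"): True, ("B", "2"): True}
    print(diagnose(runs))
```

An "inconclusive" result just means more combinations need to be tested, for example moving the failing stick to the slot where the other stick passed.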