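This is the first time I've had a problem with Ubuntu on my machine. I recently replaced my computer's SSD with a brand-new one; it runs fine under Windows and its firmware is up to date.
Hardware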
- Kingston A2000 NVMe 500 GB (BTRFS and XFS)
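- Hybrid graphics (Intel HD 530, NVIDIA GeForce GTX 950M)
Software
- Nvidia driver 440 (from the official repository, Prime Profile: On-Demand)
- CUDA driver (from the official repository)
- Linux kernel 5.4.0-42-generic (Secure Boot enabled)
Sometimes, while I'm using the laptop, KWin stops working: I can't open the application launcher, although I can still switch windows with Alt+Tab. A few seconds later the screen freezes completely, I lose control of the mouse, the temperature starts to rise, and I can't switch to another console (Ctrl+Alt+F2) to check for errors. The only way to restart the computer is the Magic SysRq key + REISUB.
Relevant information about my system:
BIOS version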
sudo dmidecode -s bios-version
E5CN63WW
RAM and swap data:
free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       3,9Gi       7,0Gi       1,3Gi       4,5Gi        10Gi
Swap:         3,8Gi       1,8Gi       2,0Gi
Swappiness
sysctl vm.swappiness
vm.swappiness = 60
System log (journalctl -k -b -1)
It shows nothing relevant (to me), but I'm attaching the messages flagged as warnings or alerts below in case I missed something.
First log
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 125: no longer affine to CPU1
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 140: no longer affine to CPU4
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 124: no longer affine to CPU6
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 128: no longer affine to CPU6
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 138: no longer affine to CPU7
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI: button: The lid device is not compliant to SW_LID.
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 20:49:23 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 20:49:29 josejacomeb-Lenovo-ideapad-700-15ISK kernel: kauditd_printk_skb: 43 callbacks suppressed
aug 11 20:49:55 josejacomeb-Lenovo-ideapad-700-15ISK kernel: xfs filesystem being remounted at /run/systemd/unit-root/var/cache/private/fwupdmgr supports timestamps until 2038 (0x7fffffff)
Second log
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: [Firmware Bug]: TPM Final Events table missing or invalid
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: #5 #6 #7
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PPC], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PCT], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PSS], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.LPSS], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.TPSS], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.PSDF], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PSD], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.HPSD], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.SPSD], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform MSFT0101:00: failed to claim resource 1: [mem 0xfed40000-0xfed40fff]
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: acpi MSFT0101:00: platform device creation failed: -16
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: usb: port power management may be unreliable
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: EISA: Cannot allocate resource for mainboard
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 1
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 2
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 3
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 4
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 5
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 6
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 7
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 8
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvme nvme0: missing or invalid SUBNQN field.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: xfs filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: asus_wmi: ASUS Management GUID not found
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Realtek Extended Controls Unit was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Extension 4 was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Processing 2 was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Camera 1 was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvidia: loading out-of-tree module taints kernel.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvidia: module license 'NVIDIA' taints kernel.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Disabling lock debugging due to kernel taint
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: thermal thermal_zone3: failed to read out thermal zone (-61)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 440.100 Fri May 29 08:45:51 UTC 2020
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 21:02:01 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 21:02:01 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190816/nsarguments-59)
Third log
aug 11 21:44:10 josejacomeb-Lenovo-ideapad-700-15ISK kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 440.100 Fri May 29 08:45:51 UTC 2020
aug 11 21:44:10 josejacomeb-Lenovo-ideapad-700-15ISK kernel: thermal thermal_zone3: failed to read out thermal zone (-61)
aug 11 21:44:10 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 21:44:13 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 21:44:13 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190816/nsarguments-59)
aug 11 22:21:25 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 22:21:26 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 22:22:31 josejacomeb-Lenovo-ideapad-700-15ISK kernel: kauditd_printk_skb: 37 callbacks suppressed
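Update
I did a clean reinstall of Kubuntu 20.04.1 with EXT4 partitions, and it looks like an SSD error. The new information is as follows:
- nvme0n1p5 / partition
- nvme0n1p4 /home partition
It happens at random while I'm using the computer, and the computer freezes completely.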
[ 3378.408344] systemd-journald (423): Failed to write entry (22 items, 780 bytes), ignoring: Read-only
[ 3378.408611] systemd-journald [423] : Failed to write entry (22 items, 769 bytes), ignoring: Read-only
Another log from a freeze:
[ 827.214225] EXT4-fs error (device nvme0n1p5): __ext4_find_entry:1531: inode #3407921: comm gmain: reading directory lblock 0
[ 827.214749] EXT4-fs error (device nvme0n1p5): __ext4_find_entry:1531: inode #3407921: comm gmain: reading directory lblock 0
[ 827.214764] EXT4-fs error (device nvme0n1p5): __ext4_find_entry:1531: inode #3407921: comm gmain: reading directory lblock 0
Sometimes this error appears when I shut down the laptop:
[ 16918.166564] systemd-shutdown [1]: Remounting '/' timed out. issuing SIGKILL to PID 11240.
[ 16982.141788] nvme nvme0: Device not ready: aborting reset
[ 16982.143784] nvme : Removing after probe failure status: -19
Update 2
Using the Kubuntu Live ISO, I ran fsck and it found no problems.
root@kubuntu:/home/kubuntu# fsck /dev/nvme0n1p3
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
/dev/nvme0n1p3: clean, 257827/6111232 files, 8741020/24413952 blocks
root@kubuntu:/home/kubuntu# echo $?
0
root@kubuntu:/home/kubuntu# fsck /dev/nvme0n1p5
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
/dev/nvme0n1p5: clean, 754959/6447104 files, 10749435/25785856 blocks
root@kubuntu:/home/kubuntu# echo $?
0
Problem on reboot:
nvme nvme0: Device not ready; aborting reset
nvme nvme0: Abort status: 0x371
nvme nvme0: Abort status: 0x371
nvme nvme0: Abort status: 0x371
Remounting '/' timed out, issuing SIGKILL to PID 7544.
The SMART analysis follows:
sudo smartctl -i /dev/nvme0
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-42-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: KINGSTON SA2000M8500G
Serial Number: 50026B7683BC98CE
Firmware Version: S5Z42105
PCI Vendor/Subsystem ID: 0x2646
IEEE OUI Identifier: 0x0026b7
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 500.107.862.016 [500 GB]
Namespace 1 Utilization: 142.133.460.992 [142 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 0026b7 683bc98ce5
Local Time is: Wed Aug 26 23:49:45 2020 CEST
sudo smartctl -a /dev/nvme0
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-42-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: KINGSTON SA2000M8500G
Serial Number: 50026B7683BC98CE
Firmware Version: S5Z42105
PCI Vendor/Subsystem ID: 0x2646
IEEE OUI Identifier: 0x0026b7
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 500.107.862.016 [500 GB]
Namespace 1 Utilization: 142.114.676.736 [142 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 0026b7 683bc98ce5
Local Time is: Wed Aug 26 23:51:50 2020 CEST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W - - 0 0 0 0 0 0
1 + 4.60W - - 1 1 1 1 0 0
2 + 3.80W - - 2 2 2 2 0 0
3 - 0.0450W - - 3 3 3 3 2000 2000
4 - 0.0040W - - 4 4 4 4 15000 15000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 30 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 3.966.522 [2,03 TB]
Data Units Written: 6.036.943 [3,09 TB]
Host Read Commands: 38.899.250
Host Write Commands: 46.064.389
Controller Busy Time: 601
Power Cycles: 390
Power On Hours: 241
Unsafe Shutdowns: 160
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Thermal Temp. 1 Transition Count: 7
Thermal Temp. 1 Total Time: 24
Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged
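For reference, the bracketed totals in the SMART output can be reproduced from the raw counters, since NVMe reports data units in thousands of 512-byte blocks. The following is a minimal Python sketch, assuming the dot-as-thousands-separator locale shown in this output; the helper names `parse_count` and `data_units_to_tb` are hypothetical and not part of smartctl.

```python
def parse_count(text: str) -> int:
    """Parse a smartctl counter printed with '.' as the thousands separator."""
    return int(text.replace(".", ""))


def data_units_to_tb(units: int) -> float:
    """Convert NVMe data units (1000 blocks of 512 bytes each) to decimal terabytes."""
    return units * 512 * 1000 / 1e12


# Raw counters copied from the smartctl -a output above.
written = parse_count("6.036.943")
read = parse_count("3.966.522")
print(f"written: {data_units_to_tb(written):.2f} TB")
print(f"read:    {data_units_to_tb(read):.2f} TB")

# Share of power cycles that ended in an unsafe shutdown (160 of 390).
print(f"unsafe shutdowns: {160 / 390:.0%} of power cycles")
```

The roughly 41% share of unsafe shutdowns is consistent with the freezes forcing a SysRq reboot, though on its own it does not identify the cause.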
Thanks for reading. What am I doing wrong? Any comments are greatly appreciated!
Regards
Answer 1
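The problem lies in an SSD feature: Autonomous Power State Transition (APST) is causing the freezes. To mitigate this until a fix is released, include nvme_core.default_ps_max_latency_us=0 in the GRUB_CMDLINE_LINUX_DEFAULT option. For example: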
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0"
GRUB_CMDLINE_LINUX=""
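On Ubuntu, an edit to /etc/default/grub only takes effect after running sudo update-grub and rebooting. The following is a minimal Python sketch of the two steps involved: adding the parameter to the GRUB line without duplicating it, and checking a kernel command line for it afterwards. The helper names `add_grub_param` and `kernel_param` are hypothetical, and the sketch works on plain strings rather than touching any file.

```python
import re
from typing import Optional

APST_PARAM = "nvme_core.default_ps_max_latency_us=0"


def add_grub_param(line: str, param: str) -> str:
    """Append `param` to a GRUB_CMDLINE_LINUX_DEFAULT="..." line unless already present."""
    match = re.fullmatch(r'(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)(")\s*', line)
    if match is None:
        return line  # some other line of the file; leave it untouched
    prefix, params, suffix = match.groups()
    tokens = params.split()
    if param not in tokens:
        tokens.append(param)
    return prefix + " ".join(tokens) + suffix


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name` on a kernel command line, or None if it is absent."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None


# Example on literal strings; no system files are read or written here.
new_line = add_grub_param('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"', APST_PARAM)
print(new_line)
print(kernel_param("quiet splash " + APST_PARAM, "nvme_core.default_ps_max_latency_us"))
```

On a live system, the string passed to `kernel_param` would be the contents of /proc/cmdline; the value currently in use can also be read from /sys/module/nvme_core/parameters/default_ps_max_latency_us.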
Answer 2
BIOS
Your current BIOS version is E5CN63WW.
MDS
You have the MDS and TAA bugs:
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
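Mitigation control on the kernel command line
The kernel command line allows you to control the MDS mitigations at boot time with the option "mds=". The valid arguments for this option are:
full
If the CPU is vulnerable, enable all available mitigations for the MDS vulnerability: CPU buffer clearing on exit to userspace and when entering a VM. Idle transitions are protected as well if SMT is enabled.
It does not automatically disable SMT.
full,nosmt
The same as mds=full, with SMT disabled on vulnerable CPUs. This is the complete mitigation.
off
Disables MDS mitigations completely.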
sudo -H gedit /etc/default/grub
Change:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
To:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mds=full,nosmt"
Save the file and quit gedit.
sudo update-grub
reboot
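Note: Be aware that performance will take a large hit on multi-CPU or multi-core configurations.
Note: If the performance loss is too great, try mds=full instead of mds=full,nosmt.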
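After rebooting, the kernel reports the resulting state in /sys/devices/system/cpu/vulnerabilities/mds. The following is a minimal Python sketch of how that one-line report could be split into its mitigation and SMT parts; the helper name `parse_mds_status` is hypothetical, and the sample string follows the format described in the kernel's MDS admin guide rather than output captured from this machine.

```python
from typing import Tuple


def parse_mds_status(line: str) -> Tuple[str, str]:
    """Split a report such as 'Mitigation: Clear CPU buffers; SMT vulnerable'
    into (mitigation state, SMT state). The SMT part is '' when absent."""
    state, _, smt = line.strip().partition(";")
    return state.strip(), smt.strip()


# Sample in the documented format; a real value would come from the sysfs file.
sample = "Mitigation: Clear CPU buffers; SMT vulnerable"
state, smt = parse_mds_status(sample)
print("mitigation:", state)
print("smt:", smt)
```

With mds=full,nosmt the SMT part would be expected to read "SMT disabled"; with plain mds=full on an SMT system it typically remains "SMT vulnerable".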
NVMe
Kingston A2000 NVMe 500 GB
You may have a firmware problem:
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvme nvme0: missing or invalid SUBNQN field.
Go to the manufacturer's website and check whether newer firmware is available.
TPM
You have a TPM error:
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform MSFT0101:00: failed to claim resource 1: [mem 0xfed40000-0xfed40fff]
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: acpi MSFT0101:00: platform device creation failed: -16
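Check that your Software Updates are current and that you're running the latest kernel.
Check the TPM settings in your BIOS, and disable TPM if possible.
Memory
Your swap and vm.swappiness settings look fine.
Go to https://www.memtest86.com/ and download and run their free memtest to test your memory. Complete all 4/4 tests at least once to confirm the memory is good. This may take several hours.
Nvidia
You're using Nvidia driver 440. A newer version, 450.57, is now available for download.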