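I have Ubuntu 22.04, upgraded from 20.04 and originally installed in 2020. The machine ran fine under kernel 5.15.0-43. Since a recent update (2023-11-28) it frequently crashes at random with I/O errors. When it crashes, nothing is written to the log files, because the SSD is no longer writable(!). I suspect a firmware or kernel bug, because Ubuntu 22.04 with kernel 5.15.0-43 runs from a pendrive, and Windows 11 also works well on its own partition. I did not find any hardware problems. The root filesystem / is on /dev/nvme0n1p5 and /home is on /dev/nvme0n1p6. I need help from someone who knows firmware or kernel internals on how to debug this. Can anyone help me?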
Here is the output of some diagnostic commands:
$ cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ udisksctl status
MODEL                     REVISION  SERIAL               DEVICE
--------------------------------------------------------------------------
INTEL SSDPEKNW512G8       004C      BTNH05020T67512A     nvme0n1
$ df -T
Filesystem     Type  1K-blocks      Used Available Use% Mounted on
tmpfs          tmpfs    780904      2232    778672   1% /run
/dev/nvme0n1p5 ext4   50080992  28573776  18930800  61% /
tmpfs          tmpfs   3904504    168276   3736228   5% /dev/shm
tmpfs          tmpfs      5120         4      5116   1% /run/lock
/dev/nvme0n1p6 ext4  334721912 207130268 110515420  66% /home
/dev/nvme0n1p1 vfat     262144     53548    208596  21% /boot/efi
tmpfs          tmpfs    780900       112    780788   1% /run/user/1000
$ sudo parted -l
Model: INTEL SSDPEKNW512G8 (nvme)
Disk /dev/nvme0n1: 512GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number  Start   End     Size    File system     Name                          Flags
 1      1049kB  274MB   273MB   fat32           EFI system partition          boot, esp
 2      274MB   290MB   16,8MB                  Microsoft reserved partition  msftres
 3      290MB   90,3GB  90,0GB                  Basic data partition          msftdata
 4      90,3GB  91,4GB  1156MB  ntfs                                          hidden, diag
 5      91,4GB  144GB   52,4GB  ext4
 6      144GB   493GB   349GB   ext4
 7      493GB   511GB   17,6GB  linux-swap(v1)  swap                          swap
 8      512GB   512GB   210MB   fat32           Basic data partition          hidden, diag
$ uname -a
Linux bkb 6.2.0-37-generic #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 2 18:01:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
$ sudo dpkg --list|grep linux-image
hi linux-image-5.15.0-43-generic 5.15.0-43.46 amd64 Signed kernel image generic
ii linux-image-6.2.0-37-generic 6.2.0-37.38~22.04.1 amd64 Signed kernel image generic
ii linux-image-generic-hwe-22.04 6.2.0.37.38~22.04.15 amd64 Generic Linux kernel image
$ sudo smartctl -a /dev/nvme0n1p6
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.2.0-37-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: INTEL SSDPEKNW512G8
Serial Number: BTNH05020T67512A
Firmware Version: 004C
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512.110.190.592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Mon Dec 4 22:16:07 2023 CET
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 77 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
 St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
  0 +    3.50W        -        -    0  0  0  0        0       0
  1 +    2.70W        -        -    1  1  1  1        0       0
  2 +    2.00W        -        -    2  2  2  2        0       0
  3 -  0.0250W        -        -    3  3  3  3     5000    5000
  4 -  0.0040W        -        -    4  4  4  4     5000    9000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 32 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 4%
Data Units Read: 33.133.223 [16,9 TB]
Data Units Written: 29.155.023 [14,9 TB]
Host Read Commands: 564.528.828
Host Write Commands: 395.005.395
Controller Busy Time: 9.251
Power Cycles: 1.459
Power On Hours: 8.288
Unsafe Shutdowns: 278
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
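The Supported Power States table in this output is what the nvme_core.default_ps_max_latency_us kernel parameter acts on. The kernel only lets the drive drop autonomously into a non-operational state (marked "-" in the Op column, here states 3 and 4) when that state's latency fits under the limit, which defaults to 100000 µs. The sketch below models this under a simplifying assumption, labelled in the code: a state qualifies when Ent_Lat + Ex_Lat is at or below the limit. The exact rule differs between kernel versions, and apst_eligible_states is a hypothetical helper, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: models which power states APST could target.
# ASSUMPTION (simplified): a non-operational state is eligible when
# entry latency + exit latency <= the configured limit in microseconds.
# The real check in the kernel's nvme driver differs between versions.

# Rows copied from the smartctl "Supported Power States" table:
# state  op(+/-)  Ent_Lat(us)  Ex_Lat(us)
POWER_STATES='0 + 0 0
1 + 0 0
2 + 0 0
3 - 5000 5000
4 - 5000 9000'

# apst_eligible_states LIMIT_US -> prints eligible state numbers, space-separated
apst_eligible_states() {
    printf '%s\n' "$POWER_STATES" | awk -v limit="$1" '
        # A limit of 0 turns APST off completely, so nothing is eligible.
        limit == 0 { exit }
        # Only non-operational states ("-") within the latency budget count.
        $2 == "-" && $3 + $4 <= limit { out = out (out == "" ? "" : " ") $1 }
        END { print out }
    '
}
```

For example, apst_eligible_states 100000 prints "3 4", while apst_eligible_states 0 prints an empty line: with a limit of 0 the drive has no autonomous low-power target at all.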
sudo nvme error-log /dev/nvme0n1p6
Error Log Entries for device:nvme0n1p6 entries:64
.................
Entry[ 0]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 1]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 2]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
...
...
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
phase_tag : 0
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
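Every error-log entry shown above reports error_count 0 with a SUCCESS status, meaning the controller had recorded no errors. Instead of scrolling through all 64 entries, the output can be summarised. The sketch below counts entries whose error_count is non-zero; count_logged_errors is a hypothetical helper, and it assumes the "field : value" layout printed by this version of nvme error-log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nvme error-log` output on stdin and prints how
# many entries have a non-zero error_count (0 = no errors recorded).
# Assumes the "field : value" layout shown in the output above.
count_logged_errors() {
    awk -F':' '
        # Match lines whose field name is exactly error_count.
        $1 ~ /^[ \t]*error_count[ \t]*$/ {
            value = $2
            gsub(/[ \t]/, "", value)
            if (value + 0 != 0) n++
        }
        END { print n + 0 }
    '
}
```

Usage: sudo nvme error-log /dev/nvme0n1p6 | count_logged_errors. A result of 0 matches the all-zero entries above.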
sudo fsck -f /dev/nvme0n1p6
fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/nvme0n1p6: 2052426/21331968 files (0.9% non-contiguous), 56383172/85299200 blocks
echo $?
0
Bad-block check with e2fsck (non-destructive read-write test): no errors
sudo e2fsck -fccky /dev/nvme0n1p6
$ fwupdmgr update
Devices with no available firmware updates:
• ELAN1300:00 04F3:3104
• INTEL SSDPEKNW512G8
• System Firmware
• UEFI Device Firmware
• UEFI dbx
No updatable devices
$ fwupdmgr get-devices
VivoBook_ASUSLaptop X521EA_K533EA
│
├─ELAN1300:00 04F3:3104:
│ Device ID: 3ab6179e75c876a50f6dcb40ae0a83ac471fb394
│ Summary: Touchpad
│ Current version: 0x0001
│ Bootloader Version: 0x0000
│ Vendor: HIDRAW:0x04F3
│ GUIDs: dcbfd629-c2d8-53b4-bb8b-306fd916f0e0
│ 9573bac6-3cee-5094-90cd-4d7dc8122a8e
│ bd873b66-c478-5130-9968-00dc0d89d15d
│ 3852e430-731f-55fa-a0e1-f2ff3b818c9f
│ 646d07fa-2f99-5404-870e-e834a3386353
│ Device Flags: • Internal device
│ • Updatable
│
├─INTEL SSDPEKNW512G8:
│ Device ID: c430a03ca2a65dfe2412ff950c79c51f6aec1317
│ Summary: NVM Express solid state drive
│ Current version: 004C
│ Vendor: Intel Corporation (NVME:0x8086)
│ GUIDs: c5fe8b70-dc9a-5c3b-9634-659091d29812
│ 1122104f-b10a-5f32-bc13-7a1ac0f52ea2
│ c6cd9ab0-8f20-512e-9e1f-1af55b8454b9
│ 82741c78-f5dc-5c23-a152-00de5799edc8
│ 2b8c6418-6719-51b3-a700-f6061c86874b
│ Device Flags: • Updatable
│ • System requires external power source
│ • Needs a reboot after installation
│
├─System Firmware:
│ │ Device ID: a45df35ac0e948ee180fe216a5f703f32dda163f
│ │ Summary: UEFI ESRT device
│ │ Current version: 787
│ │ Minimum Version: 787
│ │ Vendor: ASUSTeK COMPUTER INC. (DMI:American Megatrends International, LLC.)
│ │ Update State: Success
│ │ GUIDs: 60c270d7-c1c7-55d6-a556-f8ed502657b8
│ │ 230c8b18-8d9b-53ec-838b-6cfc0383493a
│ │ Device Flags: • Internal device
│ │ • Updatable
│ │ • System requires external power source
│ │ • Needs a reboot after installation
│ │ • Cryptographic hash verification is available
│ │ • Device is usable for the duration of the update
│ │ • Full disk encryption secrets may be invalidated when updating
│ │
│ └─UEFI dbx:
│ Device ID: 362301da643102b9f38477387e2193e57abaa590
│ Summary: UEFI revocation database
│ Current version: 272
│ Minimum Version: 272
│ Vendor: UEFI:Linux Foundation
│ Install Duration: 1 second
│ GUIDs: 6c9777b8-19f2-5e2c-9210-66ef3691a9f3
│ c8749f7f-439b-5c3c-a2ea-3baacf663a5a
│ c6682ade-b5ec-57c4-b687-676351208742
│ f8ba2887-9411-5c36-9cee-88995bb39731
│ 7d5759e5-9aa0-5f0c-abd6-7439bb11b9f6
│ 0c7691e1-b6f2-5d71-bc9c-aabee364c916
│ Device Flags: • Internal device
│ • Updatable
│ • Needs a reboot after installation
│ • Only version upgrades are allowed
│ • Signed Payload
│
└─UEFI Device Firmware:
Device ID: 349bb341230b1a86e5effe7dfe4337e1590227bd
Summary: UEFI ESRT device
Current version: 1
Vendor: DMI:American Megatrends International, LLC.
Update State: Success
GUID: 9bb97156-241b-34a5-90be-06f0048895e5
Device Flags: • Internal device
• Updatable
• System requires external power source
• Needs a reboot after installation
• Device is usable for the duration of the update
Answer
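I managed to solve the problem, and the machine has not crashed since. I changed the kernel command line in GRUB to set the NVMe default power-state maximum latency to 0, which turns off APST (Autonomous Power State Transition), and added pcie_aspm=off, which tells the kernel not to manage PCIe Active State Power Management (ASPM).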
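When the SSD fails, there are no log entries, because there is nowhere to write them. Before the machine shut down, I managed to photograph the kernel messages still shown on screen from memory. This is a photo of the log during a crash.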
Based on this image, I changed the GRUB settings.
So here is the solution:
# 1. Back up the current config
sudo cp /etc/default/grub /etc/default/grub.$(date +%Y-%m-%d)
# 2. Edit the GRUB config
sudo gedit /etc/default/grub
# 3. Replace this line
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
# with this one
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"
# Save the file
# Update GRUB
sudo update-grub
# Reboot
sudo reboot
# Verify
sudo nvme get-feature /dev/nvme0 -f 0x0c -H
get-feature:0x0c (Autonomous Power State Transition), Current value:00000000
Autonomous Power State Transition Enable (APSTE): Disabled
Auto PST Entries .................
Entry[ 0]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
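Step 3 can also be done without an editor. The sketch below appends kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT only if they are not already present, so running it twice is harmless. add_kernel_params is a hypothetical helper; it assumes the variable sits on a single line in double quotes (as in a stock Ubuntu /etc/default/grub) and that the parameters contain no "|", "&" or "\" characters, because it rewrites the line with GNU sed -i.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add_kernel_params FILE PARAM...
# Appends each PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE unless it is already
# there. Assumes a single-line, double-quoted value and GNU grep/sed.
add_kernel_params() {
    local file=$1 line value param
    shift
    # First GRUB_CMDLINE_LINUX_DEFAULT="..." line; fail if there is none.
    line=$(grep -m1 '^GRUB_CMDLINE_LINUX_DEFAULT="' "$file") || return 1
    # Keep only the text between the first and the last double quote.
    value=$(printf '%s\n' "$line" | sed 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)".*$/\1/')
    for param in "$@"; do
        case " $value " in
            *" $param "*) ;;                      # already present: skip it
            *) value="${value:+$value }$param" ;; # append with a single space
        esac
    done
    # Rewrite the line in place; '|' is the sed delimiter, so PARAMs must not
    # contain '|', '&' or '\'.
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\".*\"|GRUB_CMDLINE_LINUX_DEFAULT=\"$value\"|" "$file"
}
```

After the backup from step 1, it could be used as sudo bash -c '. ./add_kernel_params.sh && add_kernel_params /etc/default/grub nvme_core.default_ps_max_latency_us=0 pcie_aspm=off', followed by sudo update-grub as before. The file name add_kernel_params.sh is only an example.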
#
# default_ps_max_latency_us = default power state max latency (microseconds)
# Users can set ps_max_latency_us to zero to turn off APST
# So when it is set to 0, the SSD no longer enters its non-operational power states on its own, which means it should stay in an operational state instead of dropping into a power-saving mode
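The verification step can be scripted as well. The sketch below contains two hypothetical helpers: apst_state reads the nvme get-feature ... -f 0x0c -H output shown above and prints whether APSTE is enabled, and apst_latency_limit reads the nvme_core module parameter from /sys/module/nvme_core/parameters/default_ps_max_latency_us, where 0 means the driver leaves APST off. Both assume the output format and path shown here; the path is an argument only so the helper can be tried on a test file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for confirming that APST is off after the reboot.

# apst_state: reads `nvme get-feature <dev> -f 0x0c -H` output on stdin and
# prints Enabled, Disabled, or unknown (if the APSTE line is missing).
apst_state() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"(APSTE): Enabled"*)  echo Enabled;  return 0 ;;
            *"(APSTE): Disabled"*) echo Disabled; return 0 ;;
        esac
    done
    echo unknown
    return 1
}

# apst_latency_limit [FILE]: reports the nvme_core default_ps_max_latency_us
# module parameter; 0 means the driver disables APST.
apst_latency_limit() {
    local file=${1:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}
    local value
    if ! value=$(cat "$file" 2>/dev/null); then
        echo unreadable
        return 2
    fi
    if [ "$value" = "0" ]; then
        echo "APST disabled"
    else
        echo "APST allowed, limit ${value} us"
    fi
}
```

After the GRUB change, sudo nvme get-feature /dev/nvme0 -f 0x0c -H | apst_state should print Disabled, and apst_latency_limit should print "APST disabled".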
I hope this helps anyone who runs into the same problem.