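I bought an M.2 drive last month. It worked without problems for a week or two, but since last week the filesystem has suddenly switched to read-only mode several times during normal use. I searched Google for the errors I am seeing, but every answer I found involved other errors alongside them.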
I am currently running Pop!_OS with kernel 5.11.18-xanmod1. Here is the dmesg output, which shows some critical medium errors and nothing else out of the ordinary:
[ 0.939435] nvme nvme0: pci function 0000:04:00.0
[ 1.137413] nvme nvme0: allocated 64 MiB host memory buffer.
[ 1.166580] nvme nvme0: 7/0/0 default/read/poll queues
[ 1.174790] nvme0n1: p1 p2 p3 p6
[ 3.110650] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 3.551572] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro. Quota mode: none.
[ 6.554921] blk_update_request: critical medium error, dev nvme0n1, sector 1724543752 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[ 9.780856] blk_update_request: critical medium error, dev nvme0n1, sector 94529656 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 9.947756] blk_update_request: critical medium error, dev nvme0n1, sector 94529784 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 10.114726] blk_update_request: critical medium error, dev nvme0n1, sector 94529784 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
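To keep track of which sectors fail, I put together a rough sketch that pulls the distinct sector numbers out of the dmesg lines above. The function name `failing_sectors` and the `SAMPLE` text are my own choices for illustration, and the regular expression only assumes the message format shown in my log; other kernel versions may word it differently.

```python
import re

# Matches "critical medium error" lines in the format shown in the dmesg output above.
MEDIUM_ERROR = re.compile(
    r"critical medium error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)


def failing_sectors(dmesg_text):
    """Return {device: sorted list of distinct failing sectors}."""
    found = {}
    for line in dmesg_text.splitlines():
        match = MEDIUM_ERROR.search(line)
        if match:
            found.setdefault(match["dev"], set()).add(int(match["sector"]))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


# Sample input copied from the dmesg output in this post.
SAMPLE = """\
[ 6.554921] blk_update_request: critical medium error, dev nvme0n1, sector 1724543752 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[ 9.780856] blk_update_request: critical medium error, dev nvme0n1, sector 94529656 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 9.947756] blk_update_request: critical medium error, dev nvme0n1, sector 94529784 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 10.114726] blk_update_request: critical medium error, dev nvme0n1, sector 94529784 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
"""

if __name__ == "__main__":
    for dev, sectors in failing_sectors(SAMPLE).items():
        for sector in sectors:
            # The block layer reports sector numbers in 512-byte units.
            print(f"{dev}: sector {sector} (byte offset {sector * 512})")
```

In practice I would feed it the full output of dmesg instead of `SAMPLE`. In the excerpt above, the four error lines cover only three distinct sectors, because sector 94529784 fails twice.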
sudo nvme smart-log /dev/nvme0 shows about a hundred media errors and no critical warning:
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning : 0
temperature : 49 C
available_spare : 99%
available_spare_threshold : 32%
percentage_used : 0%
endurance group critical warning summary: 0
data_units_read : 1,025,148
data_units_written : 2,846,247
host_read_commands : 11,115,356
host_write_commands : 20,238,122
controller_busy_time : 0
power_cycles : 98
power_on_hours : 232
unsafe_shutdowns : 31
media_errors : 115
num_err_log_entries : 0
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count : 18
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
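Since I want to see whether media_errors keeps growing, here is a minimal sketch for reading the counters out of the text output above. It assumes the plain `key : value` layout that my nvme-cli version prints; `parse_smart_log`, `leading_int` and `summarize` are names I made up, and the checks are my own guess at what is worth flagging, not an official health test.

```python
import re


def parse_smart_log(text):
    """Split each 'key : value' line of the smart-log text into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_int(value):
    """Parse the leading number of values such as '115', '99%' or '1,025,148'."""
    match = re.match(r"[\d,]+", value)
    return int(match.group().replace(",", "")) if match else None


def summarize(fields):
    """Return a list of notes for counters that look suspicious (my own heuristics)."""
    notes = []
    if leading_int(fields.get("critical_warning", "")):
        notes.append("critical_warning is non-zero")
    media_errors = leading_int(fields.get("media_errors", ""))
    if media_errors:
        notes.append(f"media_errors = {media_errors}")
    spare = leading_int(fields.get("available_spare", ""))
    threshold = leading_int(fields.get("available_spare_threshold", ""))
    if spare is not None and threshold is not None and spare <= threshold:
        notes.append("available_spare is at or below its threshold")
    return notes


# A few lines copied from the smart-log output in this post.
SAMPLE = """\
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning : 0
available_spare : 99%
available_spare_threshold : 32%
unsafe_shutdowns : 31
media_errors : 115
num_err_log_entries : 0
"""

if __name__ == "__main__":
    for note in summarize(parse_smart_log(SAMPLE)):
        print(note)
```

Running this on a fresh smart-log capture every few days and comparing the media_errors value would show whether the count is still rising.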
And smartctl -x /dev/nvme0:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.18-xanmod1] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: XPG GAMMIX S5
Serial Number: 2K502L25DCAF
Firmware Version: V9002s73
PCI Vendor/Subsystem ID: 0x10ec
IEEE OUI Identifier: 0x00e04c
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Tue May 11 08:45:43 2021 -03
Firmware Updates (0x02): 1 Slot
Optional Admin Commands (0x0007): Security Format Frmw_DL
Optional NVM Commands (0x0014): DS_Mngmt Sav/Sel_Feat
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 115 Celsius
Critical Comp. Temp. Threshold: 120 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 8.00W - - 0 0 0 0 0 50000
1 + 4.00W - - 1 1 1 1 0 50000
2 + 3.00W - - 2 2 2 2 0 50000
3 - 0.0500W - - 3 3 3 3 4000 50000
4 - 0.0080W - - 4 4 4 4 8000 100000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 50 Celsius
Available Spare: 99%
Available Spare Threshold: 32%
Percentage Used: 0%
Data Units Read: 1,025,153 [524 GB]
Data Units Written: 2,846,294 [1.45 TB]
Host Read Commands: 11,115,492
Host Write Commands: 20,239,488
Controller Busy Time: 0
Power Cycles: 98
Power On Hours: 232
Unsafe Shutdowns: 31
Media and Data Integrity Errors: 115
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Thermal Temp. 1 Transition Count: 18
Error Information (NVMe Log 0x01, max 8 entries)
No Errors Logged
Neither smartctl -x nor nvme error-log shows any error log entries:
Error Log Entries for device:nvme0 entries:8
.................
Entry[ 0]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 1]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 2]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 3]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 4]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 5]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 6]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
Entry[ 7]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS: The command completed successfully)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
trtype : The transport type is not indicated or the error is not transport related.
cs : 0
trtype_spec_info: 0
.................
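The answers I have seen to other questions say the drive needs to be replaced, but most of them give the I/O errors logged in those OPs' dmesg as the reason. I am still under warranty (3 months from the store and 5 years from the manufacturer), but I would rather not replace the drive if this is not a hardware failure.
Can anyone tell me whether this is indeed hardware-related, or point me in the right direction for further debugging?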