Are "critical medium error" messages and the system (GNU/Linux) suddenly going read-only symptoms of a hardware failure?

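I bought an M.2 SSD last month and used it for a week or two without any problems, but since last week the filesystem has suddenly switched to read-only mode several times during normal use. I searched Google for the errors I am seeing, but every answer I found involved other errors alongside them.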

I am currently running Pop!_OS with kernel 5.11.18-xanmod1. Here is the dmesg output, which shows several critical medium errors and nothing else out of the ordinary:

[    0.939435] nvme nvme0: pci function 0000:04:00.0
[    1.137413] nvme nvme0: allocated 64 MiB host memory buffer.
[    1.166580] nvme nvme0: 7/0/0 default/read/poll queues
[    1.174790]  nvme0n1: p1 p2 p3 p6
[    3.110650] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[    3.551572] EXT4-fs (nvme0n1p3): re-mounted. Opts: errors=remount-ro. Quota mode: none.
[    6.554921] blk_update_request: critical medium error, dev nvme0n1, sector 1724543752 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[    9.780856] blk_update_request: critical medium error, dev nvme0n1, sector 94529656 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[    9.947756] blk_update_request: critical medium error, dev nvme0n1, sector 94529784 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   10.114726] blk_update_request: critical medium error, dev nvme0n1, sector 94529784 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
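
To make it easier to see which sectors are affected, the failing sector numbers can be pulled out of the log with a short script. This is only a minimal sketch: it assumes the lines keep the blk_update_request format shown above, and the function name failing_sectors is made up for illustration.

```python
import re

# Lines copied from the dmesg output above.
DMESG = """\
[    6.554921] blk_update_request: critical medium error, dev nvme0n1, sector 1724543752 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[    9.780856] blk_update_request: critical medium error, dev nvme0n1, sector 94529656 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[    9.947756] blk_update_request: critical medium error, dev nvme0n1, sector 94529784 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   10.114726] blk_update_request: critical medium error, dev nvme0n1, sector 94529784 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
"""

# Capture the device name and the sector number of each medium error.
PATTERN = re.compile(r"critical medium error, dev ([^,\s]+), sector (\d+)")


def failing_sectors(text):
    """Return {device: sorted list of unique failing sectors}."""
    found = {}
    for dev, sector in PATTERN.findall(text):
        found.setdefault(dev, set()).add(int(sector))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


sectors = failing_sectors(DMESG)
for dev, secs in sectors.items():
    for s in secs:
        # The kernel block layer reports sectors in 512-byte units.
        print(f"{dev}: sector {s} (byte offset {s * 512})")
```

In this boot the four messages cover three distinct sectors, since sector 94529784 is reported twice.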

sudo nvme smart-log /dev/nvme0 reports about a hundred media errors and no critical warning:

Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning            : 0
temperature             : 49 C
available_spare             : 99%
available_spare_threshold       : 32%
percentage_used             : 0%
endurance group critical warning summary: 0
data_units_read             : 1,025,148
data_units_written          : 2,846,247
host_read_commands          : 11,115,356
host_write_commands         : 20,238,122
controller_busy_time            : 0
power_cycles                : 98
power_on_hours              : 232
unsafe_shutdowns            : 31
media_errors                : 115
num_err_log_entries         : 0
Warning Temperature Time        : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count   : 18
Thermal Management T2 Trans Count   : 0
Thermal Management T1 Total Time    : 0
Thermal Management T2 Total Time    : 0

And smartctl -x /dev/nvme0:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.18-xanmod1] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       XPG GAMMIX S5
Serial Number:                      2K502L25DCAF
Firmware Version:                   V9002s73
PCI Vendor/Subsystem ID:            0x10ec
IEEE OUI Identifier:                0x00e04c
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue May 11 08:45:43 2021 -03
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0014):     DS_Mngmt Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     115 Celsius
Critical Comp. Temp. Threshold:     120 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.00W       -        -    0  0  0  0        0   50000
 1 +     4.00W       -        -    1  1  1  1        0   50000
 2 +     3.00W       -        -    2  2  2  2        0   50000
 3 -   0.0500W       -        -    3  3  3  3     4000   50000
 4 -   0.0080W       -        -    4  4  4  4     8000  100000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        50 Celsius
Available Spare:                    99%
Available Spare Threshold:          32%
Percentage Used:                    0%
Data Units Read:                    1,025,153 [524 GB]
Data Units Written:                 2,846,294 [1.45 TB]
Host Read Commands:                 11,115,492
Host Write Commands:                20,239,488
Controller Busy Time:               0
Power Cycles:                       98
Power On Hours:                     232
Unsafe Shutdowns:                   31
Media and Data Integrity Errors:    115
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Thermal Temp. 1 Transition Count:   18

Error Information (NVMe Log 0x01, max 8 entries)
No Errors Logged
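
As a sanity check on the counters: NVMe reports Data Units in thousands of 512-byte blocks, so the bracketed sizes printed by smartctl can be reproduced by hand. A minimal sketch of that arithmetic follows; the helper name data_units_to_bytes is hypothetical.

```python
# One NVMe "data unit" corresponds to 1000 blocks of 512 bytes.
UNIT_BYTES = 1000 * 512


def data_units_to_bytes(units):
    """Convert a Data Units Read/Written counter to bytes."""
    return units * UNIT_BYTES


# Counter values taken from the smartctl output above.
read_bytes = data_units_to_bytes(1_025_153)
written_bytes = data_units_to_bytes(2_846_294)

print(f"read:    {read_bytes} bytes")
print(f"written: {written_bytes} bytes")
```

The results agree with the roughly 524 GB read and 1.45 TB written that smartctl shows in brackets.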

Neither smartctl -x nor nvme error-log shows any entries in the error log:

Error Log Entries for device:nvme0 entries:8
.................
 Entry[ 0]   
.................
error_count : 0
sqid        : 0
cmdid       : 0
status_field    : 0(SUCCESS: The command completed successfully)
parm_err_loc    : 0
lba     : 0
nsid        : 0
vs      : 0
trtype      : The transport type is not indicated or the error is not transport related.
cs      : 0
trtype_spec_info: 0
.................
 Entry[ 1]   
.................
error_count : 0
sqid        : 0
cmdid       : 0
status_field    : 0(SUCCESS: The command completed successfully)
parm_err_loc    : 0
lba     : 0
nsid        : 0
vs      : 0
trtype      : The transport type is not indicated or the error is not transport related.
cs      : 0
trtype_spec_info: 0
.................
 Entry[ 2]   
.................
error_count : 0
sqid        : 0
cmdid       : 0
status_field    : 0(SUCCESS: The command completed successfully)
parm_err_loc    : 0
lba     : 0
nsid        : 0
vs      : 0
trtype      : The transport type is not indicated or the error is not transport related.
cs      : 0
trtype_spec_info: 0
.................
 Entry[ 3]   
.................
error_count : 0
sqid        : 0
cmdid       : 0
status_field    : 0(SUCCESS: The command completed successfully)
parm_err_loc    : 0
lba     : 0
nsid        : 0
vs      : 0
trtype      : The transport type is not indicated or the error is not transport related.
cs      : 0
trtype_spec_info: 0
.................
 Entry[ 4]   
.................
error_count : 0
sqid        : 0
cmdid       : 0
status_field    : 0(SUCCESS: The command completed successfully)
parm_err_loc    : 0
lba     : 0
nsid        : 0
vs      : 0
trtype      : The transport type is not indicated or the error is not transport related.
cs      : 0
trtype_spec_info: 0
.................
 Entry[ 5]   
.................
error_count : 0
sqid        : 0
cmdid       : 0
status_field    : 0(SUCCESS: The command completed successfully)
parm_err_loc    : 0
lba     : 0
nsid        : 0
vs      : 0
trtype      : The transport type is not indicated or the error is not transport related.
cs      : 0
trtype_spec_info: 0
.................
 Entry[ 6]   
.................
error_count : 0
sqid        : 0
cmdid       : 0
status_field    : 0(SUCCESS: The command completed successfully)
parm_err_loc    : 0
lba     : 0
nsid        : 0
vs      : 0
trtype      : The transport type is not indicated or the error is not transport related.
cs      : 0
trtype_spec_info: 0
.................
 Entry[ 7]   
.................
error_count : 0
sqid        : 0
cmdid       : 0
status_field    : 0(SUCCESS: The command completed successfully)
parm_err_loc    : 0
lba     : 0
nsid        : 0
vs      : 0
trtype      : The transport type is not indicated or the error is not transport related.
cs      : 0
trtype_spec_info: 0
.................

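The answers I have seen to other questions all say the drive needs replacing, but most of them base that on I/O errors logged in the dmesg output of those posters. I am still under warranty (3 months from the store, 5 years from the manufacturer), but I don't want to replace the drive if this is not a hardware failure.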

Can anyone tell me whether this is really hardware related, or point me in the right direction for further debugging?
