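In May 2021 I bought a 2.5-inch SATA SSD for my HP Elitebook 840 G3 laptop. The SSD is a "Samsung 870 EVO 500GB SATA 2.5-inch Internal Solid State Drive (SSD)", part code "MZ-77E500B/EU".
The laptop originally had Windows on an M.2 SSD (it was installed when I bought the machine, but was not the factory install), and I set up a dual boot with Gentoo Linux on the Samsung SATA SSD. Around January 2023 the M.2 SSD failed - there appear to be no readable sectors on it - so since then the machine has run Linux only.
In July I noticed that some files in my /home/ partition could not be read - these files had read fine when they were created a year earlier. Trying to read them produces kernel errors.
Kernel error messages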
[Jul23 21:53] ata1.00: exception Emask 0x0 SAct 0x10 SErr 0x0 action 0x0
[ +0.000002] ata1.00: irq_stat 0x40000008
[ +0.000002] ata1.00: failed command: READ FPDMA QUEUED
[ +0.000003] ata1.00: cmd 60/08:20:b0:d8:43/00:00:1b:00:00/40 tag 4 ncq dma 4096 in
res 41/40:08:b0:d8:43/00:00:1b:00:00/00 Emask 0x409 (media error) <F>
[ +0.000001] ata1.00: status: { DRDY ERR }
[ +0.000001] ata1.00: error: { UNC }
[ +0.000795] ata1.00: supports DRM functions and may not be fully accessible
[ +0.002737] ata1.00: supports DRM functions and may not be fully accessible
[ +0.002391] ata1.00: configured for UDMA/133
[ +0.000043] scsi_io_completion_action: 3 callbacks suppressed
[ +0.000021] sd 0:0:0:0: [sda] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ +0.000015] sd 0:0:0:0: [sda] tag#4 Sense Key : Medium Error [current]
[ +0.000011] sd 0:0:0:0: [sda] tag#4 Add. Sense: Unrecovered read error - auto reallocate failed
[ +0.000013] sd 0:0:0:0: [sda] tag#4 CDB: Read(10) 28 00 1b 43 d8 b0 00 00 08 00
[ +0.000006] print_req_error: 3 callbacks suppressed
[ +0.000011] blk_update_request: I/O error, dev sda, sector 457431216 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ +0.000061] ata1: EH complete
[ +0.000064] ata1.00: Enabling discard_zeroes_data
[ +0.203593] ata1.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x0
[ +0.000023] ata1.00: irq_stat 0x40000008
[ +0.000002] ata1.00: failed command: READ FPDMA QUEUED
[ +0.000003] ata1.00: cmd 60/08:28:b0:d8:43/00:00:1b:00:00/40 tag 5 ncq dma 4096 in
res 41/40:08:b0:d8:43/00:00:1b:00:00/00 Emask 0x409 (media error) <F>
[ +0.000001] ata1.00: status: { DRDY ERR }
[ +0.000001] ata1.00: error: { UNC }
[ +0.000814] ata1.00: supports DRM functions and may not be fully accessible
[ +0.003289] ata1.00: supports DRM functions and may not be fully accessible
[ +0.002271] ata1.00: configured for UDMA/133
[ +0.000059] sd 0:0:0:0: [sda] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ +0.000015] sd 0:0:0:0: [sda] tag#5 Sense Key : Medium Error [current]
[ +0.000011] sd 0:0:0:0: [sda] tag#5 Add. Sense: Unrecovered read error - auto reallocate failed
[ +0.000012] sd 0:0:0:0: [sda] tag#5 CDB: Read(10) 28 00 1b 43 d8 b0 00 00 08 00
[ +0.000014] blk_update_request: I/O error, dev sda, sector 457431216 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ +0.000018] Buffer I/O error on dev dm-3, logical block 8350230, async page read
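The CDB bytes in the log can be cross-checked against the sector number the block layer reports. The following is a minimal sketch, assuming the standard Read(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and the 512-byte logical sector size that smartctl reports for this drive; the function name decode_read10 is made up for illustration.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI Read(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a Read(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


# CDB copied from the kernel log above.
lba, blocks = decode_read10("28 00 1b 43 d8 b0 00 00 08 00")
print(lba, blocks, blocks * 512)  # prints: 457431216 8 4096
```

The decoded LBA matches the "sector 457431216" in the blk_update_request line, and 8 sectors of 512 bytes match the "dma 4096" in the failed command.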
Checking the /home filesystem from a bootable USB drive (so that no filesystems on the disk were mounted) did not look good:
livecd # fsck.ext4 -fck /dev/vg0/home
e2fsck 1.46.2 (28-Feb-2021)
Checking for bad blocks (read-only test):
99.88% done, 7:37 elapsed. (75/0/0 errors)
done
home: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 673706: 8350230 8350237--8350238 8350246 8350254 8350289 8350292 8350297 8350300 8350305 8350308 8350313 8350316 8350422 8350430 8350438 8350446 8350481 8350489 8350497 8350505 8350614 8350622 8350630 8350638 8350673 8350676 8350681 8350684 8350689 8350697 8350806 8350814 8350865 8350873 8350881 8350889 8350998 8351006 8351014 8351022 8351057 8351065 8351068 8351073 8351076 8351081 8351190 8351198 8351249 8351257 8351273 8351382 8351398 8351441 8351449 8351457 8351465 8351574 8351582 8351633 8351641 8351657
Multiply-claimed block(s) in inode 1188624: 4828842--4828843
Multiply-claimed block(s) in inode 3015126: 16730711 16730719 16730727 16730903 16730919 16730927 16731095 16731103 16731303 16731311
Multiply-claimed block(s) in inode 3015662: 13523212 13523220 13523228 13523236 13523412 13523604
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 4 inodes containing multiply-claimed blocks.)
File /ra/Documents/folkus/folkus/durham-jail2.aiff (inode #673706, mod time Mon May 6 15:42:48 2019)
has 63 multiply-claimed block(s), shared with 1 file(s):
<The bad blocks inode> (inode #1, mod time Mon Jul 24 19:35:22 2023)
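e2fsck prints runs of consecutive blocks as first--last, which makes the lists hard to count by eye. Here is a small sketch that expands such a list into individual block numbers; the helper name expand_block_list is hypothetical, and the input is assumed to be just the text after the colon on a "Multiply-claimed block(s) in inode ..." line.

```python
def expand_block_list(text):
    """Expand an e2fsck block list such as '10 12--14' into [10, 12, 13, 14]."""
    blocks = []
    for token in text.split():
        if "--" in token:
            # A run of consecutive blocks, printed as first--last (inclusive).
            first, last = token.split("--")
            blocks.extend(range(int(first), int(last) + 1))
        else:
            blocks.append(int(token))
    return blocks


# The short list reported for inode 1188624 above.
print(expand_block_list("4828842--4828843"))  # prints: [4828842, 4828843]
```

Applied to the full list for inode 673706, the expansion should give the 63 blocks that Pass 1D reports for durham-jail2.aiff. The first of them, 8350230, is also the logical block named in the "Buffer I/O error on dev dm-3" kernel message.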
The drive's SMART data shows 13 reallocated sectors (this number seems to have stayed the same over the past month rather than growing).
Full smartctl output
$ sudo smartctl -a /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.38-gentooamd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 870 EVO 500GB
Serial Number: S62BNZ0R429272T
LU WWN Device Id: 5 002538 fc1409fcd
Firmware Version: SVT01B6Q
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: In smartctl database 7.3/5528
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Aug 27 17:37:06 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 85) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 097 097 010 Pre-fail Always - 13
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1871
12 Power_Cycle_Count 0x0032 098 098 000 Old_age Always - 1193
177 Wear_Leveling_Count 0x0013 099 099 000 Pre-fail Always - 6
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 097 097 010 Pre-fail Always - 13
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 097 097 010 Pre-fail Always - 13
187 Uncorrectable_Error_Cnt 0x0032 099 099 000 Old_age Always - 682
190 Airflow_Temperature_Cel 0x0032 072 054 000 Old_age Always - 28
195 ECC_Error_Rate 0x001a 199 199 000 Old_age Always - 682
199 CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
235 POR_Recovery_Count 0x0012 099 099 000 Old_age Always - 50
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 6080656104
SMART Error Log Version: 1
ATA Error Count: 682 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 682 occurred at disk power-on lifetime: 1852 hours (77 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 70 78 ed 42 40 Error: UNC at LBA = 0x0042ed78 = 4386168
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 70 78 ed 42 40 0e 03:52:56.807 READ FPDMA QUEUED
60 08 68 70 ed 42 40 0d 03:52:56.807 READ FPDMA QUEUED
60 08 60 68 ed 42 40 0c 03:52:56.807 READ FPDMA QUEUED
60 08 58 60 ed 42 40 0b 03:52:56.807 READ FPDMA QUEUED
60 08 50 58 ed 42 40 0a 03:52:56.807 READ FPDMA QUEUED
Error 681 occurred at disk power-on lifetime: 1852 hours (77 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 38 38 ed 42 40 Error: UNC at LBA = 0x0042ed38 = 4386104
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 38 38 ed 42 40 07 03:52:56.602 READ FPDMA QUEUED
60 08 30 30 ed 42 40 06 03:52:56.602 READ FPDMA QUEUED
60 08 28 28 ed 42 40 05 03:52:56.602 READ FPDMA QUEUED
60 08 20 20 ed 42 40 04 03:52:56.602 READ FPDMA QUEUED
60 08 18 18 ed 42 40 03 03:52:56.602 READ FPDMA QUEUED
Error 680 occurred at disk power-on lifetime: 1852 hours (77 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 98 00 ec 42 40 Error: UNC at LBA = 0x0042ec00 = 4385792
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 98 00 ec 42 40 13 03:52:56.276 READ FPDMA QUEUED
60 00 90 00 ea 42 40 12 03:52:56.276 READ FPDMA QUEUED
60 00 88 00 e8 42 40 11 03:52:56.276 READ FPDMA QUEUED
60 08 80 f8 e7 42 40 10 03:52:56.276 READ FPDMA QUEUED
60 08 78 f0 e7 42 40 0f 03:52:56.276 READ FPDMA QUEUED
Error 679 occurred at disk power-on lifetime: 1852 hours (77 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 48 f8 e6 42 40 Error: UNC at LBA = 0x0042e6f8 = 4384504
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 48 f8 e6 42 40 09 03:52:55.872 READ FPDMA QUEUED
60 08 40 f0 e6 42 40 08 03:52:55.872 READ FPDMA QUEUED
60 08 38 e8 e6 42 40 07 03:52:55.872 READ FPDMA QUEUED
60 08 30 e0 e6 42 40 06 03:52:55.872 READ FPDMA QUEUED
60 08 20 d8 e6 42 40 04 03:52:55.872 READ FPDMA QUEUED
Error 678 occurred at disk power-on lifetime: 1852 hours (77 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 b8 e6 42 40 Error: UNC at LBA = 0x0042e6b8 = 4384440
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 08 08 b8 e6 42 40 01 03:52:55.603 READ FPDMA QUEUED
60 08 00 b0 e6 42 40 00 03:52:55.603 READ FPDMA QUEUED
60 08 f0 a8 e6 42 40 1e 03:52:55.603 READ FPDMA QUEUED
60 08 e8 a0 e6 42 40 1d 03:52:55.603 READ FPDMA QUEUED
60 08 e0 98 e6 42 40 1c 03:52:55.603 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1845 -
# 2 Offline Completed without error 00% 1343 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
256 0 65535 Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
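To watch whether the reallocated-sector count stays at 13, the attribute table can be pulled out of saved smartctl -a output and compared between runs. This is a rough sketch, assuming the ten-column table layout shown above; parse_smart_attributes is an illustrative name, and the sample rows are copied from the output in this post.

```python
def parse_smart_attributes(text):
    """Map attribute name -> raw value from the table in `smartctl -a` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns: a numeric ID first, a hex flag third.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            raw = fields[9]
            try:
                attrs[fields[1]] = int(raw)
            except ValueError:
                attrs[fields[1]] = raw  # keep non-numeric raw values as text
    return attrs


sample = """\
5 Reallocated_Sector_Ct 0x0033 097 097 010 Pre-fail Always - 13
187 Uncorrectable_Error_Cnt 0x0032 099 099 000 Old_age Always - 682
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 6080656104
"""
attrs = parse_smart_attributes(sample)
print(attrs["Reallocated_Sector_Ct"], attrs["Uncorrectable_Error_Cnt"])  # prints: 13 682
```

Saving one such dictionary per day would show whether attributes 5 and 187 are still climbing.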
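For now, I have let fsck delete the affected files and rechecked the filesystem, and everything seems to be back to normal. However, I cannot do this level of checking on the NTFS partition or on the unallocated space in the Linux LVM physical volume.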
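One filesystem-independent way to check those areas would be a plain read-only scan of the raw device or partition, similar in spirit to the read-only bad block test above. The following is only a sketch under stated assumptions: the function name scan_for_read_errors is made up, it relies on os.pread (Unix only), it needs read permission on the device, and reads may be served from the page cache, so it is not a replacement for badblocks or a SMART extended self-test.

```python
import os


def scan_for_read_errors(path, chunk_size=4096):
    """Read `path` chunk by chunk; return byte offsets of chunks that failed."""
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # also works for block devices on Linux
        offset = 0
        while offset < size:
            try:
                os.pread(fd, min(chunk_size, size - offset), offset)
            except OSError:
                # An unreadable sector surfaces here as an I/O error (EIO).
                bad_offsets.append(offset)
            offset += chunk_size
    finally:
        os.close(fd)
    return bad_offsets
```

It could be called as scan_for_read_errors("/dev/sda") from a live environment; dividing a returned offset by 512 gives a sector number to compare with the kernel log.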
According to SMART, I have only written about 3TB to the drive, so I should not be running into wear problems.
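The 3TB figure follows from attribute 241 (Total_LBAs_Written) in the smartctl output. A quick check, assuming the raw value counts 512-byte logical sectors, which is the sector size the drive reports:

```python
SECTOR_BYTES = 512  # assumption: attribute 241 counts 512-byte logical sectors


def lbas_to_bytes(lbas, sector_bytes=SECTOR_BYTES):
    """Convert a count of written LBAs into bytes."""
    return lbas * sector_bytes


written = lbas_to_bytes(6080656104)  # raw value of Total_LBAs_Written above
print(f"{written / 1e12:.2f} TB, {written / 2**40:.2f} TiB")  # prints: 3.11 TB, 2.83 TiB
```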
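Question
I am worried that further errors will cause more data loss in the future (this time the affected files were either unimportant (caches) or could be restored from backups elsewhere). Can I return the drive for replacement under warranty, or should I buy a new one?
Answer 1
I would connect it to a Windows machine and install Samsung's Magician software. Run the diagnostics there - it may even need a firmware update applied. That is the best place to get and view actual actionable data. From there, you can contact Samsung and see what they say, since they will most likely ask you to do this anyway.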