Is my SSD failing? Should I attempt a warranty return?

In May 2021 I bought a 2.5-inch SATA SSD for my HP Elitebook 840 G3 laptop. The SSD is a "Samsung 870 EVO 500GB SATA 2.5-inch Internal Solid State Drive (SSD)", part code "MZ-77E500B/EU".

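The laptop originally ran Windows from an M.2 SSD (it was fitted when I bought the machine, but it was not the factory original), and I set up a Gentoo Linux dual boot on the Samsung SATA SSD. Around January 2023 the M.2 SSD failed, apparently with no readable sectors left, so the machine has run only Linux since then.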

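In July I found that some files on the /home partition could no longer be read, although they had been readable when they were created a year earlier. Attempting to read these files produces the kernel errors below.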

Kernel error messages

[Jul23 21:53] ata1.00: exception Emask 0x0 SAct 0x10 SErr 0x0 action 0x0
[  +0.000002] ata1.00: irq_stat 0x40000008
[  +0.000002] ata1.00: failed command: READ FPDMA QUEUED
[  +0.000003] ata1.00: cmd 60/08:20:b0:d8:43/00:00:1b:00:00/40 tag 4 ncq dma 4096 in
                       res 41/40:08:b0:d8:43/00:00:1b:00:00/00 Emask 0x409 (media error) <F>
[  +0.000001] ata1.00: status: { DRDY ERR }
[  +0.000001] ata1.00: error: { UNC }
[  +0.000795] ata1.00: supports DRM functions and may not be fully accessible
[  +0.002737] ata1.00: supports DRM functions and may not be fully accessible
[  +0.002391] ata1.00: configured for UDMA/133
[  +0.000043] scsi_io_completion_action: 3 callbacks suppressed
[  +0.000021] sd 0:0:0:0: [sda] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[  +0.000015] sd 0:0:0:0: [sda] tag#4 Sense Key : Medium Error [current]
[  +0.000011] sd 0:0:0:0: [sda] tag#4 Add. Sense: Unrecovered read error - auto reallocate failed
[  +0.000013] sd 0:0:0:0: [sda] tag#4 CDB: Read(10) 28 00 1b 43 d8 b0 00 00 08 00
[  +0.000006] print_req_error: 3 callbacks suppressed
[  +0.000011] blk_update_request: I/O error, dev sda, sector 457431216 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  +0.000061] ata1: EH complete
[  +0.000064] ata1.00: Enabling discard_zeroes_data
[  +0.203593] ata1.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x0
[  +0.000023] ata1.00: irq_stat 0x40000008
[  +0.000002] ata1.00: failed command: READ FPDMA QUEUED
[  +0.000003] ata1.00: cmd 60/08:28:b0:d8:43/00:00:1b:00:00/40 tag 5 ncq dma 4096 in
                       res 41/40:08:b0:d8:43/00:00:1b:00:00/00 Emask 0x409 (media error) <F>
[  +0.000001] ata1.00: status: { DRDY ERR }
[  +0.000001] ata1.00: error: { UNC }
[  +0.000814] ata1.00: supports DRM functions and may not be fully accessible
[  +0.003289] ata1.00: supports DRM functions and may not be fully accessible
[  +0.002271] ata1.00: configured for UDMA/133
[  +0.000059] sd 0:0:0:0: [sda] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[  +0.000015] sd 0:0:0:0: [sda] tag#5 Sense Key : Medium Error [current]
[  +0.000011] sd 0:0:0:0: [sda] tag#5 Add. Sense: Unrecovered read error - auto reallocate failed
[  +0.000012] sd 0:0:0:0: [sda] tag#5 CDB: Read(10) 28 00 1b 43 d8 b0 00 00 08 00
[  +0.000014] blk_update_request: I/O error, dev sda, sector 457431216 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  +0.000018] Buffer I/O error on dev dm-3, logical block 8350230, async page read
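
The failing sector number in the log can be cross-checked against the SCSI command bytes on the `CDB: Read(10)` line. The following is a minimal sketch, assuming the standard Read(10) layout (opcode 0x28, a big-endian 4-byte LBA in bytes 2 to 5, a big-endian 2-byte transfer length in bytes 7 to 8); the function name is made up for illustration.

```python
def read10_lba_and_length(cdb_hex):
    """Decode (LBA, block count) from a SCSI Read(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte Read(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, blocks


# CDB copied from the kernel log above.
lba, blocks = read10_lba_and_length("28 00 1b 43 d8 b0 00 00 08 00")
print(lba, blocks, blocks * 512)  # prints: 457431216 8 4096
```

The decoded LBA matches the `sector 457431216` reported by `blk_update_request`, and 8 blocks of 512 bytes match the `dma 4096` in the ATA command line, so both failed reads are retries of the same 4 KiB region.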

Checking the /home filesystem from a bootable USB drive (so that nothing on it was mounted) did not look good:

livecd # fsck.ext4 -fck /dev/vg0/home
e2fsck 1.46.2 (28-Feb-2021) 
Checking for bad blocks (read-only test):
99.88% done, 7:37 elapsed. (75/0/0 errors)
done
home: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
                    
Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 673706: 8350230 8350237--8350238 8350246 8350254 8350289 8350292 8350297 8350300 8350305 8350308 8350313 8350316 8350422 8350430 8350438 8350446 8350481 8350489 8350497 8350505 8350614 8350622 8350630 8350638 8350673 8350676 8350681 8350684 8350689 8350697 8350806 8350814 8350865 8350873 8350881 8350889 8350998 8351006 8351014 8351022 8351057 8351065 8351068 8351073 8351076 8351081 8351190 8351198 8351249 8351257 8351273 8351382 8351398 8351441 8351449 8351457 8351465 8351574 8351582 8351633 8351641 8351657
Multiply-claimed block(s) in inode 1188624: 4828842--4828843
Multiply-claimed block(s) in inode 3015126: 16730711 16730719 16730727 16730903 16730919 16730927 16731095 16731103 16731303 16731311
Multiply-claimed block(s) in inode 3015662: 13523212 13523220 13523228 13523236 13523412 13523604
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 4 inodes containing multiply-claimed blocks.)
  
File /ra/Documents/folkus/folkus/durham-jail2.aiff (inode #673706, mod time Mon May  6 15:42:48 2019)
  has 63 multiply-claimed block(s), shared with 1 file(s):
    <The bad blocks inode> (inode #1, mod time Mon Jul 24 19:35:22 2023)

The drive's SMART data shows 13 reallocated sectors (this number appears to have stayed constant over the past month and has not increased).

Full smartctl output

$ sudo smartctl -a /dev/sda
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.38-gentooamd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 870 EVO 500GB
Serial Number:    S62BNZ0R429272T
LU WWN Device Id: 5 002538 fc1409fcd
Firmware Version: SVT01B6Q
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Aug 27 17:37:06 2023 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (    0) seconds.
Offline data collection
capabilities:            (0x53) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (  85) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   097   097   010    Pre-fail  Always       -       13
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1871
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       1193
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       6
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   097   097   010    Pre-fail  Always       -       13
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   097   097   010    Pre-fail  Always       -       13
187 Uncorrectable_Error_Cnt 0x0032   099   099   000    Old_age   Always       -       682
190 Airflow_Temperature_Cel 0x0032   072   054   000    Old_age   Always       -       28
195 ECC_Error_Rate          0x001a   199   199   000    Old_age   Always       -       682
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       50
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       6080656104

SMART Error Log Version: 1
ATA Error Count: 682 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 682 occurred at disk power-on lifetime: 1852 hours (77 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 70 78 ed 42 40  Error: UNC at LBA = 0x0042ed78 = 4386168

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 70 78 ed 42 40 0e      03:52:56.807  READ FPDMA QUEUED
  60 08 68 70 ed 42 40 0d      03:52:56.807  READ FPDMA QUEUED
  60 08 60 68 ed 42 40 0c      03:52:56.807  READ FPDMA QUEUED
  60 08 58 60 ed 42 40 0b      03:52:56.807  READ FPDMA QUEUED
  60 08 50 58 ed 42 40 0a      03:52:56.807  READ FPDMA QUEUED

Error 681 occurred at disk power-on lifetime: 1852 hours (77 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 38 38 ed 42 40  Error: UNC at LBA = 0x0042ed38 = 4386104

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 38 38 ed 42 40 07      03:52:56.602  READ FPDMA QUEUED
  60 08 30 30 ed 42 40 06      03:52:56.602  READ FPDMA QUEUED
  60 08 28 28 ed 42 40 05      03:52:56.602  READ FPDMA QUEUED
  60 08 20 20 ed 42 40 04      03:52:56.602  READ FPDMA QUEUED
  60 08 18 18 ed 42 40 03      03:52:56.602  READ FPDMA QUEUED

Error 680 occurred at disk power-on lifetime: 1852 hours (77 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 98 00 ec 42 40  Error: UNC at LBA = 0x0042ec00 = 4385792

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 98 00 ec 42 40 13      03:52:56.276  READ FPDMA QUEUED
  60 00 90 00 ea 42 40 12      03:52:56.276  READ FPDMA QUEUED
  60 00 88 00 e8 42 40 11      03:52:56.276  READ FPDMA QUEUED
  60 08 80 f8 e7 42 40 10      03:52:56.276  READ FPDMA QUEUED
  60 08 78 f0 e7 42 40 0f      03:52:56.276  READ FPDMA QUEUED
Error 679 occurred at disk power-on lifetime: 1852 hours (77 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 48 f8 e6 42 40  Error: UNC at LBA = 0x0042e6f8 = 4384504

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 48 f8 e6 42 40 09      03:52:55.872  READ FPDMA QUEUED
  60 08 40 f0 e6 42 40 08      03:52:55.872  READ FPDMA QUEUED
  60 08 38 e8 e6 42 40 07      03:52:55.872  READ FPDMA QUEUED
  60 08 30 e0 e6 42 40 06      03:52:55.872  READ FPDMA QUEUED
  60 08 20 d8 e6 42 40 04      03:52:55.872  READ FPDMA QUEUED

Error 678 occurred at disk power-on lifetime: 1852 hours (77 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 b8 e6 42 40  Error: UNC at LBA = 0x0042e6b8 = 4384440

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 08 b8 e6 42 40 01      03:52:55.603  READ FPDMA QUEUED
  60 08 00 b0 e6 42 40 00      03:52:55.603  READ FPDMA QUEUED
  60 08 f0 a8 e6 42 40 1e      03:52:55.603  READ FPDMA QUEUED
  60 08 e8 a0 e6 42 40 1d      03:52:55.603  READ FPDMA QUEUED
  60 08 e0 98 e6 42 40 1c      03:52:55.603  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1845         -
# 2  Offline             Completed without error       00%      1343         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  256        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
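
To track whether counters such as the reallocated-sector count change over time, the attribute table can be parsed from saved `smartctl` text. This is only a rough sketch: the function name is hypothetical, and it assumes the 10-column table layout shown above with a plain integer in the RAW_VALUE column.

```python
def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows of a smartctl attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry a hex FLAG in column 3.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                pass  # skip rows whose raw value is not a plain integer
    return attrs


# Three rows copied from the smartctl output above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   097   097   010    Pre-fail  Always       -       13
187 Uncorrectable_Error_Cnt 0x0032   099   099   000    Old_age   Always       -       682
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       6080656104
"""
attrs = parse_smart_attributes(sample)
print(attrs["Reallocated_Sector_Ct"], attrs["Uncorrectable_Error_Cnt"])  # prints: 13 682
```

Saving the parsed values with a date each time would show whether the reallocated count really stays at 13 while the uncorrectable-error count keeps rising.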

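For now, I have let fsck delete the affected files and rechecked the filesystem, and everything appears to be back to normal. However, I cannot perform this level of checking on the NTFS partition or on the unallocated space in the Linux LVM physical volume.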

According to SMART, I have written only about 3 TB to the drive, so wear should not be an issue.
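
The roughly 3 TB figure can be reproduced from the SMART table. The sketch below assumes that the raw value of attribute 241 (`Total_LBAs_Written`) counts logical sectors of 512 bytes, which is the sector size `smartctl` reports for this drive; the constant and function names are illustrative only.

```python
SECTOR_BYTES = 512  # "Sector Size: 512 bytes logical/physical" in the smartctl output


def lbas_to_bytes(lba_count, sector_bytes=SECTOR_BYTES):
    """Convert a count of written logical sectors into bytes."""
    return lba_count * sector_bytes


total_lbas_written = 6080656104  # raw value of attribute 241 above
written = lbas_to_bytes(total_lbas_written)
print(written)                    # prints: 3113295925248
print(round(written / 10**12, 2))  # prints: 3.11 (decimal terabytes)
```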

Question

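I am worried about further errors and more data loss in the future (this time the affected files were either unimportant cache files or could be restored from backups elsewhere). Can I return the drive for replacement under warranty, or should I buy a new one?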

Answer 1

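I would connect it to a Windows machine and install Samsung's Magician software. Run the diagnostics there; it may even turn out that a firmware update needs to be applied. That is the best place to get and view actually actionable data. From there you can choose to contact Samsung and see what they say, since they will most likely have you do this anyway.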
