Judging from the smartctl results, is my SSD about to die?


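Since yesterday my SSD has occasionally gone missing from the BIOS, and this morning it disappeared from the boot menu completely. As part of troubleshooting I replaced the SATA cable and reseated the power cable, and at one point during this the drive was recognized at boot. I got into the OS (Mint 18) without trouble.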

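After backing up the data I care about, I ran a smartctl short self-test and got the results below. Can someone familiar with this confirm whether the SSD really is on the verge of giving up, or whether there is any hope of fixing the errors shown?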

smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.15.0-45-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron RealSSD m4/C400/P400
Device Model:     M4-CT064M4SSD2
Serial Number:    0000000011270313DEA7
LU WWN Device Id: 5 00a075 10313dea7
Firmware Version: 070H
User Capacity:    64,023,257,088 bytes [64.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Mar  4 13:22:22 2019 +06
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline 
data collection:        (  295) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (   4) minutes.
Conveyance self-test routine
recommended polling time:    (   3) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   084   084   050    Pre-fail  Always       -       17
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       18432
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       9383
 12 Power_Cycle_Count       0x0032   100   100   001    Old_age   Always       -       5270
170 Grown_Failing_Block_Ct  0x0033   100   100   010    Pre-fail  Always       -       9
171 Program_Fail_Count      0x0032   100   100   001    Old_age   Always       -       483
172 Erase_Fail_Count        0x0032   100   100   001    Old_age   Always       -       0
173 Wear_Leveling_Count     0x0033   095   095   010    Pre-fail  Always       -       174
174 Unexpect_Power_Loss_Ct  0x0032   100   100   001    Old_age   Always       -       164
181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always       -       1555 506 1049
183 SATA_Iface_Downshift    0x0032   100   100   001    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   001    Old_age   Always       -       25
188 Command_Timeout         0x0032   100   100   001    Old_age   Always       -       0
189 Factory_Bad_Block_Ct    0x000e   100   100   001    Old_age   Always       -       49
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x003a   100   100   001    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   001    Old_age   Always       -       9
197 Current_Pending_Sector  0x0032   100   100   001    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   001    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always       -       1019
202 Perc_Rated_Life_Used    0x0018   095   095   001    Old_age   Offline      -       5
206 Write_Error_Rate        0x000e   100   100   001    Old_age   Always       -       483

SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 3

ATA Error Count: 0
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 0 occurred at disk power-on lifetime: 9381 hours (390 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 50 08 d0 34 2c 40   at LBA = 0x002c34d0 = 2897104

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 d0 34 2c 40 00  42d+22:22:28.928  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  42d+22:22:28.928  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00  42d+22:22:28.928  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  42d+22:22:28.928  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 00  42d+22:22:28.928  SET FEATURES [Set transfer mode]

Error -1 occurred at disk power-on lifetime: 9381 hours (390 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 50 08 d0 34 2c 40   at LBA = 0x002c34d0 = 2897104

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 d0 34 2c 40 00  42d+22:22:28.928  READ FPDMA QUEUED
  60 00 08 c8 34 2c 40 00  42d+22:22:28.928  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00  42d+22:22:28.928  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00  42d+22:22:28.928  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  42d+22:22:28.928  IDENTIFY DEVICE

Error -2 occurred at disk power-on lifetime: 9381 hours (390 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 50 30 d0 34 2c 40   at LBA = 0x002c34d0 = 2897104

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 30 c8 34 2c 40 00  42d+22:22:28.928  READ FPDMA QUEUED
  60 00 90 20 49 2d 40 00  42d+22:22:28.928  READ FPDMA QUEUED
  60 03 88 48 19 2d 40 00  42d+22:22:28.928  READ FPDMA QUEUED
  60 00 90 b8 38 2d 40 00  42d+22:22:28.928  READ FPDMA QUEUED
  60 00 90 e8 30 2d 40 00  42d+22:22:28.928  READ FPDMA QUEUED

Error -3 occurred at disk power-on lifetime: 9381 hours (390 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 50 80 f8 f8 b4 40   at LBA = 0x00b4f8f8 = 11860216

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 80 40 0b 7c 40 00  42d+21:42:28.928  WRITE FPDMA QUEUED
  61 00 40 80 09 7c 40 00  42d+21:42:28.928  WRITE FPDMA QUEUED
  61 00 40 00 09 7c 40 00  42d+21:42:28.928  WRITE FPDMA QUEUED
  61 00 88 00 f8 b4 40 00  42d+21:42:28.928  WRITE FPDMA QUEUED
  61 00 40 00 14 58 40 00  42d+21:42:28.928  WRITE FPDMA QUEUED

Error -4 occurred at disk power-on lifetime: 9380 hours (390 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 50 10 40 0b 7c 40   at LBA = 0x007c0b40 = 8129344

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 10 00 2e 17 40 00  42d+21:32:28.928  WRITE FPDMA QUEUED
  61 00 10 88 2d 17 40 00  42d+21:32:28.928  WRITE FPDMA QUEUED
  61 00 08 58 2d 17 40 00  42d+21:32:28.928  WRITE FPDMA QUEUED
  61 00 10 40 2d 17 40 00  42d+21:32:28.928  WRITE FPDMA QUEUED
  61 00 18 20 2d 17 40 00  42d+21:32:28.928  WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      9383         2719472
# 2  Vendor (0xff)       Completed without error       00%      9289         -
# 3  Vendor (0xff)       Completed without error       00%      7651         -
# 4  Vendor (0xff)       Completed without error       00%      6793         -
# 5  Vendor (0xff)       Completed without error       00%      6785         -
# 6  Vendor (0xff)       Completed without error       00%      6570         -
# 7  Vendor (0xff)       Completed without error       00%      6171         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

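P.S. It is probably also worth mentioning that when the symptoms first showed up I was dropped to the initramfs command prompt. I then ran fsck on the SSD once, and at the time that seemed to resolve the problem.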

Answer 1

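To answer the title question: no, the SMART results by themselves are nothing to worry about. Your drive does have some unreadable sectors, but it will reallocate them from its internal reserve the next time they are written. Right now, Reallocated_Event_Count tells you that so far only 9 flash blocks (corresponding to 9 * 2048 = 18432 sectors, as shown by Reallocated_Sector_Ct) have been replaced from the reserve.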
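The arithmetic above can be checked directly against the posted attribute table. This is only a sketch: the three numbers are copied from the report, and the reading "one reallocation event = one flash block" is the interpretation given in this answer, not something smartctl states.

```python
# Values copied from the smartctl report in the question.
SECTOR_BYTES = 512            # "Sector Size: 512 bytes logical/physical"
reallocated_events = 9        # raw value of 196 Reallocated_Event_Count
reallocated_sectors = 18432   # raw value of 5 Reallocated_Sector_Ct

# Assumption (from the answer): each event replaced one whole flash block.
sectors_per_block = reallocated_sectors // reallocated_events
block_bytes = sectors_per_block * SECTOR_BYTES

print(f"sectors per replaced block: {sectors_per_block}")
print(f"bytes per replaced block:   {block_bytes}")
```

Under that assumption each replaced block is 2048 sectors of 512 bytes, i.e. 1 MiB, which is why the sector count looks alarming while the event count is small.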

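If you don't want to wait until the currently unreadable sectors get rewritten by normal system operation, you can write to them manually with tools such as dd or hdparm. This is definitely not for the faint-hearted, though: if you get the write location wrong, you will lose some perfectly valid data.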
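As a rough sketch of what such a manual rewrite could look like, the helper below only builds the command lines and never runs them. The device name `/dev/sdX` is a placeholder, `rewrite_sector_commands` is a hypothetical helper written for this illustration, and the LBA is the one from the self-test log, which reports only the first failing address, so other sectors may be affected as well. Both write commands fill the sector with zeros, so whatever was stored there is gone.

```python
import shlex

def rewrite_sector_commands(device: str, lba: int, total_sectors: int) -> dict:
    """Build (but never execute) commands to inspect and overwrite one sector.

    Hypothetical helper for illustration; check every value by hand before
    running any of the returned commands as root.
    """
    # Refuse addresses that cannot exist on the device at all.
    if not 0 <= lba < total_sectors:
        raise ValueError(f"LBA {lba} is outside the device (0..{total_sectors - 1})")
    dev = shlex.quote(device)
    return {
        # Reading first shows whether the sector really is unreadable.
        "read_check": f"hdparm --read-sector {lba} {dev}",
        # hdparm writes zeros to exactly one sector and insists on the extra flag.
        "write_hdparm": f"hdparm --yes-i-know-what-i-am-doing --write-sector {lba} {dev}",
        # dd equivalent: one 512-byte block of zeros, seek counted in 512-byte blocks.
        "write_dd": f"dd if=/dev/zero of={dev} bs=512 count=1 seek={lba} oflag=direct",
    }

# "User Capacity: 64,023,257,088 bytes" with 512-byte sectors.
TOTAL_SECTORS = 64_023_257_088 // 512

# /dev/sdX is a placeholder, not the asker's actual device node.
cmds = rewrite_sector_commands("/dev/sdX", 2719472, TOTAL_SECTORS)
for name, cmd in cmds.items():
    print(f"{name}: {cmd}")
```

Printing the commands instead of executing them is deliberate: it leaves a chance to compare the device and the sector number against the smartctl output before anything is written.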

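However, the other symptoms you mention (such as the drive not being recognized at power-up) may indeed indicate that the electronics are on their way out. Most of the time, though, such problems are simply caused by PSU or cabling issues, so try plugging the drive into a different computer or swapping the PSU.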
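One figure in the posted report that is at least consistent with a cabling problem is UDMA_CRC_Error_Count, which counts CRC errors on the SATA link rather than on the flash itself. The snippet below is a minimal sketch that pulls the raw values out of a few attribute lines copied from the question. Reading a nonzero count as a cable hint is a general SMART convention assumed here, and a lifetime counter cannot show when the errors happened.

```python
# A few attribute lines copied verbatim from the report in the question.
ATTRIBUTE_LINES = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       18432
181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always       -       1555 506 1049
183 SATA_Iface_Downshift    0x0032   100   100   001    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always       -       1019
"""

def parse_raw_values(text: str) -> dict:
    """Map ATTRIBUTE_NAME to RAW_VALUE (kept as a string) for each table row."""
    raw = {}
    for line in text.splitlines():
        # Ten columns; the last may itself contain spaces, so cap the splits at 9.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            raw[fields[1]] = fields[9].strip()
    return raw

raw = parse_raw_values(ATTRIBUTE_LINES)
crc_errors = int(raw["UDMA_CRC_Error_Count"])
downshifts = int(raw["SATA_Iface_Downshift"])
print(f"link CRC errors: {crc_errors}, interface downshifts: {downshifts}")
```

If the counter keeps climbing after the cable swap, that points away from the old cable and toward the port, the power supply, or the drive's own interface.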

SMART tests generally won't tell you about problems with the electronics; they mainly test the actual storage medium, not the controller.

Answer 2

The messages

Self-test execution status: ( 121) The previous self-test completed having the read element of the test failed.

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      9383         2719472

indicate that your drive is failing and that the drive itself cannot correctly read logical block address 2719472.
You should also find corresponding kernel messages in /var/log/messages.
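To relate the failing address to something more tangible, here is a small sketch that extracts LBA_of_first_error from a self-test log line and converts it to a byte offset. The 512-byte sector size comes from the report; `first_error_lba` is a hypothetical helper, and the regular expression simply assumes the LBA is the last column, as it is in the output quoted above.

```python
import re

SECTOR_BYTES = 512  # "Sector Size: 512 bytes logical/physical" in the report

# Two lines copied from the self-test log in the question.
FAILED = "# 1  Short offline       Completed: read failure       90%      9383         2719472"
PASSED = "# 2  Vendor (0xff)       Completed without error       00%      9289         -"

def first_error_lba(line: str):
    """Return the last column as an int, or None when the log shows '-'."""
    match = re.search(r"(\d+|-)\s*$", line)
    if match is None or match.group(1) == "-":
        return None
    return int(match.group(1))

lba = first_error_lba(FAILED)
byte_offset = lba * SECTOR_BYTES
print(f"first failing LBA: {lba}")
print(f"byte offset on the device: {byte_offset}")
print(f"entry without an error: {first_error_lba(PASSED)}")
```

The offset is about 1.3 GiB into the device. Note that this LBA differs from the addresses in the ATA error log (2897104, 11860216, 8129344), so more than one region appears to be involved.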

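If you want to be close to 100% sure, connect the drive to a different host and repeat the SMART test. I have seen cases where an aging motherboard caused the BIOS to stop recognizing a drive that then worked fine in another system.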
