Destructive partitioning of an SSD

I have an external SSD that suffered some file corruption earlier this week. The model is

Model Family:     Crucial/Micron RealSSD m4/C400/P400
Device Model:     M4-CT256M4SSD2

Apparently it has clocked up roughly 20,000 power-on hours.

Even though the status is:

SMART overall-health self-assessment test result: PASSED

the self-tests fail:

[Screenshot: failed self-test result]

The attributes reported by gsmartcontrol are:

[Screenshot: attribute list in gsmartcontrol]

The full output is:

smartctl 7.2 2020-12-30 r5155 [x86_64-w64-mingw32-w10-b19045] (sf-7.2-1)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron RealSSD m4/C400/P400
Device Model:     M4-CT256M4SSD2
Serial Number:    0000000012050904896A
LU WWN Device Id: 5 00a075 10904896a
Firmware Version: 0309
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 05 11:36:29 2023 PM
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 117) The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline 
data collection:        ( 1190) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (  19) minutes.
Conveyance self-test routine
recommended polling time:    (   3) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   050    -    0
  5 Reallocated_Sector_Ct   PO--CK   099   099   010    -    36864 (0 5)
  9 Power_On_Hours          -O--CK   100   100   001    -    19434
 12 Power_Cycle_Count       -O--CK   100   100   001    -    626
170 Grown_Failing_Block_Ct  PO--CK   099   099   010    -    89
171 Program_Fail_Count      -O--CK   100   100   001    -    20
172 Erase_Fail_Count        -O--CK   100   100   001    -    64
173 Wear_Leveling_Count     PO--CK   083   083   010    -    524
174 Unexpect_Power_Loss_Ct  -O--CK   100   100   001    -    5
181 Non4k_Aligned_Access    -O---K   100   100   001    -    9248 4153 5094
183 SATA_Iface_Downshift    -O--CK   100   100   001    -    0
184 End-to-End_Error        PO--CK   100   100   050    -    0
187 Reported_Uncorrect      -O--CK   100   100   001    -    582
188 Command_Timeout         -O--CK   100   100   001    -    0
189 Factory_Bad_Block_Ct    -OSR--   100   100   001    -    85
194 Temperature_Celsius     -O---K   100   100   000    -    0
195 Hardware_ECC_Recovered  -O-RCK   100   100   001    -    353
196 Reallocated_Event_Count -O--CK   100   100   001    -    89
197 Current_Pending_Sector  -O--CK   100   100   001    -    0
198 Offline_Uncorrectable   ----CK   100   100   001    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   001    -    3
202 Perc_Rated_Life_Used    ---RC-   083   083   001    -    17
206 Write_Error_Rate        -OSR--   100   100   001    -    20
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O     51  Comprehensive SMART error log
0x03       GPL     R/O  16383  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O    255  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O   3449  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0       GPL     VS    2000  Device vendor specific log
0xa0       SL      VS     208  Device vendor specific log
0xa1-0xbf  GPL,SL  VS       1  Device vendor specific log
0xc0       GPL     VS      80  Device vendor specific log
0xc1-0xdf  GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (16383 sectors)
No Errors Logged

SMART Extended Self-test Log size 3449 not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       50%     19433         244627776
# 2  Short offline       Completed: read failure       60%     19433         492159152
# 3  Short offline       Completed: read failure       60%     16715         492159152
# 4  Vendor (0xff)       Completed without error       00%     16602         -
# 5  Vendor (0xff)       Completed without error       00%      5107         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       1 (0x0001)
Device State:                        Active (0)
Current Temperature:                     0 Celsius
Power Cycle Min/Max Temperature:     --/ 0 Celsius
Lifetime    Min/Max Temperature:     --/ 0 Celsius

SCT Temperature History Version:     2
Temperature Sampling Period:         10 minutes
Temperature Logging Interval:        10 minutes
Min/Max recommended Temperature:      0/70 Celsius
Min/Max Temperature Limit:           -5/75 Celsius
Temperature History Size (Index):    478 (151)

Index    Estimated Time   Temperature Celsius
 152    2023-04-02 04:00     ?  -
 ...    ..(473 skipped).    ..  -
 148    2023-04-05 11:00     ?  -
 149    2023-04-05 11:10     0  -
 150    2023-04-05 11:20     0  -
 151    2023-04-05 11:30     0  -

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 2) ==
0x01  0x008  4             626  ---  Lifetime Power-On Resets
0x01  0x010  4           19434  ---  Power-on Hours
0x01  0x018  6     66167492621  ---  Logical Sectors Written
0x01  0x020  6      1499672681  ---  Number of Write Commands
0x01  0x028  6    138123876618  ---  Logical Sectors Read
0x01  0x030  6      2013843720  ---  Number of Read Commands
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4             582  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1               0  ---  Current Temperature
0x05  0x010  1               0  ---  Average Short Term Temperature
0x05  0x018  1               0  ---  Average Long Term Temperature
0x05  0x020  1               0  ---  Highest Temperature
0x05  0x028  1               0  ---  Lowest Temperature
0x05  0x030  1               0  ---  Highest Average Short Term Temperature
0x05  0x038  1               0  ---  Lowest Average Short Term Temperature
0x05  0x040  1               0  ---  Highest Average Long Term Temperature
0x05  0x048  1               0  ---  Lowest Average Long Term Temperature
0x05  0x050  4               -  ---  Time in Over-Temperature
0x05  0x058  1              70  ---  Specified Maximum Operating Temperature
0x05  0x060  4               -  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4           13903  ---  Number of Hardware Resets
0x06  0x010  4               0  ---  Number of ASR Events
0x06  0x018  4               3  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               4  N--  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x000a  4            0  Device-to-host register FISes sent due to a COMRESET

Crucial's own SMART report:

[Two screenshots: Crucial's SMART report]

I don't really know how to interpret gsmartcontrol's output, and I'm not sure the SMART PASSED result is correct. Is it time to discard and replace this drive?

Answer 1

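Forget about the traffic-light style self-assessment. You already have plenty of information (the SMART numbers); you only need to evaluate it. Manufacturers have no interest in showing negative results anyway. Maybe all the instruments in an Airbus cockpit should be replaced by two huge lights, one red and one green, for "pass" and "fail"? :)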

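Contrary to what others say, ignore the normalized values, because there is no defined standard for the normalization process. The same RAW input therefore produces different normalized output from one manufacturer to the next.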

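On an HDD, any raw Reallocated Sector Count greater than 0 indicates failure. Other users have written about this on Super User, referring to Google or Backblaze. Unlike on HDDs, reallocating sectors or flash blocks is part of normal operation on an SSD. Your value of 36864 would be a huge number in the HDD world, but for an SSD it may not matter. I would rather look at the wear indicator, ID 202. Do not expect it to grow linearly, because this indicator may take into account the number of spare flash blocks and some other calculation.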

Destructive partitioning of the SSD

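Since I lack experience with SMART data on SSDs (as opposed to HDDs), I cannot answer your question, but if you want to keep this SSD, look at ID 181. This SMART parameter indicates that the SSD was partitioned the wrong way, which causes write amplification and increases wear on the SSD.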

https://en.wikipedia.org/wiki/Write_amplification

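You most likely partitioned the SSD with a legacy operating system such as 32-bit Windows XP. 32-bit XP tries to start partitions on cylinder boundaries rather than on multiples of a mebibyte (2^20 bytes). That conflicts with the requirement that a partition start on a whole-number multiple of the drive's internal physical sector size. For your drive, the partition's starting byte offset should be divisible by 4096, which with 512-byte logical sectors means a starting LBA divisible by 8. (Necessary condition: LBA × 512 MOD 4096 = 0.) That is currently not the case.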
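A minimal sketch of that alignment check. It assumes the 512-byte logical sector size shown in the smartctl output and judges alignment on the partition's starting byte offset (start LBA × 512) against a 4096-byte boundary. The two start sectors are illustrative values, not read from the asker's drive: 63 is the classic cylinder-aligned start, 2048 is a 1 MiB start. `is_aligned` is a name made up for this example.

```python
LOGICAL_SECTOR_BYTES = 512  # assumption: "Sector Size: 512 bytes logical/physical"
ALIGNMENT_BYTES = 4096      # assumption: the 4 KiB boundary behind Non4k_Aligned_Access


def is_aligned(start_lba, sector_bytes=LOGICAL_SECTOR_BYTES, alignment=ALIGNMENT_BYTES):
    """Return True if the partition's starting byte offset is a multiple of `alignment`."""
    return (start_lba * sector_bytes) % alignment == 0


# Illustrative start sectors only (hypothetical, not taken from the drive in question).
for lba in (63, 2048):
    offset = lba * LOGICAL_SECTOR_BYTES
    print(f"LBA {lba}: byte offset {offset}, aligned={is_aligned(lba)}")
```

A start at sector 63 gives byte offset 32256, which is not a multiple of 4096; a start at sector 2048 gives 1048576, which is.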

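Copy the contents of the SSD to a safe place, delete the partition table, and repartition the SSD with a modern operating system. A modern OS will most likely place partition starts on multiples of 1 MiB, which also satisfies the condition above. Then copy the contents back to the SSD. Doing this reduces write amplification and therefore the future wear of the SSD.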

You can use Testdisk to write a log file containing the current partitioning scheme. I guess lsblk and fdisk also have suitable parameters.
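As one more way to see the current partition starts on Linux, here is a small sketch that reads them from sysfs instead of a partitioning tool. It assumes the kernel exposes each partition's start in `/sys/block/<disk>/<partition>/start`, counted in 512-byte units; `partition_starts` and the device name `sdb` are placeholders for illustration.

```python
from pathlib import Path


def partition_starts(disk, sysfs_root="/sys/block"):
    """Map each partition of `disk` to its start sector as exposed by Linux sysfs.

    Assumption: the `start` file is expressed in 512-byte units.
    """
    starts = {}
    for start_file in sorted(Path(sysfs_root, disk).glob(f"{disk}*/start")):
        starts[start_file.parent.name] = int(start_file.read_text().strip())
    return starts


if __name__ == "__main__":
    # "sdb" is a placeholder; substitute the SSD's actual device name.
    for name, lba in partition_starts("sdb").items():
        print(f"{name}: starts at sector {lba} (byte offset {lba * 512})")
```

If the device does not exist, the function returns an empty mapping and nothing is printed.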

Answer 2

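I am not sure whether gsmartcontrol really understands and correctly reports all the SMART attributes, or whether the disk firmware reports its SMART attributes correctly.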

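The SMART attributes show some errors and a weakened disk, though not a catastrophic state. Yet the self-tests fail and some file corruption has been reported.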

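The most puzzling item is the Reallocated Sector Count: its raw count of 36864 is a catastrophic amount, yet its normalized value is quite good at 99, only slightly below the best value of 100.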

Unless you like taking risks, I would replace this disk if I were you.


I see you have added Crucial's own SMART report, which is much clearer than gsmartcontrol's.

Here are the red flags:

364544 Retired NAND Blocks
20     NAND Page Program Failures
64     NAND Block Erase Failures
582    ECC Correction Failures
353    Corrected ECC

The worst figure here is the number of retired NAND blocks, which Crucial defines in its article "SSDs and SMART Data":

Attribute 5: Retired NAND Blocks

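By continually evaluating the quality of NAND blocks, SMART attribute 5 keeps track of the number of blocks that have been retired. In addition to the wear and data-retention issues described above, SSD firmware retires NAND blocks for a variety of reasons. One reason for retirement is a failure to erase a block when data is deleted or moved during garbage collection. This type of failure poses a low risk to user data, because the data concerned is either being deleted or has already been copied successfully to a new location on the SSD.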

This means that 364544 blocks on the disk have become unusable due to aging! That is enormous.
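The two figures quoted for attribute 5 (36864 in the smartctl output, 364544 in Crucial's report) look like the same 48-bit raw value printed two ways. The sketch below assumes that smartctl's `36864 (0 5)` shows the lowest 16-bit word first and the two higher words in parentheses, highest first. That reading is an assumption, although it reproduces Crucial's number exactly. `raw48_from_words` is a name made up for this illustration.

```python
def raw48_from_words(low, mid, high):
    """Combine three 16-bit words into one 48-bit integer, `low` being least significant."""
    for word in (low, mid, high):
        if not 0 <= word <= 0xFFFF:
            raise ValueError(f"not a 16-bit word: {word}")
    return (high << 32) | (mid << 16) | low


# Assumed reading of "36864 (0 5)": low word 36864, then (high word 0, middle word 5).
print(raw48_from_words(low=36864, mid=5, high=0))  # 5 * 65536 + 36864 = 364544
```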

Final diagnosis: the disk is failing and nearing the end of its life. You should replace it as soon as possible.

Answer 3

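In SMART output, "PASSED" or "FAILED" is just a one-line summary of the SMART attributes table. "PASSED" is reported as long as none of the numbers in the VALUE or WORST columns is below the corresponding number in the THRESH column.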
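The rule as stated here can be sketched in a few lines. This is an illustration of the description in this answer, not smartctl's actual implementation, which may apply additional conditions. The rows are copied from the attribute table in the question; `summarize` is a hypothetical helper name.

```python
SAMPLE_ROWS = """
  5 Reallocated_Sector_Ct   PO--CK   099   099   010    -    36864 (0 5)
173 Wear_Leveling_Count     PO--CK   083   083   010    -    524
202 Perc_Rated_Life_Used    ---RC-   083   083   001    -    17
"""


def summarize(rows):
    """Return ("PASSED" or "FAILED", list of (id, name) whose VALUE or WORST is below THRESH)."""
    failing = []
    for line in rows.strip().splitlines():
        fields = line.split()
        attr_id, name = int(fields[0]), fields[1]
        # Columns: ID# NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
        value, worst, thresh = (int(x) for x in fields[3:6])
        if value < thresh or worst < thresh:
            failing.append((attr_id, name))
    return ("FAILED" if failing else "PASSED"), failing


print(summarize(SAMPLE_ROWS))
```

With these rows every VALUE and WORST is well above its THRESH, so the summary comes out as PASSED regardless of what the raw values say.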

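If a self-test fails for any reason other than the test being cancelled, the drive should be replaced, even if the SMART summary still says everything is fine. For hard disk drives, SMART predicts only about half of all failures (and the SMART summary predicts even fewer). I don't think anyone has done a large-scale study of SMART on SSDs.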
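A rough sketch of applying that advice to the self-test log from the question. It assumes a failed test is one whose status starts with "Completed:" (as in "Completed: read failure"); other status strings, including those for cancelled tests, are simply not counted. `failed_self_tests` is a hypothetical helper, and splitting columns on runs of two or more spaces is a guess at the layout, not a documented format.

```python
import re

SAMPLE_LOG = """
# 1  Extended offline    Completed: read failure       50%     19433         244627776
# 2  Short offline       Completed: read failure       60%     19433         492159152
# 3  Short offline       Completed: read failure       60%     16715         492159152
# 4  Vendor (0xff)       Completed without error       00%     16602         -
# 5  Vendor (0xff)       Completed without error       00%      5107         -
"""


def failed_self_tests(log_text):
    """Return (description, status) for every log row that records a failed test."""
    failures = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # skip blank lines and headers
        fields = re.split(r"\s{2,}", line)
        description, status = fields[1], fields[2]
        if status.startswith("Completed:"):  # assumption: this prefix marks a failure
            failures.append((description, status))
    return failures


for description, status in failed_self_tests(SAMPLE_LOG):
    print(f"{description}: {status}")
```

For the log in the question this reports the three read failures, which by the rule above is already reason enough to replace the drive.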
