Why do my badblocks results disagree with my SSD's SMART results?


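I have an SSD that has been running for years. It was part of a RAID0 array that failed. The other SSD in the array is definitely dead: it doesn't show up at all when I plug it into a computer. I want to figure out whether the SSD I'm working with is bad too.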

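An extended SMART test run from the Disks utility in Linux tells me "Disk is OK". When I ran badblocks on it with sudo badblocks -w -s -v /dev/sdc and left it running overnight, I got a huge number of errors. It still hadn't finished by morning and I had to interrupt it, but it was at roughly this stage: one, 20:19:28 elapsed. (0/13/5396172 errors)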

I guess what I'm wondering is: if badblocks hits this many errors, why doesn't SMART see them? Is my SSD bad?

My SMART results are below:

$ sudo smartctl -a /dev/sdc
[sudo] password for tal: 
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.6.13-200.fc31.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 840 EVO 250GB
Serial Number:    S1DBNSAD968844W
LU WWN Device Id: 5 002538 8a0021de5
Firmware Version: EXT0DB6Q
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jul  3 08:15:18 2020 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        ( 4800) seconds.
Offline data collection
capabilities:            (0x53) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (  80) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       53840
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       976
177 Wear_Leveling_Count     0x0013   094   094   000    Pre-fail  Always       -       65
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   062   047   000    Old_age   Always       -       38
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   001   001   000    Old_age   Always       -       3718773
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       183
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       22842562290

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -
# 2  Extended offline    Completed without error       00%         0         -
# 3  Short offline       Completed without error       00%         0         -
# 4  Short offline       Completed without error       00%       212         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Answer 1

The following SMART attribute may be the cause of the problem.

199 CRC_Error_Count 0x003e 001 001 000 Old_age Always - 3718773

Unfortunately, I don't have enough experience with SSDs to say whether this is a typical value.
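For anyone who wants to track this attribute over time, here is a minimal Python sketch that pulls the normalized and raw values out of the attribute table. It assumes the ten-column layout shown in the question's smartctl output; the function name parse_attributes and the SAMPLE text are made up for illustration, and other drives or smartctl versions may print the table differently.

```python
import re

# Two rows copied from the smartctl output in the question.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 CRC_Error_Count         0x003e   001   001   000    Old_age   Always       -       3718773
"""

# Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+"   # ID, name, flag
    r"\s+(\d+)\s+(\d+)\s+(\d+)"             # value, worst, thresh
    r"\s+\S+\s+\S+\s+\S+"                   # type, updated, when_failed
    r"\s+(\d+)"                             # leading digits of the raw value
)


def parse_attributes(text):
    """Return {attribute_id: {name, value, worst, thresh, raw}} for matching rows."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header lines and anything that is not an attribute row
        attr_id, name, value, worst, thresh, raw = match.groups()
        attrs[int(attr_id)] = {
            "name": name,
            "value": int(value),
            "worst": int(worst),
            "thresh": int(thresh),
            "raw": int(raw),
        }
    return attrs


if __name__ == "__main__":
    crc = parse_attributes(SAMPLE)[199]
    print(crc["name"], "normalized:", crc["value"], "raw:", crc["raw"])
```

Feeding it the output of smartctl -A before and after a test run (or after swapping the SATA cable) would show whether the raw count is still climbing, which is probably more telling than the absolute number.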

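The reason for the different behavior may be that the SMART self-test runs only at the disk level, inside the drive. When you run the badblocks command, the physical transfer path between the drive and the host (cable, connectors, controller) is also involved.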
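The counters at the end of the badblocks progress line fit that picture, as far as I can tell. In the badblocks source the three numbers in "(a/b/c errors)" are read errors, write errors, and corruption errors, where a corruption error means the data read back did not match the pattern that was written. Here almost all of the errors are of that last kind. A small Python sketch to split the counters, with a hypothetical function name and using the (truncated) line from the question as input:

```python
import re

# Matches the "(read/write/corruption errors)" suffix of a badblocks status line.
ERRORS = re.compile(r"\((\d+)/(\d+)/(\d+) errors\)")


def parse_badblocks_errors(line):
    """Return the three badblocks counters as a dict, or None if the line has none."""
    match = ERRORS.search(line)
    if match is None:
        return None
    read, write, corruption = (int(n) for n in match.groups())
    return {"read": read, "write": write, "corruption": corruption}


if __name__ == "__main__":
    # Progress line as it appears (truncated) in the question.
    line = "one, 20:19:28 elapsed. (0/13/5396172 errors)"
    print(parse_badblocks_errors(line))
```

Zero read errors alongside millions of compare mismatches suggests the drive itself rarely reported a failed I/O; the data simply came back different, which would be consistent with corruption somewhere on the link rather than failed flash blocks. That is an interpretation, not a diagnosis.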
