Failing hard disk. What can be read from the "smartctl -a" output?

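I have been asked to replace a failing hard disk that is used as the recording device in a TV setup.

(2.5-inch hard disk with a simple USB interface, in a two-part plastic enclosure)

Since it has a USB Type-A cable output, I simply plugged it into an Ubuntu laptop.

Side note: the USB connection appears to work fine; the device seems to have storage-media problems only. So it is possible to see that...

The largest files appear to be 200 MB chunks of encrypted stream data. The remaining files are most likely assorted metadata. I will not even attempt to decrypt any of it; the recordings are a random assortment of TV programmes occupying 7.5% of the space.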

"Disks" says:

  • Model: TOSHIBA MQ01ABD050V -63 (AX0N1Q)
  • Partitioning: 500 GB, Master Boot Record; 17 MB free space, then a 500 GB ext4 (version 1.0) partition
  • Assessment: Disk is OK, 16376 bad sectors (29° C / 84° F)

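Can anything more be read from this than "a multitude of escalating read errors"?

I suspect the "driver" behind the damage is the small (tiny, even) fully sealed enclosure with no ventilation openings, which causes heat-dissipation problems.

It may also have suffered shocks, since the device sat next to a TV for two years. While dusting: oops! It fell on the floor.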

$ sudo smartctl -a /dev/sdb
[sudo] password for hannu: 
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.13.0-37-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MQ01ABD050V -63
Serial Number:    885YC2J1TF6G
LU WWN Device Id: 5 000039 8b43822ba
Firmware Version: AX0N1Q
User Capacity:    500 107 862 016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Wed Mar 30 19:53:04 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  120) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 115) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   084   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1125
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       200
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       10288
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   033   033   000    Old_age   Always       -       26898
 10 Spin_Retry_Count        0x0033   103   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       200
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       185
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       200
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       27 (Min/Max 22/58)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       854
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       6088
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   033   033   000    Old_age   Always       -       26898
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       178
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 467 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 467 occurred at disk power-on lifetime: 26805 hours (1116 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 b8 f0 73 13 4d  Error: UNC 184 sectors at LBA = 0x0d1373f0 = 219378672

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 d5 08 a0 73 13 40 00      06:40:51.442  READ DMA EXT
  25 d5 c0 e8 72 13 40 00      06:40:51.333  READ DMA EXT
  25 d5 98 58 71 13 40 00      06:40:51.137  READ DMA EXT
  25 d5 88 d8 6f 13 40 00      06:40:50.928  READ DMA EXT
  25 d5 d0 10 6e 13 40 00      06:40:50.728  READ DMA EXT

Error 466 occurred at disk power-on lifetime: 26805 hours (1116 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 18 e0 74 13 4d  Error: UNC 24 sectors at LBA = 0x0d1374e0 = 219378912

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 d5 18 e0 74 13 40 00      06:38:34.673  READ DMA EXT
  25 d5 48 a0 73 13 40 00      06:38:31.303  READ DMA EXT
  25 d5 c0 e8 72 13 40 00      06:38:31.292  READ DMA EXT
  25 d5 40 b0 71 13 40 00      06:38:31.083  READ DMA EXT
  25 d5 30 88 6f 13 40 00      06:38:30.890  READ DMA EXT

Error 465 occurred at disk power-on lifetime: 26805 hours (1116 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 f8 f0 73 13 4d  Error: UNC 248 sectors at LBA = 0x0d1373f0 = 219378672

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 d5 48 a0 73 13 40 00      06:38:31.303  READ DMA EXT
  25 d5 c0 e8 72 13 40 00      06:38:31.292  READ DMA EXT
  25 d5 40 b0 71 13 40 00      06:38:31.083  READ DMA EXT
  25 d5 30 88 6f 13 40 00      06:38:30.890  READ DMA EXT
  25 d5 b8 d8 6d 13 40 00      06:38:30.688  READ DMA EXT

Error 464 occurred at disk power-on lifetime: 26798 hours (1116 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 06 c2 76 06 40  Error: UNC 6 sectors at LBA = 0x000676c2 = 423618

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 06 c2 76 06 40 00      00:00:20.982  READ DMA EXT
  25 00 01 c1 76 06 40 00      00:00:17.605  READ DMA EXT
  25 00 01 c0 76 06 40 00      00:00:14.221  READ DMA EXT
  25 00 20 c0 76 06 40 00      00:00:10.840  READ DMA EXT
  25 00 08 b8 76 06 40 00      00:00:10.839  READ DMA EXT

Error 463 occurred at disk power-on lifetime: 26798 hours (1116 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 c1 76 06 40  Error: UNC 1 sectors at LBA = 0x000676c1 = 423617

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 01 c1 76 06 40 00      00:00:17.605  READ DMA EXT
  25 00 01 c0 76 06 40 00      00:00:14.221  READ DMA EXT
  25 00 20 c0 76 06 40 00      00:00:10.840  READ DMA EXT
  25 00 08 b8 76 06 40 00      00:00:10.839  READ DMA EXT
  25 00 20 90 76 06 40 00      00:00:10.838  READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


$ smartctl -P showall /dev/sdb1
No presets are defined for this drive.  Its identity strings:
MODEL:    /dev/sdb1
FIRMWARE: (any)
do not match any of the known regular expressions.

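Answer 1

Hannu, do not trust those silly one-dimensional assessments (red, yellow, green) or one-phrase conclusions such as

Assessment: Disk is OK, 16376 bad sectors (29° C / 84° F)

A disk with 16376 bad sectors is not OK! It indicates a sharply reduced life expectancy.

Likewise, 6088 pending sectors that cannot be read are not OK either!

Your temperature may be 29°C right now, but it has reached 58°C, and we do not know for how long. You have 6088 unreadable sectors, and a further 10288 sectors have already been reallocated. I would replace a drive as soon as the first unreadable sector appears.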
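
As a quick cross-check, the "16376 bad sectors" figure reported by Disks appears to be the sum of the raw values of attributes 5 and 197 in the smartctl output from the question. The sketch below is illustrative only: the helper name raw_values and the embedded sample rows are assumptions made for this example, not part of smartctl.

```python
# Three attribute rows copied from the smartctl -a output in the question.
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       10288
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       854
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       6088
"""


def raw_values(table: str) -> dict:
    """Map attribute ID to the integer in its RAW_VALUE column."""
    result = {}
    for line in table.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            result[int(fields[0])] = int(fields[9])
    return result


raw = raw_values(SMART_ROWS)
reallocated = raw[5]    # sectors already remapped to spare sectors
pending = raw[197]      # sectors waiting to be re-read or remapped
bad_total = reallocated + pending
print(f"{reallocated} reallocated + {pending} pending = {bad_total}")
```

The total matches the number shown in the Disks assessment line.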

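The G-Sense_Error_Rate attribute may indicate that you have dropped the drive 3 times. Unfortunately, I have no experience with this particular attribute.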

Here are the relevant lines of the report that document the damage:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       10288
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       27 (Min/Max 22/58)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       854
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       6088

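Conclusion:

Copy your drive with ddrescue or send it to a professional recovery lab!

P.S. If you do copy the drive with ddrescue, could you link the log file (mapfile)? That way harrymc can reconsider his claims. Thanks.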

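Answer 2

The disk's SMART indicators show no errors, no bad sectors, nothing. As far as they are concerned, the disk is in good condition.

For the downvoters who do not understand SMART, here is a quote from the NTFS.com page on SMART attributes:

Attribute values can range from 1 to 253 (1 representing the worst case and 253 the best). Depending on the manufacturer, a value of 100 or 200 will often be chosen as the "normal" value.

For most attributes, a value above the threshold is good and means no errors.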
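
To make the value-versus-threshold comparison concrete, here is a minimal sketch using rows from the question's output. It assumes the usual smartmontools convention that an attribute counts as failed when its normalized VALUE is less than or equal to THRESH; the names Attribute and has_failed are hypothetical. The raw value is printed alongside, since this comparison does not look at it.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    ident: int
    name: str
    value: int    # normalized VALUE column
    thresh: int   # THRESH column
    raw: int      # RAW_VALUE column


def has_failed(attr: Attribute) -> bool:
    # Assumed convention: failed when normalized value <= threshold.
    return attr.value <= attr.thresh


# Values copied from the smartctl -a output in the question.
rows = [
    Attribute(1, "Raw_Read_Error_Rate", 100, 50, 0),
    Attribute(5, "Reallocated_Sector_Ct", 100, 50, 10288),
    Attribute(197, "Current_Pending_Sector", 100, 0, 6088),
]

for attr in rows:
    status = "FAILED" if has_failed(attr) else "ok"
    print(f"{attr.ident:>3} {attr.name:<24} value={attr.value} "
          f"thresh={attr.thresh} raw={attr.raw} -> {status}")
```

All three rows pass the threshold comparison, while two of them carry large raw counts, so the normalized and raw columns are worth reading side by side.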

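It is worth noting that you do have 467 ATA errors, of type READ DMA EXT.

According to the article "ATA errors increasing on disk(s) in ReadyNAS":

ATA errors occur when the ReadyNAS's SATA controller is unable to communicate with the hard disk.

The ReadyNAS's SATA controller sends commands to the hard disk. When the controller cannot communicate with the disk, this may be due to an internal hardware error in the disk itself, and the disk may need to be replaced.

This basically means that there is a problem with the connection between the motherboard and the disk.

Such errors accumulate over the lifetime of the disk, and the timestamps contain no dates, so it is impossible to tell when the errors occurred.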
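
Although the log entries carry no calendar date, each one records the disk's power-on lifetime in hours, which can be set against attribute 9 (Power_On_Hours). The following sketch only does that arithmetic with numbers copied from the question; the variable names are made up for this example.

```python
POWER_ON_HOURS_NOW = 26898   # raw value of attribute 9 at report time

# Error number -> "disk power-on lifetime" in hours, from the error log.
ERROR_HOURS = {467: 26805, 466: 26805, 465: 26805, 464: 26798, 463: 26798}

for number, hours in sorted(ERROR_HOURS.items(), reverse=True):
    days, rest = divmod(hours, 24)
    ago = POWER_ON_HOURS_NOW - hours
    print(f"Error {number}: {hours} h = {days} days + {rest} h, "
          f"{ago} power-on hours before the report")
```

This gives a position relative to powered-on time only; it says nothing about how long the disk sat unpowered in between.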

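This may be caused by a damaged SATA cable or by a problem with the disk. Try a new cable and run a SMART self-test with smartctl. That can establish whether the disk is really failing.

Keep an eye on the ATA error count to see whether it is still increasing.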
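
When weighing a cable fault against a media fault, it can help to decode the register dump of an error-log entry. The sketch below is a rough illustration, assuming the classic ATA error-register bit assignments (ICRC 0x80, UNC 0x40, IDNF 0x10, ABRT 0x04) and the 28-bit LBA register layout, which agrees with the LBA values smartctl printed in the question. The function decode_error and its result keys are hypothetical.

```python
# Subset of ATA error-register bits (assumed from the ATA command set).
ERROR_BITS = {
    0x80: "ICRC (interface CRC error, usually associated with cable/link trouble)",
    0x40: "UNC (uncorrectable data, the medium could not be read)",
    0x10: "IDNF (requested sector ID not found)",
    0x04: "ABRT (command aborted)",
}


def decode_error(registers: str) -> dict:
    """Decode an 'ER ST SC SN CL CH DH' line from the smartctl error log."""
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in registers.split())
    flags = [text for bit, text in ERROR_BITS.items() if er & bit]
    # 28-bit LBA: low nibble of DH, then CH, CL, SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    # The drive reports 512-byte logical sectors.
    return {"flags": flags, "lba": lba, "byte_offset": lba * 512}


info = decode_error("40 51 b8 f0 73 13 4d")   # Error 467 in the question
print(info["flags"])
print(f"LBA {info['lba']} (0x{info['lba']:08x}), byte offset {info['byte_offset']}")
```

For entry 467 this reproduces the LBA 219378672 printed by smartctl, and the only flag set is UNC; UDMA_CRC_Error_Count (attribute 199) is 0 in the same report.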
