Diagnosing whether a drive is reliable from its SMART attributes

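I'm trying to figure out whether my hard drive is failing. I've looked at the SMART values and it seems it might be, but the drive still reads and writes data fine and no new errors have appeared.

197 Current_Pending_Sector used to be 8, but after zeroing the drive it went back to 0, and 196 Reallocated_Event_Count is 0.

Does this mean the drive itself is fine and this was just a temporary system problem?

Also of concern is 188 Command_Timeout with a value of 1, which is defined as:

"The count of aborted operations due to HDD timeout. Normally this attribute value should be equal to zero, and if the value is far above zero, then most likely there are some serious problems with the power supply or an oxidized data cable."

I've been doing some low-level programming and have had to force my computer off about 50 times.

I assume 191 G-Sense_Error_Rate at 438 is fine; I think it comes from moving the laptop around while the hard drive was running.

What's really interesting is that my Windows partition stopped booting and could not be mounted on another Windows or Linux machine, but it mounted fine on OS X, which let me recover my files. I reinstalled Windows and copied the data back, and it seems to be working normally. OS X is on a separate drive.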

H2O:~ jeremiah$ smartctl -a /dev/disk1
smartctl 6.3 2014-07-26 r3976 [x86_64-apple-darwin14.1.0] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     HGST HTS541075A9E680
Serial Number:    JD13021X0A00GK
LU WWN Device Id: 5 000cca 764c48bc4
Firmware Version: JA2OA590
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Mar 11 21:59:30 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (   45) seconds.
Offline data collection
capabilities:            (0x51) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 164) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   086   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0025   100   100   040    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0023   169   100   033    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       981
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   040    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       2586
 10 Spin_Retry_Count        0x0033   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       851
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   001   000    Old_age   Always       -       144929376764360
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   069   050   045    Old_age   Always       -       31 (Min/Max 24/31)
191 G-Sense_Error_Rate      0x0032   099   099   000    Old_age   Always       -       438
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2031647
193 Load_Cycle_Count        0x0032   089   089   000    Old_age   Always       -       115337
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       0
223 Load_Retry_Count        0x002a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 456 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 456 occurred at disk power-on lifetime: 2549 hours (106 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 38 8d 62 00  Error: UNC 8 sectors at LBA = 0x00628d38 = 6458680

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 38 8d 62 40 00      00:00:34.282  READ DMA EXT
  25 00 08 38 8d 62 40 00      00:00:30.471  READ DMA EXT
  25 00 08 38 8d 62 40 00      00:00:26.660  READ DMA EXT
  25 00 08 38 8d 62 40 00      00:00:22.849  READ DMA EXT
  2f 00 01 10 00 00 00 00      00:00:22.849  READ LOG EXT

Error 455 occurred at disk power-on lifetime: 2549 hours (106 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 38 8d 62 00  Error: UNC 8 sectors at LBA = 0x00628d38 = 6458680

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 38 8d 62 40 00      00:00:30.471  READ DMA EXT
  25 00 08 38 8d 62 40 00      00:00:26.660  READ DMA EXT
  25 00 08 38 8d 62 40 00      00:00:22.849  READ DMA EXT
  2f 00 01 10 00 00 00 00      00:00:22.849  READ LOG EXT
  60 08 a8 38 8d 62 40 00      00:00:19.060  READ FPDMA QUEUED

Error 454 occurred at disk power-on lifetime: 2549 hours (106 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 38 8d 62 00  Error: UNC 8 sectors at LBA = 0x00628d38 = 6458680

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 38 8d 62 40 00      00:00:26.660  READ DMA EXT
  25 00 08 38 8d 62 40 00      00:00:22.849  READ DMA EXT
  2f 00 01 10 00 00 00 00      00:00:22.849  READ LOG EXT
  60 08 a8 38 8d 62 40 00      00:00:19.060  READ FPDMA QUEUED
  60 08 a0 30 8d 62 40 00      00:00:19.059  READ FPDMA QUEUED

Error 453 occurred at disk power-on lifetime: 2549 hours (106 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 38 8d 62 00  Error: UNC 8 sectors at LBA = 0x00628d38 = 6458680

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 38 8d 62 40 00      00:00:22.849  READ DMA EXT
  2f 00 01 10 00 00 00 00      00:00:22.849  READ LOG EXT
  60 08 a8 38 8d 62 40 00      00:00:19.060  READ FPDMA QUEUED
  60 08 a0 30 8d 62 40 00      00:00:19.059  READ FPDMA QUEUED
  60 08 98 28 8d 62 40 00      00:00:19.059  READ FPDMA QUEUED

Error 452 occurred at disk power-on lifetime: 2548 hours (106 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 08 38 8d 62 00  Error: UNC at LBA = 0x00628d38 = 6458680

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 a8 38 8d 62 40 00      00:00:19.060  READ FPDMA QUEUED
  60 08 a0 30 8d 62 40 00      00:00:19.059  READ FPDMA QUEUED
  60 08 98 28 8d 62 40 00      00:00:19.059  READ FPDMA QUEUED
  60 08 90 20 8d 62 40 00      00:00:19.059  READ FPDMA QUEUED
  60 08 88 18 8d 62 40 00      00:00:19.059  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Answer 1

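"197 Current_Pending_Sector used to be 8, but after zeroing the drive it went back to 0, and 196 Reallocated_Event_Count is 0."

This means that at some point the drive was unable to read a few sectors, but those sectors have given no trouble since you zeroed the drive. When you overwrote the whole drive with new data, the sectors went from pending reallocation back to normal, and the drive was apparently satisfied with the writes, since the sectors were not reallocated. You should run a long SMART self-test (which typically includes a surface scan) to verify, but this was most likely a glitch, possibly related to moving the computer while the drive was running.

"Also of concern is 188 Command_Timeout with a value of 1, which is defined as:"

Nothing to worry about. The drive reports nearly 2,600 power-on hours and has seen a single command timeout in that time. The operating system handles a command timeout either by retrying the failed command or by failing the I/O operation, so if this were a persistent problem, you would know. It may or may not be related to the 8 pending sectors.

I would be worried if this number started to climb noticeably, but not by a single-digit number of timeouts with no other sign of trouble in how the system runs.

"I've been doing some low-level programming and have had to force my computer off about 50 times."

That should not affect the physical drive at any level worth worrying about, although it may well affect logical data consistency (file system corruption and the like).

Also, from a comment by sawdust:

"You should run both the short and the extended self-tests. The large number of ID# 187 Reported_Uncorrect errors indicates a problem. There seem to have been a lot of uncorrectable read errors about 40 hours ago."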

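This makes sense, but we do not know how the raw value is encoded. What we can see is that the normalized "value" is currently 100, the worst value ever recorded is 1, and the threshold (used to report that the drive has failed or is about to fail) is 0. In other words, right now the drive does not consider this attribute worth worrying about.

1.45e14 read errors sounds almost implausibly high; by its own admission the drive has about 183 million sectors (750 GB at 4 KiB per sector). To reach the reported number of read failures (the raw value), every single sector would have to have failed about 791,000 times over the reported 2,586 power-on hours, or one complete read failure of the entire drive would have to occur every 11 seconds or so. That is simply an absurd number (in 10 seconds you can read only a small fraction of the whole disk surface), so we can conclude with a high degree of certainty that for this drive and attribute 187 the raw value is something other than a simple integer count.

The raw value may be split into two parts, with the high or low bits encoding the actual count and the remaining bits encoding something else. The raw value of this attribute in hexadecimal is 83D0 0005 01C8, and the run of zeroes in the middle is suggestive of such an encoding; while it is certainly possible, a random error count is unlikely to contain such a long run of zeroes in the middle. For example, if we take the low bits (501C8 hex), the reported number of errors is 328,136, which still sounds like a lot but is much more believable.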
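The plausibility check and the hexadecimal split can be reproduced with a few lines of Python. This is only a sketch: the constants are copied from the smartctl output in the question, and splitting the raw value into three 16-bit words is an assumed layout, since the drive is not in the smartctl database and the real encoding is unknown.

```python
# Numbers copied from the smartctl output in the question.
RAW_187 = 144929376764360          # RAW_VALUE of 187 Reported_Uncorrect
CAPACITY_BYTES = 750_156_374_016   # "User Capacity"
PHYSICAL_SECTOR = 4096             # physical sector size in bytes
POWER_ON_HOURS = 2586              # RAW_VALUE of 9 Power_On_Hours

# If the raw value were a plain error count, how often would each sector fail?
sectors = CAPACITY_BYTES // PHYSICAL_SECTOR
failures_per_sector = RAW_187 / sectors
seconds_per_full_pass = POWER_ON_HOURS * 3600 / failures_per_sector

print(f"physical sectors:      {sectors:,}")
print(f"failures per sector:   {failures_per_sector:,.0f}")
print(f"seconds per full pass: {seconds_per_full_pass:.1f}")

# Assumed layout: treat the 48-bit raw value as three 16-bit words.
words = [(RAW_187 >> shift) & 0xFFFF for shift in (32, 16, 0)]
low32 = RAW_187 & 0xFFFFFFFF   # the "low bits" reading used in the text
low16 = RAW_187 & 0xFFFF

print("16-bit words:", " ".join(f"{w:04X}" for w in words))
print("low 32 bits: ", low32)
print("low 16 bits: ", low16)
```

The words come out as 83D0 0005 01C8 and the low 32 bits as 328,136, matching the figures above. The lowest 16-bit word is 456, the same number as the "ATA Error Count: 456" line in the error log. That may be a coincidence, but it is consistent with the idea that only part of the raw value is a counter.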

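The bottom line is that SMART can be an excellent monitoring tool, but it is not designed to catch and report every problem. Some drives keep running long after SMART says they should be completely dead, and others fail catastrophically while SMART still reports that everything is fine. Treat SMART data as an early-warning system and a status report, not as some absolute truth about the drive's health. You also have to read the raw values with a critical eye, because their encoding is implementation-defined. Instead, look at how the reported "value" compares with the drive's "threshold", because those numbers are supposed to be defined in a meaningful way by the manufacturer for the specific drive.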
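As a rough illustration of comparing the normalized value with the threshold, the sketch below parses a few attribute rows copied from the table in the question. The regular expression and the helper names `parse_attributes` and `margin` are made up for this example, and the rule "VALUE at or below THRESH means failed" is the usual convention for SMART attributes, not something specific to this drive.

```python
import re

# A few rows copied from the attribute table in the question.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   001   000    Old_age   Always       -       144929376764360
190 Airflow_Temperature_Cel 0x0022   069   050   045    Old_age   Always       -       31 (Min/Max 24/31)
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
"""

# One attribute row: ID, name, flag, VALUE, WORST, THRESH, type, updated,
# when_failed, then the free-form raw value.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.*)$"
)

def parse_attributes(text):
    """Return one dict per attribute row found in the given text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            for key in ("id", "value", "worst", "thresh"):
                row[key] = int(row[key])
            rows.append(row)
    return rows

def margin(row):
    """Normalized VALUE minus THRESH; zero or negative is treated as failed."""
    return row["value"] - row["thresh"]

for row in parse_attributes(SAMPLE):
    status = "ok" if margin(row) > 0 else "FAILED"
    print(f'{row["id"]:>3} {row["name"]:<24} value={row["value"]:3d} '
          f'worst={row["worst"]:3d} thresh={row["thresh"]:3d} {status}')
```

All four sample rows are above their thresholds, even though WORST for attribute 187 once dropped to 1. The raw column is kept as a string on purpose, since its meaning depends on the vendor.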

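If you are worried about those earlier pending sectors (pending basically means "found difficult to read"), run a full surface scan through SMART. If they show up as pending again, it may be worth considering replacing the drive, but the simple fact is that almost any drive will develop some bad sectors over its lifetime, and drives compensate for this by reallocating the bad sectors. Reallocation requires knowing the data, however, so a sector that has gone bad can only be reallocated when it is written to.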
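A minimal sketch of how such a scan could be started with smartctl follows. It only builds and prints the command lines rather than running them. The helper `selftest_commands` and the 1024-sector span around the failing LBA are hypothetical choices for this example; `/dev/disk1` and LBA 6458680 come from the question, and `-t long`, `-t select,MIN-MAX`, `-l selftest` and `-A` are standard smartctl options.

```python
import shlex

DEVICE = "/dev/disk1"     # device node from the question; adjust as needed
FAILING_LBA = 6458680     # LBA reported in the error log entries

def selftest_commands(device, lba=None, span=1024):
    """Build (but do not run) smartctl command lines for a self-test.

    With lba=None an extended self-test of the whole drive is requested;
    otherwise a selective self-test covering lba +/- span sectors.
    """
    if lba is None:
        start = ["smartctl", "-t", "long", device]
    else:
        first = max(0, lba - span)
        last = lba + span
        start = ["smartctl", "-t", f"select,{first}-{last}", device]
    return [
        start,
        ["smartctl", "-l", "selftest", device],  # read the self-test log afterwards
        ["smartctl", "-A", device],              # re-check attributes 5, 196, 197, 198
    ]

for argv in selftest_commands(DEVICE):
    print(shlex.join(argv))
for argv in selftest_commands(DEVICE, lba=FAILING_LBA):
    print(shlex.join(argv))
```

According to the capabilities section in the question, the extended test on this drive takes about 164 minutes, so the log should be read only after that time has passed. The selective variant is much quicker but only covers the region around the address that produced the UNC errors.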
