The manufacturer's tool found bad blocks, but smartctl doesn't show any


My problem description is rather long, so I'll start with a short summary and then describe the situation in detail.

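Short summary: the manufacturer's diagnostic tool found and repaired some errors on my hard disk. As I understand the tool's manual, these errors were bad blocks. However, smartctl (a Linux tool for querying SMART data on hard disks) shows no reallocated sectors and says the disk is healthy. First question: how is this possible? Repairing bad blocks means reallocating sectors, right? So why doesn't smartctl report any reallocated sectors? Second question: I bought this disk a few months ago and it is still under warranty. Should I ask the seller to replace it, or is the disk fine and can I keep using it?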

Now the detailed description:

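I have a Western Digital hard disk, model WDC WD5000AAKX-001CA0. Recently I noticed that my computer sometimes hangs for several tens of seconds (about a minute). After such a hang, dmesg shows errors like these: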

knoppix@Microknoppix:~$ dmesg
(...)
[  504.003363] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  504.003374] ata1.00: failed command: READ DMA EXT
[  504.003383] ata1.00: cmd 25/00:00:80:07:01/00:02:00:00:00/e0 tag 0 dma 262144 in
[  504.003385]          res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  504.003389] ata1.00: status: { DRDY }
[  509.016652] ata1: link is slow to respond, please be patient (ready=0)
[  514.030002] ata1: soft resetting link
[  514.200386] ata1.00: configured for UDMA/133
[  514.200420] ata1: EH complete
[  546.003333] ata1: lost interrupt (Status 0x50)
[  546.003364] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  546.003371] ata1.00: failed command: READ DMA EXT
[  546.003380] ata1.00: cmd 25/00:00:80:15:06/00:02:00:00:00/e0 tag 0 dma 262144 in
[  546.003381]          res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  546.003386] ata1.00: status: { DRDY }
[  546.003401] ata1: soft resetting link
[  546.181205] ata1.00: configured for UDMA/133
[  546.181234] ata1: EH complete
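
A side note on reading these lines: the `cmd` field can be decoded into a starting LBA and a sector count. The sketch below assumes the field order that the Linux libata driver prints (command/feature:count:LBA low:mid:high, then the high-order bytes in the same order, then the device register); `decode_taskfile` is a hypothetical helper name, not part of any tool. The result is the start of the request that timed out, not necessarily the exact sector that could not be read.

```python
import re

# Assumed libata layout of the "cmd" line:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TASKFILE_RE = re.compile(
    r"cmd\s+([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)

def decode_taskfile(line):
    """Return (command, start_lba, sector_count) for a 48-bit 'cmd' line."""
    m = TASKFILE_RE.search(line)
    if m is None:
        raise ValueError("no taskfile found in line")
    command = int(m.group(1), 16)
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m.group(3).split(":")
    )
    # The high-order ("hob") bytes form the upper half of the 48-bit LBA.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    # In 48-bit commands a sector count of 0 means 65536 sectors.
    sectors = (hob_nsect << 8 | nsect) or 65536
    return command, lba, sectors

line = "[  504.003383] ata1.00: cmd 25/00:00:80:07:01/00:02:00:00:00/e0 tag 0 dma 262144 in"
command, lba, sectors = decode_taskfile(line)
print(hex(command), lba, sectors, sectors * 512)  # 0x25 67456 512 262144
```

The decoded request is 512 sectors of 512 bytes, which agrees with the `dma 262144` byte count printed on the same log line, and command 0x25 agrees with the `READ DMA EXT` name on the line above it.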

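However, smartctl says "SMART overall-health self-assessment test result: PASSED" (I'll paste the full smartctl output a few paragraphs below). Whenever I tried to run a smartctl self-test (with smartctl -t short or smartctl -t long), the test was reported as aborted by host. So I downloaded the bootable CD diagnostic tool for my hard disk, this one: http://support.wdc.com/product/download.asp?groupid=606&sid=2&lang=en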

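First I ran the quick test with this tool, and it reported an error (unfortunately, I don't remember the error code). As I understand it, this test just runs the SMART short self-test (http://wdc.custhelp.com/app/answers/detail/search/1/a_id/940/c/130/p/227,295 says "QUICK TEST - performs SMART drive quick self-test to gather and verify the Data Lifeguard information contained on the drive."). Then I ran the extended test. As I understand it, the extended test looks for bad sectors (http://wdc.custhelp.com/app/answers/detail/search/1/a_id/940/c/130/p/227,295 says "EXTENDED TEST - performs a Full Media Scan to detect bad sectors"). After some time, the tool reported that it had found and repaired some errors.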

Then I booted the machine with Knoppix and ran "smartctl --all". Here is its output:

root@Microknoppix:/home/knoppix# smartctl --all /dev/sda
smartctl 5.43 2012-06-05 r3561 [i686-linux-3.4.9] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA
Device Model:     WDC WD5000AAKX-001CA0
Serial Number:    WD-WMAYUW952768
LU WWN Device Id: 5 0014ee 6ad1d9ef1
Firmware Version: 15.01H15
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Dec 12 03:34:39 2012 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        ( 8160) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (  83) minutes.
Conveyance self-test routine
recommended polling time:    (   5) minutes.
SCT capabilities:          (0x3037) SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       486
  3 Spin_Up_Time            0x0027   189   141   021    Pre-fail  Always       -       1525
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       587
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1553
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       578
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       173
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       413
194 Temperature_Celsius     0x0022   097   093   000    Old_age   Always       -       46
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       5
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       5

SMART Error Log Version: 1
ATA Error Count: 2
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 1548 hours (64 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 30 4f c2 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 be 4f c2 a0 02      00:02:58.316  SMART WRITE LOG
  b0 da 01 00 4f c2 a0 02      00:02:58.259  SMART RETURN STATUS
  80 44 00 00 44 57 a0 02      00:02:58.259  [VENDOR SPECIFIC]
  b0 d6 01 be 4f c2 a0 02      00:02:58.241  SMART WRITE LOG
  80 45 00 01 44 57 a0 02      00:02:58.241  [VENDOR SPECIFIC]

Error 1 occurred at disk power-on lifetime: 1515 hours (63 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 30 4f c2 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 be 4f c2 a0 02      00:02:21.841  SMART WRITE LOG
  b0 da 01 00 4f c2 a0 02      00:02:21.784  SMART RETURN STATUS
  80 44 00 00 44 57 a0 02      00:02:21.784  [VENDOR SPECIFIC]
  b0 d6 01 be 4f c2 a0 02      00:02:21.768  SMART WRITE LOG
  80 45 00 01 44 57 a0 02      00:02:21.768  [VENDOR SPECIFIC]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Completed without error       00%      1552         -
# 2  Conveyance offline  Completed: read failure       90%      1548         787927349
# 3  Conveyance offline  Completed: read failure       90%      1515         883391611
# 4  Short offline       Completed without error       00%      1503         -
# 5  Short offline       Completed without error       00%      1503         -
# 6  Short offline       Aborted by host               80%      1502         -
# 7  Extended offline    Completed without error       00%         9         -
# 8  Short offline       Completed without error       00%         6         -
# 9  Short offline       Aborted by host               90%         6         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

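As you can see, on the one hand, conveyance offline self-tests completed with a read failure. On the other hand, all the attributes look fine; for example, Reallocated_Sector_Ct is 0.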
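
For readers who want to check the attribute table programmatically, here is a minimal Python sketch. It assumes the ten-column layout shown in the output above, and the choice of attribute IDs 5, 196, 197 and 198 as "sector health" indicators is an assumption of this sketch; `parse_attributes` and `nonzero_sector_attributes` are hypothetical helper names.

```python
# Rows copied from the smartctl output in the question.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       5
"""

# Attribute IDs treated as sector-health indicators (an assumption of this sketch).
SECTOR_IDS = (5, 196, 197, 198)

def parse_attributes(text):
    """Map attribute ID -> (name, raw_value) for rows with an integer raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry a hex FLAG column.
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[2].startswith("0x") and fields[9].isdigit()):
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs

def nonzero_sector_attributes(text):
    """Return the sector-health attributes whose raw value is above zero."""
    attrs = parse_attributes(text)
    return {i: attrs[i] for i in SECTOR_IDS if i in attrs and attrs[i][1] > 0}

print(nonzero_sector_attributes(SAMPLE))  # {198: ('Offline_Uncorrectable', 5)}
```

Applied to the table above, the reallocation counters are all zero, but Offline_Uncorrectable has a raw value of 5.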

I also tried again to cat the whole disk to /dev/null, and errors appeared in dmesg again:

root@Microknoppix:/home/knoppix# nice -n 20 ionice -c 3 cat /dev/sda > /dev/null
During this cat, dmesg shows errors like these:
knoppix@Microknoppix:~$ dmesg
(...)
[  504.003363] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  504.003374] ata1.00: failed command: READ DMA EXT
[  504.003383] ata1.00: cmd 25/00:00:80:07:01/00:02:00:00:00/e0 tag 0 dma 262144 in
[  504.003385]          res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  504.003389] ata1.00: status: { DRDY }
[  509.016652] ata1: link is slow to respond, please be patient (ready=0)
[  514.030002] ata1: soft resetting link
[  514.200386] ata1.00: configured for UDMA/133
[  514.200420] ata1: EH complete
[  546.003333] ata1: lost interrupt (Status 0x50)
[  546.003364] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  546.003371] ata1.00: failed command: READ DMA EXT
[  546.003380] ata1.00: cmd 25/00:00:80:15:06/00:02:00:00:00/e0 tag 0 dma 262144 in
[  546.003381]          res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  546.003386] ata1.00: status: { DRDY }
[  546.003401] ata1: soft resetting link
[  546.181205] ata1.00: configured for UDMA/133
[  546.181234] ata1: EH complete

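I thought this might be a fault in the motherboard or in the data cable connecting the disk to the motherboard. So I connected another disk to my motherboard using the same cable and the same port, and cat'ed it to /dev/null. That succeeded, and dmesg showed no errors.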
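
The cat test only shows whether errors occur, not where. A rough Python sketch of the same sequential read that also records the offsets of chunks that fail to read might look like this. It assumes a Unix-like system (it uses `os.pread`), and `scan_device` is a hypothetical helper name; running it against a real device such as /dev/sda would need root privileges.

```python
import os

def scan_device(path, chunk_size=1024 * 1024):
    """Read `path` sequentially; return (bytes_read, failed_offsets)."""
    failed = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and Linux block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Remember where the read failed and skip past this chunk.
                failed.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, failed

# Example call (hypothetical device path):
# total, failed = scan_device("/dev/sda")
```

A smaller `chunk_size` narrows down the failing region more precisely, at the cost of a slower scan. On a failing drive each bad chunk may block for a long time while the kernel retries, as the timeouts in the dmesg output suggest.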

Answer 1

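There are no reallocated sectors because the reallocation failed. Your drive shows 5 Offline_Uncorrectable sectors, which is what happens when the automatic repair fails. There are clear read failures in the dmesg output, errors in the SMART error log, and read failures in the SMART self-tests. As you mentioned in your question, there are ways to repair these sectors, but in my experience that is a very short-term fix.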
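
To make the self-test failures concrete, here is a small Python sketch that pulls the failed entries out of the self-test log in the question and converts each LBA to a byte offset. It assumes 512-byte logical sectors, as reported in the question's smartctl output, and `failed_self_tests` is a hypothetical helper name.

```python
# Rows copied from the SMART self-test log in the question.
SELF_TEST_LOG = """\
# 1  Conveyance offline  Completed without error       00%      1552         -
# 2  Conveyance offline  Completed: read failure       90%      1548         787927349
# 3  Conveyance offline  Completed: read failure       90%      1515         883391611
"""

def failed_self_tests(log_text, sector_size=512):
    """Return (lifetime_hours, lba, byte_offset) for each read-failure entry."""
    failures = []
    for line in log_text.splitlines():
        if not line.startswith("#") or "read failure" not in line:
            continue
        fields = line.split()
        lifetime_hours = int(fields[-2])  # LifeTime(hours) column
        lba = int(fields[-1])             # LBA_of_first_error column
        failures.append((lifetime_hours, lba, lba * sector_size))
    return failures

for hours, lba, offset in failed_self_tests(SELF_TEST_LOG):
    print(f"hour {hours}: LBA {lba} (byte offset {offset})")
```

The two failures were logged at 1548 and 1515 power-on hours, the same hours as Error 2 and Error 1 in the SMART error log, and both byte offsets fall inside the 500,107,862,016-byte capacity of the drive.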

Replace the drive.
