Old hard disk: interpreting smartctl output

I have received many emails from the "smartd" daemon with the subject "SMART error (CurrentPendingSector)" and the following body:

The following warning/error was logged by the smartd daemon:
Device: /dev/sda, 1 Currently unreadable (pending) sectors

Over several months it has sent me 80 of these emails.

I ran "e2fsck -cc", "smartctl" and "gsmartcontrol".

  • "e2fsck -cc" did not report any bad blocks.

  • "gsmartcontrol" highlighted the following lines in the "smartctl" output:

--

ID   ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE    UPDATED  WHEN_FAILED RAW_VALUE  
...  
5    Reallocated_Sector_Ct   0x0033  100   100   005  Pre-fail  Always      -   1179816  
...  
196  Reallocated_Event_Count 0x0032  100   100   000    Old_age   Always      -   17  
197  Current_Pending_Sector  0x0022  100   100   000    Old_age   Always      -   1  
...  

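These lines are highlighted by "gsmartcontrol" in pink, not red.

That is, it reports 1,179,816 reallocated sectors (does that make sense?) and 17 reallocation events.

However, WORST equals VALUE.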
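A quick arithmetic check may help put the raw value of attribute 5 in perspective. The sketch below assumes 512-byte sectors and takes the capacity from the "smartctl -i" output further down. The split into 16-bit words is only a guess at how a vendor-specific raw field might be packed, not documented behavior for this drive.

```python
# Sanity check for the raw value of attribute 5 (Reallocated_Sector_Ct).
# Assumption: 512-byte logical sectors.
SECTOR_SIZE = 512
capacity_bytes = 60_011_642_880   # "User Capacity" from smartctl -i
raw_value = 1_179_816             # RAW_VALUE of attribute 5

total_sectors = capacity_bytes // SECTOR_SIZE
share = raw_value / total_sectors

print(f"total sectors: {total_sectors}")
print(f"raw value as a share of the disk: {share:.2%}")

# Raw values are vendor specific and may pack several counters into one number.
# Hypothetical decoding: split the value into 16-bit words.
low_word = raw_value & 0xFFFF
high_word = (raw_value >> 16) & 0xFFFF
print(f"hex: {raw_value:#x}, high word: {high_word}, low word: {low_word}")
```

Read as a plain count, the raw value would be about 1% of the roughly 117 million sectors on the disk, which sits oddly next to a normalized VALUE of 100 and only 17 reallocation events. In hex the value is 0x1200a8, that is a high word of 18 and a low word of 168, so the field may hold more than one counter. Only the vendor's documentation could confirm that.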

  • /var/log/messages occasionally contains the message

    Jul 24 03:12:46 turtle smartd[1443]: Device: /dev/sda,
    1 Currently unreadable (pending) sectors

There have been 38 of these messages in total over the past few days (!)

  • # smartctl -l error /dev/sda reports several errors (below).

How should I interpret them? Should I replace the hard drive?

Thanks.

The detailed "smartctl" output follows.


# smartctl -H -A /dev/sda

SMART Attributes Data Structure revision number: 16  

Vendor Specific SMART Attributes with Thresholds:  

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE   

1 Raw_Read_Error_Rate     0x000b   100   100   062 Pre-fail  Always       -       0  
2 Throughput_Performance  0x0005   105   100   040 Pre-fail  Offline      -       4572  
3 Spin_Up_Time            0x0007   223   100   033 Pre-fail  Always       -       2  
4 Start_Stop_Count        0x0012   098   098   000 Old_age   Always       -       3671  
5 Reallocated_Sector_Ct   0x0033   100   100   005 Pre-fail  Always       -       1179816  
7 Seek_Error_Rate         0x000b   100   100   067 Pre-fail  Always       -       0  
8 Seek_Time_Performance   0x0005   120   100   040 Pre-fail  Offline      -       40  
9 Power_On_Hours          0x0012   030   030   000 Old_age   Always       -       30819  
10 Spin_Retry_Count        0x0013   100   100   060   Pre-fail  Always       -       0  
12 Power_Cycle_Count       0x0032   099   099   000  Old_age   Always       -       2205  
191 G-Sense_Error_Rate      0x000a   100   095   000  Old_age   Always       -       1  
192 Power-Off_Retract_Count 0x0032   100   100   000  Old_age   Always       -       97  
193 Load_Cycle_Count        0x0012   001   001   000  Old_age   Always       -       1865772  
194 Temperature_Celsius     0x0002   177   100   000  Old_age   Always       -       31 (Lifetime Min/Max 9/48)  
196 Reallocated_Event_Count 0x0032   100   100   000  Old_age   Always       -       17  
197 Current_Pending_Sector  0x0022   100   100   000  Old_age   Always       -       1  
198 Offline_Uncorrectable   0x0008   100   100   000  Old_age   Offline      -       0  
199 UDMA_CRC_Error_Count    0x000a   200   190   000 Old_age   Always       -       38  

# sudo smartctl -i /dev/sda

=== START OF INFORMATION SECTION ===  
Model Family:     Hitachi Travelstar 5K100 series  
Device Model:     HTS541060G9AT00  
Serial Number:    MPB3LAX5KUDB1M  
Firmware Version: MB3OA60A  
User Capacity:    60,011,642,880 bytes  
Device is:        In smartctl database [for details use: -P show]  
ATA Version is:   6  
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 3a  
..  
SMART support is: Available - device has SMART capability.  
SMART support is: Enabled  

  • # smartctl -l error /dev/sda

    === START OF READ SMART DATA SECTION ===
    SMART Error Log Version: 1
    ATA Error Count: 80 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

    Error 80 occurred at disk power-on lifetime: 28086 hours (1170 days + 6 hours)
    When the command that caused the error occurred, the device was active or idle.

    After command completion occurred, registers were:
    ER ST SC SN CL CH DH

    40 51 3f 50 28 2c e1  Error: UNC 63 sectors at LBA = 0x012c2850 = 19671120

    Commands leading to the command that caused the error were:
    CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

    c8 ff 3f 50 28 2c e1 00 04:33:56.000 READ DMA
    c8 ff 3f 00 00 00 e0 00 04:33:56.000 READ DMA
    c6 ff 10 00 02 00 a0 00 04:33:56.000 SET MULTIPLE MODE
    10 ff 3f 01 00 00 ae 00 04:33:56.000 RECALIBRATE [OBS-4]
    91 ff 3f 01 00 00 ae 00 04:33:56.000 INITIALIZE DEVICE PARAMETERS [OBS-6]

    Error 79 occurred at disk power-on lifetime: 15200 hours (633 days + 8 hours)
    When the command that caused the error occurred, the device was active or idle.

    After command completion occurred, registers were:
    ER ST SC SN CL CH DH

    84 51 00 ae 3e 2f e4  Error: ICRC, ABRT at LBA = 0x042f3eae = 70205102

    Commands leading to the command that caused the error were:
    CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

    c8 00 08 a7 3e 2f e4 00 00:00:30.600 READ DMA
    c8 00 00 af 62 2c e4 00 00:00:30.600 READ DMA
    c8 00 00 af 61 2c e4 00 00:00:30.600 READ DMA
    c8 00 00 af 60 2c e4 00 00:00:30.600 READ DMA
    c8 00 00 af 5f 2c e4 00 00:00:30.600 READ DMA

    Error 78 occurred ...
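The register dumps in the error log can be cross-checked by hand. The sketch below rebuilds the 28-bit LBA from the SN, CL, CH and DH registers and redoes the hours-to-days arithmetic. The function names are made up for illustration. The last line assumes the attribute table and the error log were captured at the same time.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    In LBA28 mode SN holds bits 0-7, CL bits 8-15, CH bits 16-23,
    and the low nibble of DH holds bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def split_hours(hours):
    """Return (days, hours), the way smartctl prints a power-on lifetime."""
    return divmod(hours, 24)


# Register values copied from the "registers were" lines of errors 80 and 79.
error_80 = lba28_from_registers(sn=0x50, cl=0x28, ch=0x2C, dh=0xE1)
error_79 = lba28_from_registers(sn=0xAE, cl=0x3E, ch=0x2F, dh=0xE4)
print(hex(error_80), error_80)
print(hex(error_79), error_79)

# Power_On_Hours reads 30819; error 80 was logged at 28086 hours.
print(split_hours(30819 - 28086))
```

The decoded addresses match the LBAs that smartctl prints (19671120 and 70205102). Under the stated assumption, the most recent logged error is about 113 days of power-on time old.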

Answer 1

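Every HDD has some unused sectors that are reserved at the factory for reallocation events. Once the HDD firmware detects an unreadable sector, it "replaces" it with a healthy sector from this "stock". In practice nothing is physically moved; the firmware just records that sector yyy must be used instead of sector xxx. This is called a sector reallocation event.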
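The bookkeeping described above can be pictured with a toy model. This is only an illustrative sketch: the class, its method names and the sector numbers are invented, and real firmware is far more involved.

```python
class ToyDisk:
    """Toy model of sector remapping, not how any real firmware works."""

    def __init__(self, spare_sectors):
        self.spares = list(spare_sectors)   # factory-reserved spare sectors
        self.remap = {}                     # bad sector -> spare sector
        self.reallocation_events = 0

    def reallocate(self, bad_sector):
        """Record that a spare must be used in place of bad_sector."""
        if bad_sector in self.remap:
            return self.remap[bad_sector]
        if not self.spares:
            raise RuntimeError("spare pool exhausted")
        spare = self.spares.pop(0)
        self.remap[bad_sector] = spare
        self.reallocation_events += 1
        return spare

    def resolve(self, sector):
        """Translate a requested sector to the one actually accessed."""
        return self.remap.get(sector, sector)


disk = ToyDisk(spare_sectors=[9000, 9001, 9002])
disk.reallocate(1234)
print(disk.resolve(1234))   # the spare that now stands in for sector 1234
print(disk.resolve(1235))   # untouched sectors map to themselves
print(len(disk.spares), disk.reallocation_events)
```

The point of the model is that a reallocation only adds an entry to a lookup table and uses up one spare, so the pool of spares shrinks with every event.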

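If this starts happening, it means the HDD is not healthy and its surface is starting to degrade. The number of reallocated sectors will grow over time, depending on how heavily the HDD is used. For now you can keep using this HDD, but you need to monitor how the reallocation progresses and plan to replace the HDD in the future.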
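One possible way to do that monitoring is to save the attribute table from time to time and compare the raw values. The sketch below is a minimal example, assuming the table layout shown in the question. The function names, the set of watched IDs and the "later" reading are illustrative choices, not part of smartmontools.

```python
import subprocess

WATCHED_IDS = {5, 196, 197, 198}  # reallocation and pending-sector attributes


def parse_attributes(text):
    """Parse the table printed by `smartctl -A` into {id: (name, raw)}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        if raw.isdigit():
            attrs[int(fields[0])] = (fields[1], int(raw))
    return attrs


def changed_attributes(previous, current, watched=WATCHED_IDS):
    """Return {id: (name, old_raw, new_raw)} for watched attributes that grew."""
    changes = {}
    for attr_id in watched:
        if attr_id in previous and attr_id in current:
            name, new_raw = current[attr_id]
            old_raw = previous[attr_id][1]
            if new_raw > old_raw:
                changes[attr_id] = (name, old_raw, new_raw)
    return changes


def read_smartctl(device="/dev/sda"):
    """Run `smartctl -A` (usually needs root) and return its text output."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return result.stdout


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005 Pre-fail  Always       -       1179816
196 Reallocated_Event_Count 0x0032   100   100   000  Old_age   Always       -       17
197 Current_Pending_Sector  0x0022   100   100   000  Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000  Old_age   Offline      -       0
"""
baseline = parse_attributes(SAMPLE)
later = dict(baseline)
later[197] = ("Current_Pending_Sector", 3)   # hypothetical later reading
print(changed_attributes(baseline, later))
```

In real use, `read_smartctl()` would supply the text, and the baseline would be stored on disk between runs. A steady rise in any of the watched raw values would be the signal to replace the drive sooner rather than later.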
