Old hard drive: interpreting smartctl output

2024-6-13

I have been getting a lot of emails from the "smartd" daemon with the subject "SMART error (CurrentPendingSector)" and the following content:

The following warning/error was logged by the smartd daemon:
Device: /dev/sda, 1 Currently unreadable (pending) sectors

Over several months it has sent me 80 of these emails.

I ran "e2fsck -cc", "smartctl", and "gsmartcontrol".

"e2fsck -cc" did not report any bad blocks.
"gsmartcontrol" highlighted the following lines in the "smartctl" output:

--

ID   ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE    UPDATED  WHEN_FAILED RAW_VALUE  
...  
5    Reallocated_Sector_Ct   0x0033  100   100   005  Pre-fail  Always      -   1179816  
...  
196  Reallocated_Event_Count 0x0032  100   100   000    Old_age   Always      -   17  
197  Current_Pending_Sector  0x0022  100   100   000    Old_age   Always      -   1  
...

"gsmartcontrol" highlights these lines in pink, not red.

That is, it reports 1,179,816 reallocated sectors (does that make sense?) and 17 reallocation events.
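To put the raw value in perspective, here is a small sketch of the arithmetic. It assumes 512-byte sectors (not stated in the output, though usual for an ATA-6 drive) and takes the capacity from the "smartctl -i" output further down. The 16-bit split is only a guess at how a vendor might pack several counters into one raw field; nothing in this output confirms that this drive does so.

```python
# Sanity-check the raw value of attribute 5 (Reallocated_Sector_Ct).
raw = 1179816

# Capacity from "smartctl -i"; the 512-byte sector size is an assumption.
capacity_bytes = 60_011_642_880
total_sectors = capacity_bytes // 512

# Share of the whole disk that would have been reallocated if the
# raw value were a plain sector count.
print(total_sectors)                  # 117210240
print(f"{raw / total_sectors:.2%}")   # 1.01%

# The same number in hex, and split into 16-bit words (low word first).
# Reading the words as separate counters is hypothetical.
print(hex(raw))                       # 0x1200a8
words = [(raw >> shift) & 0xFFFF for shift in (0, 16, 32)]
print(words)                          # [168, 18, 0]
```

Taken literally, the count would be about 1% of every sector on the disk, which is why the figure looks suspicious. The low word, 168, would be a far more ordinary number, but that reading remains unverified.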

However, the "WORST" column equals the "VALUE" column.

/var/log/messages occasionally contains the message

Jul 24 03:12:46 turtle smartd[1443]: Device: /dev/sda, 1 Currently unreadable (pending) sectors

There have been 38 of these (!) in total over the past few days.

"smartctl -l error /dev/sda" reports several errors (see below).

How should I interpret them? Should I replace the hard drive?

Thanks.

The detailed "smartctl" output follows.

# smartctl -H -A /dev/sda

SMART Attributes Data Structure revision number: 16  

Vendor Specific SMART Attributes with Thresholds:  

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE   

1 Raw_Read_Error_Rate     0x000b   100   100   062 Pre-fail  Always       -       0  
2 Throughput_Performance  0x0005   105   100   040 Pre-fail  Offline      -       4572  
3 Spin_Up_Time            0x0007   223   100   033 Pre-fail  Always       -       2  
4 Start_Stop_Count        0x0012   098   098   000 Old_age   Always       -       3671  
5 Reallocated_Sector_Ct   0x0033   100   100   005 Pre-fail  Always       -       1179816  
7 Seek_Error_Rate         0x000b   100   100   067 Pre-fail  Always       -       0  
8 Seek_Time_Performance   0x0005   120   100   040 Pre-fail  Offline      -       40  
9 Power_On_Hours          0x0012   030   030   000 Old_age   Always       -       30819  
10 Spin_Retry_Count        0x0013   100   100   060   Pre-fail  Always       -       0  
12 Power_Cycle_Count       0x0032   099   099   000  Old_age   Always       -       2205  
191 G-Sense_Error_Rate      0x000a   100   095   000  Old_age   Always       -       1  
192 Power-Off_Retract_Count 0x0032   100   100   000  Old_age   Always       -       97  
193 Load_Cycle_Count        0x0012   001   001   000  Old_age   Always       -       1865772  
194 Temperature_Celsius     0x0002   177   100   000  Old_age   Always       -       31 (Lifetime Min/Max 9/48)  
196 Reallocated_Event_Count 0x0032   100   100   000  Old_age   Always       -       17  
197 Current_Pending_Sector  0x0022   100   100   000  Old_age   Always       -       1  
198 Offline_Uncorrectable   0x0008   100   100   000  Old_age   Offline      -       0  
199 UDMA_CRC_Error_Count    0x000a   200   190   000 Old_age   Always       -       38

# sudo smartctl -i /dev/sda

=== START OF INFORMATION SECTION ===  
Model Family:     Hitachi Travelstar 5K100 series  
Device Model:     HTS541060G9AT00  
Serial Number:    MPB3LAX5KUDB1M  
Firmware Version: MB3OA60A  
User Capacity:    60,011,642,880 bytes  
Device is:        In smartctl database [for details use: -P show]  
ATA Version is:   6  
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 3a  
..  
SMART support is: Available - device has SMART capability.  
SMART support is: Enabled

# smartctl -l error /dev/sda

=== START OF READ SMART DATA SECTION ===  
SMART Error Log Version: 1  
ATA Error Count: 80 (device log contains only the most recent five errors)  
CR = Command Register [HEX]  
FR = Features Register [HEX]  
SC = Sector Count Register [HEX]  
SN = Sector Number Register [HEX]  
CL = Cylinder Low Register [HEX]  
CH = Cylinder High Register [HEX]  
DH = Device/Head Register [HEX]  
DC = Device Command Register [HEX]  
ER = Error register [HEX]  
ST = Status register [HEX]  
Powered_Up_Time is measured from power on, and printed as  
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,  
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 80 occurred at disk power-on lifetime: 28086 hours (1170 days + 6 hours)  
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:  
ER ST SC SN CL CH DH  
40 51 3f 50 28 2c e1  Error: UNC 63 sectors at LBA = 0x012c2850 = 19671120

Commands leading to the command that caused the error were:  
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name  
c8 ff 3f 50 28 2c e1 00  04:33:56.000  READ DMA  
c8 ff 3f 00 00 00 e0 00  04:33:56.000  READ DMA  
c6 ff 10 00 02 00 a0 00  04:33:56.000  SET MULTIPLE MODE  
10 ff 3f 01 00 00 ae 00  04:33:56.000  RECALIBRATE [OBS-4]  
91 ff 3f 01 00 00 ae 00  04:33:56.000  INITIALIZE DEVICE PARAMETERS [OBS-6]

Error 79 occurred at disk power-on lifetime: 15200 hours (633 days + 8 hours)  
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:  
ER ST SC SN CL CH DH  
84 51 00 ae 3e 2f e4  Error: ICRC, ABRT at LBA = 0x042f3eae = 70205102

Commands leading to the command that caused the error were:  
CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name  
c8 00 08 a7 3e 2f e4 00  00:00:30.600  READ DMA  
c8 00 00 af 62 2c e4 00  00:00:30.600  READ DMA  
c8 00 00 af 61 2c e4 00  00:00:30.600  READ DMA  
c8 00 00 af 60 2c e4 00  00:00:30.600  READ DMA  
c8 00 00 af 5f 2c e4 00  00:00:30.600  READ DMA

Error 78 occurred...
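For anyone reading the register rows in the error log above, here is a rough sketch of how the hex bytes map to the LBA and error names that smartctl prints. It assumes 28-bit LBA addressing, where the low four bits of DH hold the top of the address, and it decodes only the three error bits that appear in this log. The function names are made up for illustration.

```python
# Decode the register dump that follows each error in "smartctl -l error".

# Only the Error register bits seen in this log; other bits are ignored.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x04: "ABRT"}


def parse_register_row(row):
    """Turn a row such as '40 51 3f 50 28 2c e1' into named integers."""
    names = ["ER", "ST", "SC", "SN", "CL", "CH", "DH"]
    values = [int(token, 16) for token in row.split()[:7]]
    return dict(zip(names, values))


def lba28(regs):
    """Combine SN, CL, CH and the low nibble of DH into a 28-bit LBA."""
    return ((regs["DH"] & 0x0F) << 24) | (regs["CH"] << 16) | (regs["CL"] << 8) | regs["SN"]


def error_names(er):
    """List the known flags set in the Error register, highest bit first."""
    return [name for bit, name in sorted(ERROR_BITS.items(), reverse=True) if er & bit]


err80 = parse_register_row("40 51 3f 50 28 2c e1")
err79 = parse_register_row("84 51 00 ae 3e 2f e4")

print(error_names(err80["ER"]), lba28(err80))  # ['UNC'] 19671120
print(error_names(err79["ER"]), lba28(err79))  # ['ICRC', 'ABRT'] 70205102
```

Both results match the "Error:" text smartctl printed. UNC is an uncorrectable read from the media, while ICRC is an interface CRC error on the data transfer, so the two entries may well have different causes.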

Answer 1

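Every HDD has a number of unused sectors that are reserved at the factory for reallocation. When the HDD firmware detects an unreadable sector, it "replaces" it with a healthy sector from this "stock". Nothing actually moves; the firmware only records that sector yyy must be used instead of sector xxx. This is called a sector reallocation event.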
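The bookkeeping described above can be pictured with a toy model. This is only an illustration of the idea, not how any real firmware is written; the class name, the method names, and the spare sector numbers are all invented.

```python
class RemappingDisk:
    """Toy model of a drive's sector remap table (illustrative only)."""

    def __init__(self, spare_sectors):
        self.spares = list(spare_sectors)  # factory-reserved physical sectors
        self.remap = {}                    # bad logical sector -> spare sector
        self.realloc_events = 0

    def reallocate(self, logical):
        """Record that a spare must be used in place of a bad sector."""
        if logical in self.remap:
            return self.remap[logical]     # already remapped, nothing to do
        if not self.spares:
            raise RuntimeError("spare pool exhausted")
        spare = self.spares.pop(0)
        self.remap[logical] = spare
        self.realloc_events += 1
        return spare

    def resolve(self, logical):
        """Return the physical sector that actually serves a request."""
        return self.remap.get(logical, logical)


# Hypothetical spare pool of four sectors.
disk = RemappingDisk(spare_sectors=range(1_000_000, 1_000_004))
disk.reallocate(19671120)

print(disk.resolve(19671120))  # 1000000 (served from the spare)
print(disk.resolve(12345))     # 12345 (healthy sector, unchanged)
print(len(disk.spares))        # 3 spares left
```

The point of the model is that the spare pool is finite: every reallocation uses one entry, so a steadily rising count means the reserve is being consumed.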

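Once this starts happening, it means the HDD is not healthy: its surface has begun to degrade, and the number of reallocated sectors will keep growing at a rate that depends on how heavily the HDD is used. For now, rest assured that you can still use this HDD, but you need to monitor how the reallocation count progresses and plan to replace the HDD in the future.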
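One way to do that monitoring is to save the output of "smartctl -A /dev/sda" from time to time and compare the raw values of attributes 5, 196, and 197. The sketch below is a minimal parser for that table; it assumes the column layout shown in the question, and the function names and the "later" snapshot are hypothetical.

```python
import re

# Attributes worth watching: reallocated sectors, reallocation events,
# and pending sectors.
WATCHED = (5, 196, 197)

# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute ID to its name and raw value from 'smartctl -A' text."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[int(match.group(1))] = {
                "name": match.group(2),
                "raw": int(match.group(6)),
            }
    return attrs


def growth(previous, current):
    """Return watched attributes whose raw value increased, as (old, new)."""
    grown = {}
    for attr_id in WATCHED:
        if attr_id in previous and attr_id in current:
            old, new = previous[attr_id]["raw"], current[attr_id]["raw"]
            if new > old:
                grown[attr_id] = (old, new)
    return grown


# Rows copied from the question.
earlier = parse_attributes("""\
5 Reallocated_Sector_Ct   0x0033   100   100   005 Pre-fail  Always       -       1179816
196 Reallocated_Event_Count 0x0032   100   100   000  Old_age   Always       -       17
197 Current_Pending_Sector  0x0022   100   100   000  Old_age   Always       -       1
""")

# Hypothetical later snapshot, invented to show the comparison.
later = parse_attributes("""\
196 Reallocated_Event_Count 0x0032   100   100   000  Old_age   Always       -       19
197 Current_Pending_Sector  0x0022   100   100   000  Old_age   Always       -       1
""")

print(earlier[5]["raw"])       # 1179816
print(growth(earlier, later))  # {196: (17, 19)}
```

A count that stays flat for months is less worrying than one that climbs between every pair of snapshots.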
