I have received many emails from the "smartd" daemon with the subject "SMART error (CurrentPendingSector)" and the following body:
The following warning/error was logged by the smartd daemon:
Device: /dev/sda, 1 Currently unreadable (pending) sectors
Over a few months, it has sent me 80 of these emails.
I ran "e2fsck -cc", "smartctl", and "gsmartcontrol".
"e2fsck -cc" did not report any bad blocks.
"gsmartcontrol" highlighted the following lines of the "smartctl" output:
--
ID ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
...
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 1179816
...
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 17
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 1
...
-- gsmartcontrol highlights these in pink rather than red.
That is, it reports 1,179,816 reallocated sectors (does that make sense?) and 17 reallocation events.
However, WORST equals VALUE.
/var/log/messages occasionally contains entries like
Jul 24 03:12:46 turtle smartd[1443]: Device: /dev/sda, 1 Currently unreadable (pending) sectors
There were 38 of them (!) in total over the past few days.
# smartctl -l error /dev/sda
reports several errors (below).
How should I interpret them? Should I replace the hard drive?
Thanks.
Detailed "smartctl" output follows.
# smartctl -H -A /dev/sda
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 062 Pre-fail Always - 0
2 Throughput_Performance 0x0005 105 100 040 Pre-fail Offline - 4572
3 Spin_Up_Time 0x0007 223 100 033 Pre-fail Always - 2
4 Start_Stop_Count 0x0012 098 098 000 Old_age Always - 3671
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 1179816
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 120 100 040 Pre-fail Offline - 40
9 Power_On_Hours 0x0012 030 030 000 Old_age Always - 30819
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 2205
191 G-Sense_Error_Rate 0x000a 100 095 000 Old_age Always - 1
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 97
193 Load_Cycle_Count 0x0012 001 001 000 Old_age Always - 1865772
194 Temperature_Celsius 0x0002 177 100 000 Old_age Always - 31 (Lifetime Min/Max 9/48)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 17
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 190 000 Old_age Always - 38
# sudo smartctl -i /dev/sda
=== START OF INFORMATION SECTION ===
Model Family: Hitachi Travelstar 5K100 series
Device Model: HTS541060G9AT00
Serial Number: MPB3LAX5KUDB1M
Firmware Version: MB3OA60A
User Capacity: 60,011,642,880 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 3a
..
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
# smartctl -l error /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 80 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 80 occurred at disk power-on lifetime: 28086 hours (1170 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
40 51 3f 50 28 2c e1 Error: UNC 63 sectors at LBA = 0x012c2850 = 19671120
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
c8 ff 3f 50 28 2c e1 00 04:33:56.000 READ DMA
c8 ff 3f 00 00 00 e0 00 04:33:56.000 READ DMA
c6 ff 10 00 02 00 a0 00 04:33:56.000 SET MULTIPLE MODE
10 ff 3f 01 00 00 ae 00 04:33:56.000 RECALIBRATE [OBS-4]
91 ff 3f 01 00 00 ae 00 04:33:56.000 INITIALIZE DEVICE PARAMETERS [OBS-6]
Error 79 occurred at disk power-on lifetime: 15200 hours (633 days + 8 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
84 51 00 ae 3e 2f e4 Error: ICRC, ABRT at LBA = 0x042f3eae = 70205102
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
c8 00 08 a7 3e 2f e4 00 00:00:30.600 READ DMA
c8 00 00 af 62 2c e4 00 00:00:30.600 READ DMA
c8 00 00 af 61 2c e4 00 00:00:30.600 READ DMA
c8 00 00 af 60 2c e4 00 00:00:30.600 READ DMA
c8 00 00 af 5f 2c e4 00 00:00:30.600 READ DMA
Error 78 occurred...
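The LBA values in the error log can be cross-checked against the logged registers. The following is a minimal sketch, assuming 28-bit LBA addressing (which is what the READ DMA command, opcode c8, uses), where the low four bits of DH carry LBA bits 24-27 and CH, CL, SN carry the remaining bytes. The function name `lba28_from_registers` is purely illustrative and is not part of smartctl.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA.

    Assumes 28-bit LBA mode: DH bits 3..0 = LBA 27..24,
    CH = LBA 23..16, CL = LBA 15..8, SN = LBA 7..0.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from the "registers were" lines of errors 80 and 79.
error_80_lba = lba28_from_registers(sn=0x50, cl=0x28, ch=0x2C, dh=0xE1)
error_79_lba = lba28_from_registers(sn=0xAE, cl=0x3E, ch=0x2F, dh=0xE4)

print(hex(error_80_lba), error_80_lba)
print(hex(error_79_lba), error_79_lba)
```

Both results agree with the LBAs smartctl printed (19671120 for the UNC error and 70205102 for the ICRC error), so the two errors refer to different locations on the disk.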
Answer 1
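Every HDD has a pool of spare sectors, reserved at the factory, for reallocation. When the drive firmware detects an unreadable sector, it "replaces" it with a healthy sector from that pool. Nothing is physically moved; the firmware simply records that sector yyy must be used in place of sector xxx. This is called a sector reallocation event.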
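The bookkeeping described above can be pictured with a toy model. This is a conceptual sketch only: real firmware is vendor-specific and not visible to the host, and the class name `ToyDrive` and the spare sector numbers are made up for illustration.

```python
class ToyDrive:
    """Toy model of sector reallocation; not how any real firmware is written."""

    def __init__(self, spare_sectors):
        self.spares = list(spare_sectors)  # factory-reserved spare sectors
        self.remap = {}                    # bad sector -> spare sector

    def reallocate(self, bad_sector):
        """Record that a spare must be used instead of bad_sector."""
        if bad_sector in self.remap:
            return self.remap[bad_sector]
        if not self.spares:
            raise RuntimeError("spare pool exhausted")
        spare = self.spares.pop(0)
        self.remap[bad_sector] = spare
        return spare

    def resolve(self, sector):
        """Return the sector actually accessed for a requested sector."""
        return self.remap.get(sector, sector)


drive = ToyDrive(spare_sectors=[900001, 900002])  # hypothetical spare numbers
drive.reallocate(19671120)                        # one reallocation event
print(drive.resolve(19671120))  # the spare is used instead
print(drive.resolve(42))        # untouched sectors map to themselves
print(len(drive.remap), "reallocated,", len(drive.spares), "spares left")
```

In this model nothing is copied or moved; only the lookup table changes, and the spare pool shrinks by one with each event.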
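Once this starts happening, the HDD is no longer healthy: its surface has begun to degrade, and the number of reallocated sectors will keep growing at a rate that depends on how heavily the drive is used. For now you can keep using this HDD, but you should monitor how the reallocation count progresses and plan to replace the drive in the future.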
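One way to monitor that progress is to save the output of `smartctl -A /dev/sda` from time to time and compare the reallocation-related attributes. The sketch below is a minimal example, assuming the ten-column attribute table layout shown in the question; the function names `parse_attributes` and `changed_attributes` are illustrative, and the sample text is copied from the question rather than read from a live drive.

```python
WATCHED_IDS = (5, 196, 197, 198)  # reallocation and pending-sector attributes


def parse_attributes(smartctl_output: str) -> dict:
    """Parse attribute rows of `smartctl -A` text into {id: fields}."""
    attrs = {}
    for line in smartctl_output.splitlines():
        parts = line.split(None, 9)  # keep the whole RAW_VALUE as one field
        if len(parts) == 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            attrs[int(parts[0])] = {
                "name": parts[1],
                "value": int(parts[3]),
                "worst": int(parts[4]),
                "thresh": int(parts[5]),
                "raw": parts[9].strip(),
            }
    return attrs


def changed_attributes(previous: dict, current: dict) -> list:
    """Return watched attribute IDs whose raw value differs between snapshots."""
    return [
        attr_id
        for attr_id in WATCHED_IDS
        if attr_id in previous and attr_id in current
        and previous[attr_id]["raw"] != current[attr_id]["raw"]
    ]


SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 1179816
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 17
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
"""

attrs = parse_attributes(SAMPLE)
for attr_id in WATCHED_IDS:
    a = attrs[attr_id]
    print(attr_id, a["name"], "value", a["value"], "thresh", a["thresh"], "raw", a["raw"])
```

Calling `changed_attributes` on two saved snapshots lists which of these counters moved in between. A rising raw value for attribute 196 or 197 would be the signal to bring the replacement forward.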