I have a drive that is failing SMART self-tests, as shown by smartctl -a /dev/sdc:
...
# 1 Short offline Completed: read failure 50% 6354 4377408
# 2 Extended offline Completed: read failure 90% 6354 4377408
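I then wanted to get this "sector" marked as bad, so I assumed I just had to write a lot of data over it. So I used dd to write a bunch of zeros. This filled the drive, and afterwards I ran another SMART test.
It completed successfully, but looking at the SMART attributes, I see no change: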
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
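Apart from being well aware that I am at risk of a drive failure at all times, does the information above say anything about whether this drive is failing?
Here is a diff of the smartctl output before/after: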
diff --git a/x.txt b/x.txt
index 4cfe1b7..1bcace5 100644
--- a/x.txt
+++ b/x.txt
@@ -12,7 +12,7 @@ Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
-Local Time is: Sun Feb 24 16:50:01 2019 GMT
+Local Time is: Mon Feb 25 18:33:35 2019 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
@@ -55,31 +55,38 @@ SCT capabilities: (0x70b5) SCT Status supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
- 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
- 3 Spin_Up_Time 0x0027 180 179 021 Pre-fail Always - 5991
- 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 114
+ 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 4
+ 3 Spin_Up_Time 0x0027 177 177 021 Pre-fail Always - 6116
+ 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 116
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
- 9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 6356
+ 9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 6372
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
- 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 57
+ 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 59
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 46
-193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 67
-194 Temperature_Celsius 0x0022 122 114 000 Old_age Always - 28
+193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 69
+194 Temperature_Celsius 0x0022 116 114 000 Old_age Always - 34
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
-200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 1
+200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
-# 1 Short offline Completed: read failure 50% 6354 4377408
-# 2 Extended offline Completed: read failure 90% 6354 4377408
+# 1 Extended offline Completed without error 00% 6367 -
+# 2 Short offline Completed: read failure 60% 6361 4377409
+# 3 Short offline Completed: read failure 50% 6361 4377409
+# 4 Extended offline Completed: read failure 90% 6359 4377409
+# 5 Short offline Completed without error 00% 6359 -
+# 6 Short offline Completed: read failure 60% 6356 4377409
+# 7 Short offline Completed: read failure 50% 6354 4377408
+# 8 Extended offline Completed: read failure 90% 6354 4377408
+6 of 6 failed self-tests are outdated by newer successful extended offline self-test # 1
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
And the current output of smartctl -a:
smartctl 6.6 2018-12-05 r4851 [x86_64-linux-4.14.98] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital AV-GP (AF)
Device Model: WDC WD20EURS-63SPKY0
Serial Number: WD-WMC1T2763021
LU WWN Device Id: 5 0014ee 6addb4b7c
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon Feb 25 18:49:12 2019 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (27240) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 275) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x70b5) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 4
3 Spin_Up_Time 0x0027 177 177 021 Pre-fail Always - 6116
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 116
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 6373
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 59
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 46
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 69
194 Temperature_Celsius 0x0022 116 114 000 Old_age Always - 34
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 6367 -
# 2 Short offline Completed: read failure 60% 6361 4377409
# 3 Short offline Completed: read failure 50% 6361 4377409
# 4 Extended offline Completed: read failure 90% 6359 4377409
# 5 Short offline Completed without error 00% 6359 -
# 6 Short offline Completed: read failure 60% 6356 4377409
# 7 Short offline Completed: read failure 50% 6354 4377408
# 8 Extended offline Completed: read failure 90% 6354 4377408
6 of 6 failed self-tests are outdated by newer successful extended offline self-test # 1
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Answer 1
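No, you don't want to mark it as a bad sector. You want to write to the unreadable sector :)
As I quoted yesterday in "smartctl reports overall health test as passed, but the tests failed?":
If the disk can read the sector of data a single time, and the damage is permanent, not transient, then the disk firmware will mark the sector as "bad" and allocate a spare sector to replace it. But if the disk can't read the sector even once, then it won't reallocate the sector, in hopes of being able, at some time in the future, to read the data from it. A write to an unreadable (corrupted) sector will fix the problem. If the damage is transient, then new consistent data will be written to the sector. If the damage is permanent, then the write will force sector reallocation.
(Bold emphasis is mine; original source: the smartmontools FAQ)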
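Since only one sector needs to be rewritten, filling the whole drive is not strictly necessary. Here is a rough sketch of how the logical LBA from the self-test log could be mapped onto its 4096-byte physical sector, assuming the sector sizes that smartctl reports for this drive (512 bytes logical, 4096 bytes physical). The function names are my own, and the dd command is only printed, not executed, because running it destroys whatever is stored in that sector. oflag=direct assumes GNU dd.

```shell
#!/bin/sh
# Sketch only: sector sizes taken from the smartctl output in the question.
LOGICAL=512     # bytes per logical sector (the unit of LBA_of_first_error)
PHYSICAL=4096   # bytes per physical sector

# phys_sector LBA: index of the physical sector that contains a logical LBA.
phys_sector() {
    echo $(( $1 * LOGICAL / PHYSICAL ))
}

# dd_command DEVICE LBA: print (do NOT run) a dd invocation that would
# overwrite exactly the one physical sector containing that LBA with zeros.
dd_command() {
    echo "dd if=/dev/zero of=$1 bs=$PHYSICAL seek=$(phys_sector "$2") count=1 oflag=direct conv=fsync"
}

# Device name and LBA as they appear in the question.
dd_command /dev/sdc 4377408
```

Both LBAs in the self-test log, 4377408 and 4377409, fall into the same physical sector, which would fit the idea that this is a single damaged 4K sector rather than two separate ones.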
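There were no reallocated sectors yesterday, and there are no reallocated sectors today. That means the disk is "equally healthy" as far as bad sectors are concerned, if we ignore the fact that Raw_Read_Error_Rate has gone up to 4. Was that caused by the offline tests?
But the unreadable sector was fixed as of tests 1 and 5. That is good. It is strange, though, that tests 2-4 failed as well.
Hmm, I would probably run the tests a few more times and see what happens, and keep an eye on Raw_Read_Error_Rate while running the tests or writing zeros with dd.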
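To make watching the attribute a little easier, something along these lines could pull one raw value out of the attribute table. This is a minimal sketch: attr_raw is a made-up helper name, and it assumes the ten-column table layout shown in the question, with RAW_VALUE in the last column.

```shell
#!/bin/sh
# attr_raw NAME: read a smartctl attribute table on stdin and print the
# RAW_VALUE (10th column) of the attribute whose name matches NAME exactly.
attr_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Demonstration on two lines copied from the output in the question.
attr_raw Raw_Read_Error_Rate <<'EOF'
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       4
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
EOF

# On the real drive this would be used before and after each test, e.g.:
#   smartctl -A /dev/sdc | attr_raw Raw_Read_Error_Rate
# A selective self-test limited to the suspect region (the span here is my
# assumption, covering the eight logical sectors of that physical sector):
#   smartctl -t select,4377408-4377415 /dev/sdc
```

Recording the value before and after each self-test or dd run should show which operation, if any, makes the counter go up.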