I have a ZFS pool made of 6 disks arranged in 3 striped mirrors. The server is a Supermicro X11SSM-F with a Xeon CPU and 32 GB of ECC RAM, running Ubuntu 17.04. I use 2 Icy Dock MB154SP-B enclosures to physically host the disks, and the motherboard has 8 SATA 3 connectors, so the disks are presented directly to ZFS (no RAID card in between).
Until recently, this setup worked fine. Then, while running zpool status, I noticed that the last scrub had repaired some data:
$ sudo zpool status
  pool: cloudpool
 state: ONLINE
  scan: scrub repaired 2.98M in 4h56m with 0 errors on Sun Jul 9 05:20:16 2017
config:

	NAME                                          STATE     READ WRITE CKSUM
	cloudpool                                     ONLINE       0     0     0
	  mirror-0                                    ONLINE       0     0     0
	    ata-ST8000VN0022-2EL112_ZA17FZXF          ONLINE       0     0     0
	    ata-ST8000VN0022-2EL112_ZA17H5D3          ONLINE       0     0     0
	  mirror-1                                    ONLINE       0     0     0
	    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3  ONLINE       0     0     0
	    ata-ST4000VN000-2AH166_WDH0KMHT           ONLINE       0     0     0
	  mirror-2                                    ONLINE       0     0     0
	    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E  ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F1HL4V           ONLINE       0     0     0

errors: No known data errors
Out of curiosity, I decided to start a new scrub:
$ sudo zpool scrub cloudpool
... giving it a few minutes to run ...
$ sudo zpool status
  pool: cloudpool
 state: ONLINE
  scan: scrub in progress since Tue Jul 11 22:55:12 2017
	124M scanned out of 4.52T at 4.59M/s, 286h55m to go
	256K repaired, 0.00% done
config:

	NAME                                          STATE     READ WRITE CKSUM
	cloudpool                                     ONLINE       0     0     0
	  mirror-0                                    ONLINE       0     0     0
	    ata-ST8000VN0022-2EL112_ZA17FZXF          ONLINE       0     0     0
	    ata-ST8000VN0022-2EL112_ZA17H5D3          ONLINE       0     0     0
	  mirror-1                                    ONLINE       0     0     0
	    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3  ONLINE       0     0     0  (repairing)
	    ata-ST4000VN000-2AH166_WDH0KMHT           ONLINE       0     0     0
	  mirror-2                                    ONLINE       0     0     0
	    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E  ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F1HL4V           ONLINE       0     0     0

errors: No known data errors
After letting it finish, I got the following:
$ sudo zpool status
  pool: cloudpool
 state: ONLINE
  scan: scrub repaired 624K in 4h35m with 0 errors on Wed Jul 12 03:31:00 2017
config:

	NAME                                          STATE     READ WRITE CKSUM
	cloudpool                                     ONLINE       0     0     0
	  mirror-0                                    ONLINE       0     0     0
	    ata-ST8000VN0022-2EL112_ZA17FZXF          ONLINE       0     0     0
	    ata-ST8000VN0022-2EL112_ZA17H5D3          ONLINE       0     0     0
	  mirror-1                                    ONLINE       0     0     0
	    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3  ONLINE       0     0     0
	    ata-ST4000VN000-2AH166_WDH0KMHT           ONLINE       0     0     0
	  mirror-2                                    ONLINE       0     0     0
	    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E  ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F1HL4V           ONLINE       0     0     0

errors: No known data errors
Then I decided to scrub the pool once more. After letting it run for a while, I got this:
$ sudo zpool status
  pool: cloudpool
 state: ONLINE
  scan: scrub in progress since Wed Jul 12 09:55:19 2017
	941G scanned out of 4.52T at 282M/s, 3h42m to go
	112K repaired, 20.34% done
config:

	NAME                                          STATE     READ WRITE CKSUM
	cloudpool                                     ONLINE       0     0     0
	  mirror-0                                    ONLINE       0     0     0
	    ata-ST8000VN0022-2EL112_ZA17FZXF          ONLINE       0     0     0
	    ata-ST8000VN0022-2EL112_ZA17H5D3          ONLINE       0     0     0
	  mirror-1                                    ONLINE       0     0     0
	    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3  ONLINE       0     0     0  (repairing)
	    ata-ST4000VN000-2AH166_WDH0KMHT           ONLINE       0     0     0
	  mirror-2                                    ONLINE       0     0     0
	    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E  ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F1HL4V           ONLINE       0     0     0

errors: No known data errors
Looking at the disk's SMART data, I don't see anything suspicious (except, perhaps, Raw_Read_Error_Rate?):
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.0-26-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E5NFLRU3
LU WWN Device Id: 5 0014ee 262ee543f
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Jul 12 10:19:08 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (52020) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 520) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 99
3 Spin_Up_Time 0x0027 186 176 021 Pre-fail Always - 7683
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 33
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 6735
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 33
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 5
193 Load_Cycle_Count 0x0032 198 198 000 Old_age Always - 7500
194 Temperature_Celsius 0x0022 110 108 000 Old_age Always - 42
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 13 -
# 2 Conveyance offline Completed without error 00% 1 -
# 3 Short offline Completed without error 00% 1 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
However, I am seeing some strange messages in the dmesg output:
[100240.777601] ata2.00: exception Emask 0x0 SAct 0x3000000 SErr 0x0 action 0x0
[100240.777608] ata2.00: irq_stat 0x40000008
[100240.777614] ata2.00: failed command: READ FPDMA QUEUED
[100240.777624] ata2.00: cmd 60/00:c0:c8:bc:01/01:00:00:00:00/40 tag 24 ncq dma 131072 in
res 41/40:00:a8:bd:01/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[100240.777628] ata2.00: status: { DRDY ERR }
[100240.777631] ata2.00: error: { UNC }
[100240.779320] ata2.00: configured for UDMA/133
[100240.779342] sd 1:0:0:0: [sdb] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[100240.779346] sd 1:0:0:0: [sdb] tag#24 Sense Key : Medium Error [current]
[100240.779350] sd 1:0:0:0: [sdb] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
[100240.779354] sd 1:0:0:0: [sdb] tag#24 CDB: Read(16) 88 00 00 00 00 00 00 01 bc c8 00 00 01 00 00 00
[100240.779357] blk_update_request: I/O error, dev sdb, sector 114088
[100240.779384] ata2: EH complete
[100244.165785] ata2.00: exception Emask 0x0 SAct 0x3d SErr 0x0 action 0x0
[100244.165793] ata2.00: irq_stat 0x40000008
[100244.165798] ata2.00: failed command: READ FPDMA QUEUED
[100244.165807] ata2.00: cmd 60/00:00:c8:be:01/01:00:00:00:00/40 tag 0 ncq dma 131072 in
res 41/40:00:70:bf:01/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[100244.165811] ata2.00: status: { DRDY ERR }
[100244.165814] ata2.00: error: { UNC }
[100244.167465] ata2.00: configured for UDMA/133
[100244.167488] sd 1:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[100244.167492] sd 1:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[100244.167496] sd 1:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed
[100244.167500] sd 1:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 01 be c8 00 00 01 00 00 00
[100244.167503] blk_update_request: I/O error, dev sdb, sector 114544
[100244.167531] ata2: EH complete
[100248.177949] ata2.00: exception Emask 0x0 SAct 0x41c00002 SErr 0x0 action 0x0
[100248.177957] ata2.00: irq_stat 0x40000008
[100248.177963] ata2.00: failed command: READ FPDMA QUEUED
[100248.177972] ata2.00: cmd 60/00:f0:c8:c0:01/01:00:00:00:00/40 tag 30 ncq dma 131072 in
res 41/40:00:b8:c1:01/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[100248.177977] ata2.00: status: { DRDY ERR }
[100248.177980] ata2.00: error: { UNC }
[100248.179638] ata2.00: configured for UDMA/133
[100248.179667] sd 1:0:0:0: [sdb] tag#30 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[100248.179671] sd 1:0:0:0: [sdb] tag#30 Sense Key : Medium Error [current]
[100248.179675] sd 1:0:0:0: [sdb] tag#30 Add. Sense: Unrecovered read error - auto reallocate failed
[100248.179679] sd 1:0:0:0: [sdb] tag#30 CDB: Read(16) 88 00 00 00 00 00 00 01 c0 c8 00 00 01 00 00 00
[100248.179682] blk_update_request: I/O error, dev sdb, sector 115128
[100248.179705] ata2: EH complete
...
Grepping through dmesg, I count 31 such instances in the log:
[100240.779357] blk_update_request: I/O error, dev sdb, sector 114088
[100244.167503] blk_update_request: I/O error, dev sdb, sector 114544
[100248.179682] blk_update_request: I/O error, dev sdb, sector 115128
[100251.599649] blk_update_request: I/O error, dev sdb, sector 115272
[100255.812020] blk_update_request: I/O error, dev sdb, sector 115576
[100259.636088] blk_update_request: I/O error, dev sdb, sector 115768
[100263.400169] blk_update_request: I/O error, dev sdb, sector 116000
[100267.912099] blk_update_request: I/O error, dev sdb, sector 116472
[100271.300223] blk_update_request: I/O error, dev sdb, sector 116680
[100274.732989] blk_update_request: I/O error, dev sdb, sector 117000
[100279.665331] blk_update_request: I/O error, dev sdb, sector 118624
[100283.043738] blk_update_request: I/O error, dev sdb, sector 118768
[100286.456260] blk_update_request: I/O error, dev sdb, sector 119072
[100293.472354] blk_update_request: I/O error, dev sdb, sector 7814018576
[100298.443416] blk_update_request: I/O error, dev sdb, sector 119496
[100302.236908] blk_update_request: I/O error, dev sdb, sector 119968
[100305.655675] blk_update_request: I/O error, dev sdb, sector 120032
[100309.450754] blk_update_request: I/O error, dev sdb, sector 120496
[100313.724792] blk_update_request: I/O error, dev sdb, sector 121512
[100324.782008] blk_update_request: I/O error, dev sdb, sector 186032
[100329.002031] blk_update_request: I/O error, dev sdb, sector 189536
[100333.057101] blk_update_request: I/O error, dev sdb, sector 189680
[100336.476953] blk_update_request: I/O error, dev sdb, sector 189888
[100341.133527] blk_update_request: I/O error, dev sdb, sector 190408
[100349.890540] blk_update_request: I/O error, dev sdb, sector 191824
[353944.190625] blk_update_request: I/O error, dev sdb, sector 115480
[353951.660635] blk_update_request: I/O error, dev sdb, sector 116536
[353959.391011] blk_update_request: I/O error, dev sdb, sector 118976
[353966.811863] blk_update_request: I/O error, dev sdb, sector 120176
[353978.447354] blk_update_request: I/O error, dev sdb, sector 189984
[393732.681767] blk_update_request: I/O error, dev sdb, sector 190000
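For completeness, the count above can be reproduced, and the sector numbers turned into byte offsets, with a small pipeline. This is only a sketch: the dmesg_sample function is a stand-in I added so the snippet is self-contained; on the live system you would pipe dmesg itself.

```shell
# A few sample lines standing in for the real `dmesg` output.
dmesg_sample() {
cat <<'EOF'
[100240.779357] blk_update_request: I/O error, dev sdb, sector 114088
[100244.167503] blk_update_request: I/O error, dev sdb, sector 114544
[100248.179682] blk_update_request: I/O error, dev sdb, sector 115128
EOF
}

# Count the I/O error lines (on the live system: dmesg | grep -c ...).
dmesg_sample | grep -c 'blk_update_request: I/O error'

# Translate each sector number into a byte offset (512-byte logical
# sectors, per the smartctl output), useful with e.g. dd or hdparm.
dmesg_sample | awk '/I\/O error/ { printf "sector %s -> byte offset %d\n", $NF, $NF * 512 }'
```

Note how tightly clustered most of the sectors are (roughly 114088-191824), which suggests a localized surface defect rather than errors scattered across the platter.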
I'm not quite sure what to make of this:

- Why does the scrub keep repairing data on the same disk? The amount of repaired data is shrinking, which suggests the repairs are "sticking", but why is there anything left to repair when the scrubs are only a few hours apart?
- Why does zpool status show no READ/WRITE/CKSUM errors at all, even though ZFS finds data to correct on every scrub?
- Why do I see UREs on the disk, yet nothing suspicious in the SMART report?
- What does "auto reallocate failed" mean? Has the disk run out of spare sectors? This system has been running smoothly for about 6 months, so I would be surprised if this disk were failing already.
- More practically, what does this mean for this particular disk? Does it need to be replaced?
Edit #1: After the latest scrub, I now get the following:
$ sudo zpool status
  pool: cloudpool
 state: ONLINE
  scan: scrub repaired 0 in 4h35m with 0 errors on Wed Jul 12 21:44:41 2017
config:

	NAME                                          STATE     READ WRITE CKSUM
	cloudpool                                     ONLINE       0     0     0
	  mirror-0                                    ONLINE       0     0     0
	    ata-ST8000VN0022-2EL112_ZA17FZXF          ONLINE       0     0     0
	    ata-ST8000VN0022-2EL112_ZA17H5D3          ONLINE       0     0     0
	  mirror-1                                    ONLINE       0     0     0
	    ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5NFLRU3  ONLINE       0     0     0
	    ata-ST4000VN000-2AH166_WDH0KMHT           ONLINE       0     0     0
	  mirror-2                                    ONLINE       0     0     0
	    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N3EHHA2E  ONLINE       0     0     0
	    ata-ST3000DM001-1CH166_Z1F1HL4V           ONLINE       0     0     0

errors: No known data errors
smartctl now reports:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 122
3 Spin_Up_Time 0x0027 186 176 021 Pre-fail Always - 7683
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 33
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 6749
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 33
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 5
193 Load_Cycle_Count 0x0032 198 198 000 Old_age Always - 7507
194 Temperature_Celsius 0x0022 114 108 000 Old_age Always - 38
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
So... am I fine? What actually happened there?
Answer 1
For posterity: when you read errors like these
[100244.167488] sd 1:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[100244.167492] sd 1:0:0:0: [sdb] tag#0 Sense Key : Medium Error [current]
[100244.167496] sd 1:0:0:0: [sdb] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed
[100244.167500] sd 1:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 01 be c8 00 00 01 00 00 00
[100244.167503] blk_update_request: I/O error, dev sdb, sector 114544
it means the physical disk surface has some kind of defect.

More specifically, the message "Sense: Unrecovered read error - auto reallocate failed" means the disk encountered an unrecoverable read error. But what exactly is an unrecoverable read error?

Disks read data one sector at a time, and each sector carries its own dedicated ECC. When the ECC error count exceeds a certain threshold, the disk firmware automatically remaps the sector it has just read, transparently to the user. In that case no kernel error is logged, and the only way to observe this behavior is through the SMART attributes.
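The SMART attributes to watch for this are Reallocated_Sector_Ct (ID 5) and Current_Pending_Sector (ID 197). A minimal sketch for extracting just their raw values; the smart_sample function below is a stand-in for running sudo smartctl -A /dev/sdb on a live system:

```shell
# Two sample attribute rows standing in for real `smartctl -A` output.
smart_sample() {
cat <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
EOF
}

# Print attribute name and raw value for IDs 5 and 197 only.
# A rising raw value on either one means the drive is remapping sectors
# (5) or has sectors it could not yet read or remap (197).
smart_sample | awk '$1 == 5 || $1 == 197 { print $2, $NF }'
```

Both raw values staying at 0, as in the poster's output, means the firmware has not (yet) recorded any remapped or pending sectors.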
但是,如果扇区根本无法读取(可能是因为累积了太多错误,ECC 无法检索原始数据),Unrecovered read error - auto reallocate failed
则会显示内核错误消息。如果这种情况发生在单磁盘(或 RAID0)系统上,则数据确实丢失了 - 您只能从备份中检索它。
如果使用具有冗余的 RAID 级别 (RAID1/5/6),系统可以维修通过覆盖来修复坏扇区:磁盘将使用其中一个备用扇区重新映射故障扇区,并立即用其他磁盘获取的良好数据副本覆盖该扇区。如果没有备用扇区可用,内核将记录一个failed command: WRITE FPDMA QUEUED
,您应该尽快更换磁盘。
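This repair-by-overwrite step can be illustrated with dd: writing to the failing LBA is what forces the drive to remap it to a spare sector, which is effectively what the redundant layer's repair write does. A minimal sketch, using a temporary image file as a stand-in for the real device (on live hardware the target would be /dev/sdb, LBA 114544 is one of the sectors from the logs above, and the write is destructive to that sector's contents):

```shell
# Simulate the repair write for one failing 512-byte sector.
# A temporary image file stands in for /dev/sdb; on a real disk, the
# drive firmware remaps the unreadable sector when this write arrives.
img=$(mktemp)
truncate -s 64M "$img"   # stand-in for the block device

lba=114544
# Overwrite exactly one 512-byte sector at that LBA, in place,
# without truncating the rest of the "device".
dd if=/dev/zero of="$img" bs=512 seek="$lba" count=1 conv=notrunc status=none

# The device size is unchanged: only the one sector was rewritten.
size=$(stat -c %s "$img")
echo "$size"
rm -f "$img"
```

The key detail is conv=notrunc: without it, dd would truncate the target right after the written sector, which on an image file destroys everything beyond the bad LBA.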
The poster's specific case shows a mirror setup, which means ZFS was able to repair/remap the failing sectors. A disk showing only a small number of bad and remapped sectors, with no significant growth over time, can keep going for a long while. On the other hand, if such errors show up frequently in the kernel log, you should plan to replace the failing disk as soon as possible.