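ZFS on Linux, Ubuntu 16.04 LTS. The ZFS pool is a single raidz1 vdev of 5x4TB drives.
Yesterday I found that one drive had completely died (audible noise as the heads tried to reset and recalibrate). It's dead, so I took it offline.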
mrenouf@archive:~$ sudo zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 2.42G in 0h3m with 0 errors on Thu Apr 20 08:04:09 2017
config:

        NAME                                       STATE     READ WRITE CKSUM
        tank                                       DEGRADED     0     0     0
          raidz1-0                                 DEGRADED     0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZH6V-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0Z9EG-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZJZS-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZDDJ-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZJDQ-part2  OFFLINE      0     0     0

errors: No known data errors
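Side note: why -part2? That is how FreeNAS does it (the pool was originally created there). Each drive has a 2GB swap partition ahead of the data, and I decided it was best to replicate that partition on the replacement drive for symmetry.
So anyway, I popped in a replacement and started the resilver.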
zpool replace tank ata-ST4000DM005-2DP166_ZDH0ZJDQ-part2 /dev/disk/by-id/ata-ST4000DM005-2DP166_ZDH15ZE0-part2
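It "finished" in record time... (the estimate had shown 20 hours). I don't have a history of the intermediate steps, but trust me... there is ~4TB allocated in this pool.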
  pool: tank
 state: ONLINE
  scan: resilvered 2.42G in 0h3m with 0 errors on Thu Apr 20 08:04:09 2017
config:

        NAME                                       STATE     READ WRITE CKSUM
        tank                                       ONLINE       0     0     0
          raidz1-0                                 ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZH6V-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0Z9EG-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZJZS-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZDDJ-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH15ZE0-part2  ONLINE       0     0     0

errors: No known data errors
I don't believe you!
What would cause this? What should I do now? I haven't figured out how to replace a drive with itself (and force another resilver).
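For a sense of scale, here is a rough back-of-the-envelope check of why 2.42G looks wrong. It is only a sketch: it assumes the ~4TB allocated is spread evenly over the five disks of the raidz1 vdev and that the figure already includes parity, neither of which I have verified. The function name is just something I made up for illustration.

```python
# Rough sanity check: how much data should a raidz1 resilver write to one disk?
# Assumptions (not verified): allocation is spread evenly across the disks,
# and the ~4TB figure is raw allocation including parity, in powers of 1024.

def expected_resilver_gib(allocated_tib: float, disks: int) -> float:
    """Approximate GiB that one replacement disk should receive."""
    return allocated_tib * 1024 / disks

expected = expected_resilver_gib(4.0, 5)  # ~4TB allocated, 5 disks in the vdev
reported = 2.42                           # "resilvered 2.42G" from zpool status

print(f"expected roughly {expected:.0f}G, reported {reported}G "
      f"({reported / expected:.2%} of the expected amount)")
```

Even if the even-spread assumption is off by a wide margin, the reported figure is well under 1% of what a full rebuild of one disk should have written.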
Edit:
I ran a scrub against this new "resilver":
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub in progress since Thu Apr 20 08:39:31 2017
        12.1G scanned out of 4.29T at 87.7M/s, 14h13m to go
        159M repaired, 0.27% done
config:

        NAME                                       STATE     READ WRITE CKSUM
        tank                                       ONLINE       0     0     0
          raidz1-0                                 ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZH6V-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0Z9EG-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZJZS-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZDDJ-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH15ZE0-part2  ONLINE       0     0 20.2K  (repairing)

errors: No known data errors
It seemed to stall at the ~12G mark:
12.4G scanned out of 4.29T at 64.2M/s, 19h25m to go
And then it just stopped:
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 164M in 0h3m with 0 errors on Thu Apr 20 08:42:50 2017
config:

        NAME                                       STATE     READ WRITE CKSUM
        tank                                       ONLINE       0     0     0
          raidz1-0                                 ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZH6V-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0Z9EG-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZJZS-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH0ZDDJ-part2  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH15ZE0-part2  ONLINE       0     0 21.5K

errors: No known data errors
What? It didn't even scan the entire pool. How is that possible? There are no hardware errors, so what is going on? Argh.
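To double-check that I am reading the numbers correctly, here is a small sketch that recomputes the scrub estimates from the status output above. It assumes zpool's T/G/M suffixes are powers of 1024 and that the scan rate stayed roughly constant; the helper name is made up for illustration.

```python
from datetime import datetime

# Assumption: zpool's T/G/M suffixes are powers of 1024 (TiB/GiB/MiB).

def eta_seconds(total_tib: float, scanned_gib: float, rate_mib_s: float) -> float:
    """Seconds left if the rest of the pool is scanned at a constant rate."""
    remaining_mib = total_tib * 1024 * 1024 - scanned_gib * 1024
    return remaining_mib / rate_mib_s

# Figures copied from the two "scrub in progress" readings.
first = eta_seconds(4.29, 12.1, 87.7)   # status reported "14h13m to go"
second = eta_seconds(4.29, 12.4, 64.2)  # status reported "19h25m to go"

# Wall-clock time between the scrub start and the reported completion.
started = datetime(2017, 4, 20, 8, 39, 31)
finished = datetime(2017, 4, 20, 8, 42, 50)
elapsed = (finished - started).total_seconds()

# Data that could have been read in that time at the last observed rate.
scanned_gib = elapsed * 64.2 / 1024

print(f"first ETA:  {first / 3600:.2f} h")
print(f"second ETA: {second / 3600:.2f} h")
print(f"elapsed: {elapsed:.0f} s, enough for about {scanned_gib:.1f}G "
      f"of {4.29 * 1024:.0f}G")
```

Both estimates come out within a minute of what zpool printed, so the scrub really was planning to walk all 4.29T. The 199 seconds it actually ran is only enough for about 12G at the observed rate, which matches the point where it appeared to stall.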