I maintain a Debian server with a ZFS storage pool (RAID Z3). ZFS recently reported that two disks had failed at the same time:
ZFS has detected that a device was removed.
impact: Fault tolerance of the pool may be compromised.
eid: 138
class: statechange
state: REMOVED
host: serres-west-wing
time: 2021-04-30 01:30:15+0300
vpath: /dev/disk/by-vdev/d0-part1
vguid: 0x6622AF6B1929E199
pool: 0x0964CF6A3748D7A9
ZFS has detected that a device was removed.
impact: Fault tolerance of the pool may be compromised.
eid: 140
class: statechange
state: REMOVED
host: serres-west-wing
time: 2021-04-30 01:30:15+0300
vpath: /dev/disk/by-vdev/d1-part1
vguid: 0xD48BA6B066788199
pool: 0x0964CF6A3748D7A9
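After these messages were generated, the hot spare was activated and a resilver started immediately. Once the resilver finished, the pool status was as follows: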
ZFS has finished a resilver:
eid: 167
class: resilver_finish
host: serres-west-wing
time: 2021-04-30 02:15:03+0300
pool: datapool
state: ONLINE
scan: resilvered 132G in 00:44:41 with 0 errors on Fri Apr 30 02:15:03 2021
config:
    NAME                STATE     READ WRITE CKSUM
    datapool            ONLINE       0     0     0
      raidz2-0          ONLINE       0     0     0
        spare-0         ONLINE       0     0     0
          d0-part1      ONLINE       0     0     0
          hs-d0-part1   ONLINE       0     0     0
        d1-part1        ONLINE       0     0     0
        d2-part1        ONLINE       0     0     0
        d3-part1        ONLINE       0     0     0
        d4-part1        ONLINE       0     0     0
    logs
      mirror-1          ONLINE       0     0     0
        zil-d0-part1    ONLINE       0     0     0
        zil-d1-part1    ONLINE       0     0     0
    cache
      l2arc-d0-part2    ONLINE       0     0     0
      l2arc-d1-part2    ONLINE       0     0     0
    spares
      hs-d0-part1       INUSE     currently in use
errors: No known data errors
The disks d0-part1 and d1-part1 both appear to be connected and working normally.
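Could this be an error caused by something unrelated to disk degradation? It seems unlikely that two working disks would fail at exactly the same moment. Is it safe to deactivate the hot spare?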
Answer 1
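The disk disconnections appear to have been caused by a power problem. Since upgrading the machine's UPS I have not had any further issues. I deactivated the hot spare with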
zpool detach datapool hs-d0-part1
and then scrubbed the pool with
zpool scrub datapool
which returned the pool to its original state.
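
For anyone who wants to script the same two steps, here is a rough sketch rather than a tested procedure. It assumes the original disk should show ONLINE with zero READ/WRITE/CKSUM counters in `zpool status` before the spare is released. The function names `device_online` and `release_spare` and the `ZPOOL` override variable are made up for this example; only `zpool status`, `zpool detach` and `zpool scrub` are real subcommands.

```shell
#!/bin/sh
# Sketch only. ZPOOL can be overridden for a dry run (hypothetical convenience).
ZPOOL="${ZPOOL:-zpool}"

# device_online STATUS_TEXT DEVICE
# Succeeds if DEVICE is listed in `zpool status` output as ONLINE
# with all three error counters equal to zero.
device_online() {
    printf '%s\n' "$1" | awk -v dev="$2" '
        $1 == dev && $2 == "ONLINE" && $3 == 0 && $4 == 0 && $5 == 0 { ok = 1 }
        END { exit (ok ? 0 : 1) }'
}

# release_spare POOL ORIGINAL_DEVICE SPARE
# Detaches SPARE only when ORIGINAL_DEVICE looks healthy, then starts a scrub.
release_spare() {
    pool_status=$($ZPOOL status "$1") || return 1
    device_online "$pool_status" "$2" || {
        echo "$2 is not ONLINE with clean counters; leaving $3 attached" >&2
        return 1
    }
    $ZPOOL detach "$1" "$3" && $ZPOOL scrub "$1"
}
```

With the names from this pool, the call would be `release_spare datapool d0-part1 hs-d0-part1`. Note that `zpool scrub` only starts the scrub and returns; progress and the final result show up in `zpool status datapool`.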