ZFS on Linux - unexpected behavior after device failure

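I maintain a Debian server with a ZFS storage pool (RAID-Z2, as the `raidz2-0` vdev in the status output shows). Recently ZFS reported that two disks had failed at the same time: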

ZFS has detected that a device was removed.

 impact: Fault tolerance of the pool may be compromised.
    eid: 138
  class: statechange
  state: REMOVED
   host: serres-west-wing
   time: 2021-04-30 01:30:15+0300
  vpath: /dev/disk/by-vdev/d0-part1
  vguid: 0x6622AF6B1929E199
   pool: 0x0964CF6A3748D7A9
ZFS has detected that a device was removed.

 impact: Fault tolerance of the pool may be compromised.
    eid: 140
  class: statechange
  state: REMOVED
   host: serres-west-wing
   time: 2021-04-30 01:30:15+0300
  vpath: /dev/disk/by-vdev/d1-part1
  vguid: 0xD48BA6B066788199
   pool: 0x0964CF6A3748D7A9

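After these messages were generated, the hot spare was activated and resilvering started immediately. This was the state of the pool after the resilver: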

ZFS has finished a resilver:

   eid: 167
 class: resilver_finish
  host: serres-west-wing
  time: 2021-04-30 02:15:03+0300
  pool: datapool
 state: ONLINE
  scan: resilvered 132G in 00:44:41 with 0 errors on Fri Apr 30 02:15:03 2021
config:

        NAME               STATE     READ WRITE CKSUM
        datapool           ONLINE       0     0     0
          raidz2-0         ONLINE       0     0     0
            spare-0        ONLINE       0     0     0
              d0-part1     ONLINE       0     0     0
              hs-d0-part1  ONLINE       0     0     0
            d1-part1       ONLINE       0     0     0
            d2-part1       ONLINE       0     0     0
            d3-part1       ONLINE       0     0     0
            d4-part1       ONLINE       0     0     0
        logs
          mirror-1         ONLINE       0     0     0
            zil-d0-part1   ONLINE       0     0     0
            zil-d1-part1   ONLINE       0     0     0
        cache
          l2arc-d0-part2   ONLINE       0     0     0
          l2arc-d1-part2   ONLINE       0     0     0
        spares
          hs-d0-part1      INUSE     currently in use

errors: No known data errors

Disks d0-part1 and d1-part1 both appear to be connected and working normally.

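Was this error caused by something unrelated to disk degradation? It seems unlikely that two working disks would fail at exactly the same moment. Is it safe to deactivate the hot spare?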

Answer 1

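The disk disconnections appear to have been caused by a power problem. Since upgrading the machine's UPS I have not run into any further issues. I deactivated the hot spare with: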

zpool detach datapool hs-d0-part1

Then I scrubbed the pool:

zpool scrub datapool

This restored the pool to its original state.
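To confirm that the spare really has gone back to standby after the detach, one option is to look at the `spares` section of `zpool status`. The helper below is only a sketch: the function name `spares_in_use` is made up for illustration, and it assumes the status layout shown in the question (a `spares` heading followed by one line per spare, ending at the `errors:` line). It reads the status text from stdin, so it can also be run against saved output.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every hot spare that
# `zpool status` reports as INUSE. Reads the status text on stdin.
spares_in_use() {
    awk '
        $1 == "spares" { in_spares = 1; next }  # heading of the spares section
        /^errors:/     { in_spares = 0 }        # section ends at the errors line
        in_spares && $2 == "INUSE" { print $1 } # spare currently standing in for a disk
    '
}

# Typical use on the live pool (pool name taken from the question):
#   zpool status datapool | spares_in_use
```

If the function prints nothing after the detach, no spare is still standing in for a data disk, and it should be listed as available again in `zpool status`.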
