Forcing failover to a hot spare in a degraded ZFS pool

2024-6-10

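I have a simple 5x1TB RAIDz1 setup (tank? pool? vdev?) with a global spare assigned to it. One of the five drives in the array is listed with state FAULTED (corrupted data), and the spare is listed as AVAIL. The array as a whole is listed as DEGRADED. There is apparently no mechanism for the array to fail over to the spare on its own, so how do I force the failover?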

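I have read many forum posts from many places that discuss detaching the drive, using replace to swap in the spare, physically removing the drive, moving the spare into the same slot, and so on.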

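The replace command tells me it cannot replace the drive because it is already in a replacing/spare configuration, and suggests waiting for completion or using detach.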

The detach command tells me it only applies to mirror and replacing vdevs.

There is no sign that the spare is being used to rebuild the array.

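I'm not keen on physically moving drives around, whether current array members or the hot spare, which is already powered and running; I don't want to disrupt anything.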

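I also don't want to take the array offline, reboot the server, and so on. The system is designed to recover transparently without any of that, and I want to understand how. The data is backed up, so I'm free to experiment.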

Linux kernel: 3.10.0-1160

ZFS version: 5

Update:

Output of zpool replace:

[root@localhost ~]# zpool replace <name> 4896358983234274072 ata-WDC_WD10EFRX-68PJCN0_WD-<serial>
cannot replace 4896358983234274072 with ata-WDC_WD10EFRX-68PJCN0_WD-<serial>: already in replacing/spare config; wait for completion or use 'zpool detach'

Output of zpool detach:

[root@localhost ~]# zpool detach <name> 4896358983234274072
cannot detach 4896358983234274072: only applicable to mirror and replacing vdevs
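Both errors hinge on whether the pool contains a spare-N or replacing-N group vdev. When a spare is actually in use, zpool status normally shows it grouped with the device it is replacing under a spare-N vdev (and lists the spare as INUSE), and detach can then act on that group; the status output further down shows no such vdev, which is why replace and detach contradict each other here. A minimal sketch, assuming the status text is piped in on stdin: the function name has_spare_or_replacing_vdev is hypothetical, and it only inspects text, it never calls zpool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if `zpool status` text on stdin contains a
# spare-N or replacing-N group vdev. Together with mirrors, these are the
# cases `zpool detach` works on. Reads text only; does not call zpool.
has_spare_or_replacing_vdev() {
    # The first column of each config line is the vdev name.
    awk '$1 ~ /^(spare|replacing)-[0-9]+$/ { found = 1 } END { exit !found }'
}
```

Usage: zpool status <name> | has_spare_or_replacing_vdev || echo "no spare-N or replacing-N vdev present"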

ZFS version:

[root@localhost ~]# zfs upgrade
This system is currently running ZFS filesystem version 5.

All filesystems are formatted with the current version.

[root@localhost ~]# modinfo zfs | grep version
version:        0.8.2-1
rhelversion:    7.9
srcversion:     29C160FF878154256C93164
vermagic:       3.10.0-1160.49.1.el7.x86_64 SMP mod_unload modversions

zpool status:

[root@localhost ~]# zpool status <name>
  pool: <name>
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h18m with 0 errors on Mon Apr  4 13:29:39 2022
config:

        NAME                                               STATE     READ WRITE CKSUM
        <name>                                                 DEGRADED     0     0     0
          raidz1-0                                         DEGRADED     0     0     0
            pci-0000:01:00.0-sas-0x443322110c000000-lun-0  ONLINE       0     0     0
            ata-WDC_WD10EFRX-68FYTN0_WD-<serial>       ONLINE       0     0     0
            pci-0000:01:00.0-sas-0x4433221109000000-lun-0  ONLINE       0     0     0
            4896358983234274072                            FAULTED      0     0     0  corrupted data
            pci-0000:01:00.0-sas-0x443322110b000000-lun-0  ONLINE       0     0     0
        spares
          ata-WDC_WD10EFRX-68PJCN0_WD-<serial>         AVAIL
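Going by this output, manual activation would be zpool replace with the faulted device's GUID and the AVAIL spare, which is the replace attempt shown above. The sketch below rebuilds that command from the status text as a dry run. It assumes the column layout shown above (name, then state); suggest_spare_replace is a hypothetical name, and it only prints the command, it never runs zpool.

```shell
#!/bin/sh
# Hypothetical dry-run helper: read `zpool status` text on stdin and print the
# `zpool replace` command pairing the first FAULTED device with the first
# AVAIL spare. Exits non-zero if either one is missing. Never runs zpool.
suggest_spare_replace() {
    pool=$1
    awk -v pool="$pool" '
        # Lines after the spares header list the hot spares.
        $1 == "spares" { in_spares = 1; next }
        in_spares && $2 == "AVAIL" && spare == "" { spare = $1 }
        !in_spares && $2 == "FAULTED" && faulted == "" { faulted = $1 }
        END {
            if (faulted == "" || spare == "") exit 1
            printf "zpool replace %s %s %s\n", pool, faulted, spare
        }
    '
}
```

Usage: zpool status <name> | suggest_spare_replace <name>. With the output above, this prints the same command that failed with "already in replacing/spare config", which suggests the problem is stale spare state rather than the command itself.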

Update 2:

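Rebooting the server let the replace go through without any disruption or problems. I am now considering updating ZFS, and possibly the kernel, and want to make sure this is safe for the existing array, which was built on the old system.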
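Before and after such an upgrade it helps to record exactly which ZFS module version is installed. The grep in the earlier update matches four lines (version, rhelversion, srcversion, vermagic); the sketch below keeps only the version field. zfs_module_version is a hypothetical helper name; modinfo -F version zfs should give the same value directly. Note that modinfo reads the module file on disk, which can differ from the module that is actually loaded until the new one is loaded (for example, after a reboot).

```shell
#!/bin/sh
# Hypothetical helper: print only the "version:" field from `modinfo zfs`
# output on stdin. A plain `grep version` also matches rhelversion,
# srcversion and vermagic.
zfs_module_version() {
    # Exact match on the first column, so rhelversion: and srcversion: are skipped.
    awk '$1 == "version:" { print $2; exit }'
}
```

Usage: modinfo zfs | zfs_module_version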
