Replacing a zfs pool member after a reboot reordered the disk paths

I created a raidz1-0 pool with three devices. Two of the devices were added by their /dev/disk/by-id IDs, and for some reason I decided to use /dev/sdg1 for the third one.
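
For reference, reconstructed from the zdb output further down, the pool was presumably created along these lines (the exact original command is an assumption on my part; two members given by /dev/disk/by-id, the third by its bare /dev/sdg1 path):

# zpool create safe00 raidz1 \
    /dev/disk/by-id/ata-ST3500418AS_9VM89VGD \
    /dev/sdg1 \
    /dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF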

After a reboot a few years later, I cannot get all three devices back online. This is the current state:

# zpool status safe00
  pool: safe00
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Online the device using 'zpool online' or replace the device with
    'zpool replace'.
  scan: scrub repaired 0 in 2h54m with 0 errors on Sun Jan 12 03:18:13 2020
config:

    NAME                                          STATE     READ WRITE CKSUM
    safe00                                        DEGRADED     0     0     0
      raidz1-0                                    DEGRADED     0     0     0
        ata-ST3500418AS_9VM89VGD                  ONLINE       0     0     0
        13759036004139463181                      OFFLINE      0     0     0  was /dev/sdg1
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF  ONLINE       0     0     0

errors: No known data errors

The drives in this machine are:

# lsblk -f 
NAME   FSTYPE     LABEL      UUID                                 MOUNTPOINT
sda                                                               
├─sda1 ext4       Ubuntu LTS 8a2a3c19-580a-474d-b248-bf0822cacab6 /
├─sda2 vfat                  B55A-693E                            /boot/efi
└─sda3 swap       swap       7d1cf001-07a6-4534-9624-054d70a562d5 [SWAP]
sdb    zfs_member dump       11482263899067190471                 
├─sdb1 zfs_member dump       866164895581740988                   
└─sdb9 zfs_member dump       11482263899067190471                 
sdc                                                               
sdd                                                               
├─sdd1 zfs_member dump       866164895581740988                   
└─sdd9                                                            
sde    zfs_member dump       866164895581740988                   
├─sde1 zfs_member safe00     6143939454380723991                  
└─sde2 zfs_member dump       866164895581740988                   
sdf                                                               
├─sdf1 zfs_member dump       866164895581740988                   
└─sdf9                                                            
sdg                                                               
├─sdg1 zfs_member safe00     6143939454380723991                  
└─sdg9                                                            
sdh                                                               
├─sdh1 zfs_member safe00     6143939454380723991                  
└─sdh9   

In other words, safe00 should consist of the three devices sde1, sdg1 & sdh1.

Just to get the mapping between by-id and path:

# cd /dev/disk/by-id
# ls -la ata* | cut -b 40- | awk '{split($0, a, " "); print a[3],a[2],a[1]}' | sort -h
../../sda1 -> ata-INTEL_SSDSC2KW120H6_BTLT712507HK120GGN-part1
../../sda2 -> ata-INTEL_SSDSC2KW120H6_BTLT712507HK120GGN-part2
../../sda3 -> ata-INTEL_SSDSC2KW120H6_BTLT712507HK120GGN-part3
../../sda -> ata-INTEL_SSDSC2KW120H6_BTLT712507HK120GGN
../../sdb1 -> ata-WDC_WD20EARX-00PASB0_WD-WCAZAE573068-part1
../../sdb9 -> ata-WDC_WD20EARX-00PASB0_WD-WCAZAE573068-part9
../../sdb -> ata-WDC_WD20EARX-00PASB0_WD-WCAZAE573068
../../sdc -> ata-SAMSUNG_HD204UI_S2H7JD1ZA21911
../../sdd1 -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0416553-part1
../../sdd9 -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0416553-part9
../../sdd -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E0416553
../../sde1 -> ata-ST6000VN0033-2EE110_ZAD5S9M9-part1
../../sde2 -> ata-ST6000VN0033-2EE110_ZAD5S9M9-part2
../../sde -> ata-ST6000VN0033-2EE110_ZAD5S9M9
../../sdf1 -> ata-WDC_WD10EADS-00L5B1_WD-WCAU4C151323-part1
../../sdf9 -> ata-WDC_WD10EADS-00L5B1_WD-WCAU4C151323-part9
../../sdf -> ata-WDC_WD10EADS-00L5B1_WD-WCAU4C151323
../../sdg1 -> ata-ST3500418AS_9VM89VGD-part1
../../sdg9 -> ata-ST3500418AS_9VM89VGD-part9
../../sdg -> ata-ST3500418AS_9VM89VGD
../../sdh1 -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF-part1
../../sdh9 -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF-part9
../../sdh -> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF

And zdb (I added a few small annotations):

# zdb -C safe00

MOS Configuration:
        version: 5000
        name: 'safe00'
        state: 0
        txg: 22826770
        pool_guid: 6143939454380723991
        errata: 0
        hostname: 'filserver'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 6143939454380723991
            children[0]:
                type: 'raidz'
                id: 0
                guid: 9801294574244764778
                nparity: 1
                metaslab_array: 33
                metaslab_shift: 33
                ashift: 12
                asize: 1500281044992
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 135921832921042063
                    path: '/dev/disk/by-id/ata-ST3500418AS_9VM89VGD-part1'
                    whole_disk: 1
                    DTL: 58
                    create_txg: 4
                children[1]:         ### THIS CHILD USED TO BE sdg1
                    type: 'disk'
                    id: 1
                    guid: 13759036004139463181
                    path: '/dev/sdg1'
                    whole_disk: 0
                    not_present: 1   ### THIS IS sde1 NOW
                    DTL: 52
                    create_txg: 4
                    offline: 1
                children[2]:         ### THIS CHILD IS NOW sdg1
                    type: 'disk'
                    id: 2
                    guid: 2522190573401341943
                    path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF-part1'
                    whole_disk: 1
                    DTL: 57
                    create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
space map refcount mismatch: expected 178 != actual 177

Summary of pool safe00:

offline: sde1 --> ata-ST6000VN0033-2EE110_ZAD5S9M9-part1  <-- this likely was sdg1 before reboot
online:  sdg1 --> ata-ST3500418AS_9VM89VGD
online:  sdh1 --> ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF

Attempting to bring the offline device online:

# zpool online safe00 ata-ST6000VN0033-2EE110_ZAD5S9M9-part1
cannot online ata-ST6000VN0033-2EE110_ZAD5S9M9-part1: no such device in pool
# zpool online safe00 /dev/sde1
cannot online /dev/sde1: no such device in pool

I also tried replacing the offline device with the real device:

# zpool replace safe00 13759036004139463181 ata-ST6000VN0033-2EE110_ZAD5S9M9-part1
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/ata-ST6000VN0033-2EE110_ZAD5S9M9-part1 is part of active pool 'safe00'
# zpool replace safe00 /dev/sdg1 ata-ST6000VN0033-2EE110_ZAD5S9M9-part1
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/ata-ST6000VN0033-2EE110_ZAD5S9M9-part1 is part of active pool 'safe00'

So I finally tried to online the missing device by its ID:

# zpool online safe00 13759036004139463181
warning: device '13759036004139463181' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

To my delight, this put the disk into a faulted state and started repairing it:

# zpool status safe00
  pool: safe00
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub in progress since Sun Feb 23 11:19:00 2020
    14.3G scanned out of 1.09T at 104M/s, 3h0m to go
    0 repaired, 1.29% done
config:

    NAME                                          STATE     READ WRITE CKSUM
    safe00                                        DEGRADED     0     0     0
      raidz1-0                                    DEGRADED     0     0     0
        ata-ST3500418AS_9VM89VGD                  ONLINE       0     0     0
        13759036004139463181                      FAULTED      0     0     0  was /dev/sdg1
        ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1NYTHJF  ONLINE       0     0     0

errors: No known data errors

What should I do to keep this from happening again? In other words, how do I change the 'path' property of the device shown by zdb so that it does not depend on how Linux enumerates the disks at boot?

Answer 1

The most reliable approach is probably to create the pool using GUIDs or GPT labels. Personally I think GPT labels are the better solution, as described in Best practices for specifying disks (vdevs) for ZFS pools in 2021:

data-1-sces3-3tb-Z1Y0P0DK

<pool>-<pool-id>-<disk-vendor-and-model-name>-<size-of-disk>-<disk-serial-number>

Naming them this way helps you to:

  1. Easily see the topology that defines the pool.
  2. Easily find the vendor and model name of the drives in use.
  3. Easily see the disk capacity.
  4. Easily identify and locate a bad disk in the drive cage, since the serial number printed on the drive is part of the GPT label.
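
A minimal sketch of that approach on Linux, assuming sgdisk is available and /dev/sdX stands for a blank disk; the label value is just the example from above, and on Linux the GPT partition name appears under /dev/disk/by-partlabel rather than /dev/gptid:

# sgdisk --new=1:0:0 --change-name=1:data-1-sces3-3tb-Z1Y0P0DK /dev/sdX
# zpool create data raidz1 /dev/disk/by-partlabel/data-1-sces3-3tb-Z1Y0P0DK ...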

Other persistent ways of identifying disks exist, for example using some kind of ID, but they are less intuitive on their own: you cannot easily locate a disk from its electronic ID alone, you have to map the ID to its physical location yourself.

I also found this, which may help if you want to remap the disks in a pool: Mixture of gptid and dev names in zpool status:

# zpool import -d /dev/gptid tank
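
The same idea applies on Linux: exporting the pool and importing it again with -d pointed at a persistent directory rewrites the stored device paths. A sketch for the pool in this question, assuming it can be exported briefly once the scrub has finished:

# zpool export safe00
# zpool import -d /dev/disk/by-id safe00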
