I have a 3-disk RAIDZ1 array built from 3 HDDs:
# zpool status
...
config:
	NAME        STATE     READ WRITE CKSUM
	gpool       ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	    sda     ONLINE       0     0     0
(When I created the pool I used the /dev/disk/by-id paths, but they show up as /dev/sdX.)
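As an aside, the sdX names can usually be switched back to stable by-id names by re-importing the pool while scanning /dev/disk/by-id — a sketch only; since gpool is the root pool here, this would have to be done from a live/rescue environment where the pool is not in use:

```
# Export the pool, then re-import it scanning /dev/disk/by-id,
# so the vdevs are recorded under their stable by-id names.
zpool export gpool
zpool import -d /dev/disk/by-id gpool
```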
I want to replace all 3 HDDs with SSDs, but one at a time. Since I have 6 SATA ports and a spare cable, I first plugged in one new SSD and set it up, using the disk it will replace as the source:
# sgdisk --replicate=/dev/disk/by-id/newSSD1 /dev/disk/by-id/oldHDD1
The operation has completed successfully.
# sgdisk --randomize-guids /dev/disk/by-id/newSSD1
The operation has completed successfully.
# grub-install /dev/disk/by-id/newSSD1
Installing for i386-pc platform.
Installation finished. No error reported.
Afterwards, fdisk -l /dev/disk/by-id/newSSD1 showed me the same partitions as on the 3 HDDs, namely:
Disk /dev/disk/by-id/newSSD1: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000MX500SSD1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: EF97564D-490F-4A76-B0F0-4E8C7CAFFBD2
Device Start End Sectors Size Type
/dev/disk/by-id/newSSD1-part1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/disk/by-id/newSSD1-part2 48 2047 2000 1000K BIOS boot
/dev/disk/by-id/newSSD1-part9 1953507328 1953523711 16384 8M Solaris reserved 1
Partition table entries are not in disk order.
Then I went ahead and replaced the disk:
# zpool offline gpool /dev/sdb
# zpool status
pool: gpool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0B in 0 days 00:30:46 with 0 errors on Sat Jun 27 12:29:56 2020
config:
	NAME        STATE     READ WRITE CKSUM
	gpool       DEGRADED     0     0     0
	  raidz1-0  DEGRADED     0     0     0
	    sdb     OFFLINE      0     0     0
	    sdd     ONLINE       0     0     0
	    sda     ONLINE       0     0     0
errors: No known data errors
# zpool replace gpool /dev/sdb /dev/disk/by-id/newSSD1
Make sure to wait until resilver is done before rebooting.
# zpool status
pool: gpool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Thu Jul 16 20:00:58 2020
427G scanned at 6.67G/s, 792M issued at 12.4M/s, 574G total
0B resilvered, 0.13% done, 0 days 13:10:03 to go
config:
	NAME               STATE     READ WRITE CKSUM
	gpool              DEGRADED     0     0     0
	  raidz1-0         DEGRADED     0     0     0
	    replacing-0    DEGRADED     0     0     0
	      sdb          OFFLINE      0     0     0
	      ata-newSSD1  ONLINE       0     0     0
	    sdd            ONLINE       0     0     0
	    sda            ONLINE       0     0     0
errors: No known data errors
Eventually, it finished resilvering.
# zpool status
pool: gpool
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: resilvered 192G in 0 days 00:27:48 with 0 errors on Thu Jul 16 20:28:46 2020
config:
	NAME          STATE     READ WRITE CKSUM
	gpool         ONLINE       0     0     0
	  raidz1-0    ONLINE       0     0     0
	    ata-SSD1  ONLINE       0     0     0
	    sdd       ONLINE       0     0     0
	    sda       ONLINE       0     0     0
errors: No known data errors
This time with the by-id label. Since I had replicated the partitions and installed GRUB on the new SSD, I wasn't expecting any trouble.
However, when I rebooted, GRUB dropped me at a grub rescue> prompt with grub_file_filters not found. I tried booting from each of the other 2 HDDs and from the SSD; same error every time. Plugging the 3rd HDD back in gave the same result.
Today I booted from the SSD… and everything works. The zpool is fine, there are no GRUB errors, and I'm writing this from that very system.
ls at the rescue prompt did show the expected bunch of partitions, but I only once got GRUB to display anything meaningful, after insmod zfs (or similar). However, trying something like ls (hd0,gpt1)/ROOT/gentoo@/boot resulted in compression algorithm 73 not supported (or 80, as well).
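For reference, the usual way out of grub rescue> when the prefix points at the wrong place is to set it by hand and load the normal module — a sketch only, assuming part1 of the first disk holds the pool and gpool/ROOT/gentoo is the boot filesystem (it presumably would not have helped here, given that GRUB's zfs reader was failing on the compression algorithm itself):

```
grub rescue> insmod zfs
grub rescue> set prefix=(hd0,gpt1)/ROOT/gentoo@/boot/grub
grub rescue> insmod normal
grub rescue> normal
```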
I'm running kernel 5.4.28 with an initramfs and a root=ZFS GRUB parameter. Before I decided to replace the drives, I never had any incident related to booting from the ZFS root. My /etc/default/grub has the entry to find the ZFS root,
GRUB_CMDLINE_LINUX_DEFAULT="dozfs spl.spl_hostid=0xa8c06101 real_root=ZFS=gpool/ROOT/gentoo"
and it does find it. I'd like to go on replacing the remaining disks, but even more I'd like to know what happened and how to avoid it.
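For the remaining disks, one way to avoid the problem described below (a sketch, untested, with newSSD2/oldHDD2 as placeholder by-id names) is to hand zpool replace the ZFS partition rather than the whole disk: given a whole disk, ZFS writes its own GPT (part1 + part9), which discards a replicated BIOS boot partition.

```
# Replicate the GPT from the old disk and randomize GUIDs, as before.
sgdisk --replicate=/dev/disk/by-id/newSSD2 /dev/disk/by-id/oldHDD2
sgdisk --randomize-guids /dev/disk/by-id/newSSD2

# Replace using the ZFS *partition* so zpool does not relabel the disk;
# the replicated BIOS boot partition (part2) is left untouched.
zpool offline gpool /dev/sdd
zpool replace gpool /dev/sdd /dev/disk/by-id/newSSD2-part1

# Reinstall GRUB's core image into the preserved BIOS boot partition.
grub-install /dev/disk/by-id/newSSD2
```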
Edit 1
I've noticed something. After running sgdisk --replicate, I get 3 partitions, identical to the original disk's:
# fdisk -l ${NEWDISK2}
Disk /dev/disk/by-id/ata-CT1000MX500SSD1_NEWDISK2: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000MX500SSD1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 2190C74D-46C8-44AC-81FB-36C3B72A7EA7
Device Start End Sectors Size Type
/dev/disk/by-id/ata-CT1000MX500SSD1_NEWDISK2-part1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/disk/by-id/ata-CT1000MX500SSD1_NEWDISK2-part2 48 2047 2000 1000K BIOS boot
/dev/disk/by-id/ata-CT1000MX500SSD1_NEWDISK2-part9 1953507328 1953523711 16384 8M Solaris reserved 1
Partition table entries are not in disk order.
…but after running zpool replace, I've lost a partition:
# fdisk -l ${NEWDISK2}
Disk /dev/disk/by-id/ata-CT1000MX500SSD1_NEWDISK2: 931.53 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: CT1000MX500SSD1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 0FC0A6C0-F9F1-E341-B7BD-99D7B370D685
Device Start End Sectors Size Type
/dev/disk/by-id/ata-CT1000MX500SSD1_NEWDISK2-part1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/disk/by-id/ata-CT1000MX500SSD1_NEWDISK2-part9 1953507328 1953523711 16384 8M Solaris reserved 1
…the boot partition. Which is odd, considering I did manage to boot from the new SSD.
I'll keep experimenting. As for the ZFS version:
# zpool version
zfs-0.8.4-r1-gentoo
zfs-kmod-0.8.3-r0-gentoo
Edit 2
It's consistent. When I replicate with sgdisk --replicate, I get 3 partitions, the same as the originals, including the BIOS boot partition. After zpool replace and the resilver, I lose the boot partition.
I think the system still boots because GRUB's boot code is still in the MBR and the old boot partition's sectors weren't actually overwritten, only its GPT entry was dropped, so the BIOS can still start GRUB.
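Assuming only the GPT entry was lost (the gap before part1 at sector 2048 being still free), the boot partition could presumably be re-created in place using the original offsets from the fdisk output above, and GRUB reinstalled on top of it — a sketch, not something verified here:

```
# Re-create the BIOS boot partition (type EF02) in the gap before part1,
# matching the replicated layout (sectors 48-2047); -a1 allows the
# unaligned start sector. Then reinstall GRUB's core image into it.
sgdisk -a1 -n2:48:2047 -t2:EF02 -c2:"BIOS boot" /dev/disk/by-id/newSSD1
grub-install /dev/disk/by-id/newSSD1
```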
This is the current state:
# zpool status
pool: gpool
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: resilvered 192G in 0 days 00:08:04 with 0 errors on Fri Jul 17 21:04:54 2020
config:
	NAME                             STATE     READ WRITE CKSUM
	gpool                            ONLINE       0     0     0
	  raidz1-0                       ONLINE       0     0     0
	    ata-CT1000MX500SSD1_NEWSSD1  ONLINE       0     0     0
	    ata-CT1000MX500SSD1_NEWSSD2  ONLINE       0     0     0
	    ata-CT1000MX500SSD1_NEWSSD3  ONLINE       0     0     0
errors: No known data errors