We decided to replace our aging primary NAS (made up of three 48-drive SAS expanders, each populated with 4TB drives) with a similar system using 12TB drives, while reusing some newer hardware added about a year ago: one expander and the SAS card. We wanted to keep things as simple and as cheap as possible without taking up any additional rack space.
The new hardware, a server and two expanders, arrived and was set up with Debian Buster and the ZFS packages available from the buster-backports repository. The ZFS pool consists of a mirror of two U.2 SSDs for the log, two U.2 SSDs for cache, 4 HDD spares (2 per expander), and 12 RAID-Z2 vdevs (6 per expander). Everything looked fine, and I started copying data from the old NAS to this one using a script built around incremental snapshots, zfs send, and zfs receive.
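For reference, here is a minimal sketch of what such a replication step can look like; the pool, dataset, and host names are hypothetical, since the actual script was not posted:

#!/bin/sh
# Hypothetical incremental replication step: snapshot the source
# dataset, then send only the delta since the previous snapshot.
set -e

SRC=oldpool/data    # source dataset on the old NAS (assumed name)
DST=bigvol/data     # destination dataset on the new pool
PREV=$(zfs list -t snapshot -o name -s creation -H "$SRC" | tail -n 1)
SNAP="$SRC@xfer-$(date +%Y%m%d%H%M%S)"

zfs snapshot "$SNAP"
if [ -n "$PREV" ]; then
    # Incremental run: only blocks changed since $PREV are sent.
    zfs send -i "$PREV" "$SNAP" | ssh new-nas zfs receive -F "$DST"
else
    # First run: full send of the initial snapshot.
    zfs send "$SNAP" | ssh new-nas zfs receive -F "$DST"
fi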
The first run of the script took many days but eventually finished, with no problems on either end. The second run went fine as well. After the third run, though, the pool was in serious trouble: across 4 of the raidz2 vdevs a large number of disks had gone UNAVAIL or FAULTED, and all 4 spares had been pulled into service automatically. The output of zpool status is below.
# zpool status
  pool: bigvol
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 28 09:55:20 2021
        160T scanned at 11.5G/s, 151T issued at 10.8G/s, 160T total
        4.99T resilvered, 94.53% done, 0 days 00:13:46 to go
config:

NAME                          STATE     READ WRITE CKSUM
bigvol                        DEGRADED     0     0     0
  raidz2-0                    ONLINE       0     0     0
    scsi-35000c500cacd481b    ONLINE       0     0     0
    scsi-35000c500cacceddb    ONLINE       0     0     0
    scsi-35000c500cacd5c4b    ONLINE       0     0     0
    scsi-35000c500cacd19cb    ONLINE       0     0     0
    scsi-35000c500cacd0f4f    ONLINE       0     0     0
    scsi-35000c500cacd5efb    ONLINE       0     0     0
    scsi-35000c500cacd133f    ONLINE       0     0     0
  raidz2-1                    ONLINE       0     0     0
    scsi-35000c500cab6617f    ONLINE       0     0     0
    scsi-35000c500cacd131b    ONLINE       0     0     0
    scsi-35000c500cacd1637    ONLINE       0     0     0
    scsi-35000c500cacd0dd3    ONLINE       0     0     0
    scsi-35000c500cab64247    ONLINE       0     0     0
    scsi-35000c500cacd5f4b    ONLINE       0     0     0
    scsi-35000c500cacd206b    ONLINE       0     0     0
  raidz2-2                    ONLINE       0     0     0
    scsi-35000c500cacd251f    ONLINE       0     0     0
    scsi-35000c500cacf60a7    ONLINE       0     0     0
    scsi-35000c500cacd55cb    ONLINE       0     0     0
    scsi-35000c500cacd3a5f    ONLINE       0     0     0
    scsi-35000c500cacd0fa7    ONLINE       0     0     0
    scsi-35000c500cacd4cb3    ONLINE       0     0     0
    scsi-35000c500cacd2edf    ONLINE       0     0     0
  raidz2-3                    DEGRADED     0     0     0
    scsi-35000c500cacd1627    ONLINE       0     0     0
    scsi-35000c500cacd049f    ONLINE       0     0     0
    scsi-35000c500cacdf9d3    ONLINE       0     0     0
    scsi-35000c500cab51563    DEGRADED     0     0     1  too many errors (resilvering)
    scsi-35000c500cacd1c9b    DEGRADED     0     0     0  too many errors
    scsi-35000c500cacdf757    FAULTED      0    10    48  too many errors (resilvering)
    scsi-35000c500cacd291b    FAULTED      0    11    31  too many errors (resilvering)
  raidz2-4                    DEGRADED     0     0     0
    spare-0                   DEGRADED     0     0    11
      scsi-35000c500cacdb54f  FAULTED      0    18     0  too many errors (resilvering)
      scsi-35000c500cacdc907  DEGRADED     0     0     0  too many errors (resilvering)
    scsi-35000c500cacd2c77    DEGRADED     0     0     4  too many errors
    scsi-35000c500cacdbdd3    DEGRADED     0     0    12  too many errors (resilvering)
    scsi-35000c500cacd0a47    DEGRADED     0     0     7  too many errors (resilvering)
    scsi-35000c500cacdf107    DEGRADED     0     0     4  too many errors (resilvering)
    scsi-35000c500cacd59fb    DEGRADED     0   195    79  too many errors (resilvering)
    scsi-35000c500cacd5307    DEGRADED     0   177    30  too many errors (resilvering)
  raidz2-5                    DEGRADED     0     0     0
    spare-0                   DEGRADED     0     0    15
      scsi-35000c500cacd03a3  FAULTED      0    12     0  too many errors (resilvering)
      scsi-35000c500cacd340b  ONLINE       0     0     0
    scsi-35000c500cacd29d7    FAULTED      0    21    24  too many errors (resilvering)
    scsi-35000c500cacd23d7    DEGRADED     0     0    11  too many errors (resilvering)
    scsi-35000c500cacd1c27    DEGRADED     0     0    29  too many errors (resilvering)
    spare-4                   DEGRADED     0     0    32
      scsi-35000c500cacd26bb  FAULTED      0    31     0  too many errors (resilvering)
      scsi-35000c500cacd299f  DEGRADED     0     0     0  too many errors (resilvering)
    scsi-35000c500cacd258b    DEGRADED     0   207    63  too many errors (resilvering)
    spare-6                   DEGRADED     0     0    24
      scsi-35000c500cacdf867  FAULTED      0    15     0  too many errors (resilvering)
      scsi-35000c500cacd60ef  ONLINE       0     0     0
  raidz2-6                    DEGRADED     0     0     0
    scsi-35000c500cacd2e37    ONLINE       0     0     0
    scsi-35000c500cacd0ecf    ONLINE       0     0     0
    11839096008852004814      UNAVAIL      0     0     0  was /dev/disk/by-id/scsi-35000c500cacd1f8f-part1
    scsi-35000c500cacd088b    ONLINE       0     0     0
    scsi-35000c500cacd28df    ONLINE       0     0     0
    scsi-35000c500cacd068b    ONLINE       0     0     0
    scsi-35000c500cacdbd77    ONLINE       0     0     0
  raidz2-7                    ONLINE       0     0     0
    scsi-35000c500cacd040b    ONLINE       0     0     0
    scsi-35000c500cacd16bb    ONLINE       0     0     0
    scsi-35000c500cacd4d37    ONLINE       0     0     0
    scsi-35000c500cacd1b57    ONLINE       0     0     0
    scsi-35000c500cacd0453    ONLINE       0     0     0
    scsi-35000c500cacd3f6b    ONLINE       0     0     0
    scsi-35000c500cacd0297    ONLINE       0     0     0
  raidz2-8                    ONLINE       0     0     0
    scsi-35000c500cacd4bcb    ONLINE       0     0     0
    scsi-35000c500cacd36cf    ONLINE       0     0     0
    scsi-35000c500cacd1983    ONLINE       0     0     0
    scsi-35000c500cacd3aaf    ONLINE       0     0     0
    scsi-35000c500cacda90b    ONLINE       0     0     0
    scsi-35000c500cacd0d53    ONLINE       0     0     0
    scsi-35000c500cacdaa1f    ONLINE       0     0     0
  raidz2-9                    ONLINE       0     0     0
    scsi-35000c500cacd3f13    ONLINE       0     0     0
    scsi-35000c500cacd3187    ONLINE       0     0     0
    scsi-35000c500cacd59a3    ONLINE       0     0     0
    scsi-35000c500cacd0913    ONLINE       0     0     0
    scsi-35000c500cacdf663    ONLINE       0     0     0
    scsi-35000c500cacd156b    ONLINE       0     0     0
    scsi-35000c500cacd203f    ONLINE       0     0     0
  raidz2-10                   ONLINE       0     0     0
    scsi-35000c500cacd4c97    ONLINE       0     0     0
    scsi-35000c500cacd58a3    ONLINE       0     0     0
    scsi-35000c500cacd2353    ONLINE       0     0     0
    scsi-35000c500cacd3f67    ONLINE       0     0     0
    scsi-35000c500cacd235f    ONLINE       0     0     0
    scsi-35000c500cacdf14f    ONLINE       0     0     0
    scsi-35000c500cacd2583    ONLINE       0     0     0
  raidz2-11                   ONLINE       0     0     0
    scsi-35000c500cacd2f87    ONLINE       0     0     0
    scsi-35000c500cacdb557    ONLINE       0     0     0
    scsi-35000c500cacd00f3    ONLINE       0     0     0
    scsi-35000c500cacd3ea7    ONLINE       0     0     0
    scsi-35000c500cacd23ff    ONLINE       0     0     0
    scsi-35000c500cacd09d3    ONLINE       0     0     0
    scsi-35000c500cacd3adb    ONLINE       0     0     0
logs
  mirror-12                   ONLINE       0     0     0
    nvme-eui.343842304db011100025384700000001  ONLINE       0     0     0
    nvme-eui.343842304db011060025384700000001  ONLINE       0     0     0
cache
  nvme-eui.343842304db010920025384700000001  ONLINE       0     0     0
  nvme-eui.343842304db011080025384700000001  ONLINE       0     0     0
spares
  scsi-35000c500cacdc907      INUSE     currently in use
  scsi-35000c500cacd299f      INUSE     currently in use
  scsi-35000c500cacd340b      INUSE     currently in use
  scsi-35000c500cacd60ef      INUSE     currently in use

errors: No known data errors
For obvious reasons I have stopped the transfer and am waiting for the resilver to finish before replacing the FAULTED and UNAVAIL drives. But should the DEGRADED drives be replaced as well? And does anyone know why this happened, other than the possibility of a bad batch of drives? Or perhaps I should simply take the pool down and swap the drives. Either way, I expect the data will have to be copied again.
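For context, the usual per-device recovery workflow once the resilver settles would look roughly like the following; the replacement disk name is a placeholder, and whether each step applies depends on what the hardware diagnosis turns up:

# Replace a FAULTED disk with a fresh one (new by-id name is a placeholder):
zpool replace bigvol scsi-35000c500cacdf757 scsi-3XXXXXXXXXXXXXXXX

# Detaching an INUSE hot spare returns it to the spares list once the
# vdev it was covering is healthy again:
zpool detach bigvol scsi-35000c500cacdc907

# After the underlying fault is fixed, clear the error counters and
# verify the pool with a scrub:
zpool clear bigvol
zpool scrub bigvol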
Answer 1
The problem turned out to be one or two bad internal SAS cables in one of the two 4U JBOD enclosures. The cables in question ran from the "primary" external SAS connector to the backplane. Replacing them with the two cables from the unused "secondary" external connector solved the problem.
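When cabling rather than the disks themselves is suspected, the SAS PHY error counters that smartctl can read from SAS drives are a useful cross-check: a bad cable tends to raise link-level error counts on many drives behind it at once, while a genuinely bad disk affects only itself. A sketch, assuming Linux sdX device names:

# Print the SAS PHY log for every whole-disk device and grep for the
# link error counters; non-zero, climbing values across many drives
# point at a shared cable/expander path rather than the disks.
for d in /dev/sd*[a-z]; do
    echo "== $d =="
    smartctl -l sasphy "$d" | grep -Ei 'invalid dword|disparity|loss of dword|phy reset'
done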