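I replaced a failing drive (error counts increasing in SMART, and read/write/checksum errors increasing in zpool status) in a ZFS raidz1 pool with a new drive by running zpool offline and zpool replace. Resilvering started right away but kept hitting write errors (see below). When a resilver finally finishes, the pool stays in the DEGRADED state. If I clear the errors with zpool clear or reboot, another resilver starts automatically, possibly with a different number of write errors.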
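SMART shows no errors on the new drive. I also tried a second new drive (I bought two replacements) and swapped the SATA cable. Whichever drive I use, it is always this vdev being replaced that reports write errors during the resilver. That makes me suspect the ZFS pool itself is somehow damaged, yet it keeps working and the nightly zfs send still runs.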
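What is the right way to troubleshoot and fix this (for example, running something like zpool scrub on the degraded pool, which has only three of its four drives because zpool replace cannot complete without errors)? I do have a backup made with zfs send/receive, which I hope is a good copy (does ZFS verify checksums on a received stream?).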
  pool: space
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Nov  3 09:56:59 2023
        2.92T scanned at 594M/s, 2.63T issued at 536M/s, 5.90T total
        669G resilvered, 44.59% done, 01:46:40 to go
config:

        NAME                                     STATE     READ WRITE CKSUM
        space                                    DEGRADED     0     0     0
          raidz1-0                               DEGRADED     0     0     0
            ata-ST3000DM001-1CH166_ZZZZ1111      ONLINE       0     0     0  block size: 512B configured, 4096B native
            ata-ST3000DM001-1CH166_ZZZZ2222      ONLINE       0     0     0  block size: 512B configured, 4096B native
            replacing-2                          UNAVAIL      0     0     0  insufficient replicas
              13284017409215481231               OFFLINE      0     0     0  was /dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F12QMZ-part1
              ata-ST18000NM003D-3DL103_YYYY1111  FAULTED      0 4.38K     0  too many errors (resilvering)
            ata-ST3000DM001-1CH166_ZZZZ4444      ONLINE       0     0     0  block size: 512B configured, 4096B native
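For reference, the "block size" notes in the output above can be read as ashift values, since ashift is the base-2 logarithm of the sector size a vdev uses. The following is only a minimal Python sketch of that arithmetic; the helper name ashift_mismatch is hypothetical and the row is copied from the output above.

```python
import re

# Matches the note zpool status appends to a vdev row, e.g.
# "block size: 512B configured, 4096B native".
BLOCK_SIZE_NOTE = re.compile(r"block size: (\d+)B configured, (\d+)B native")


def ashift_mismatch(row: str):
    """Return (configured_ashift, native_ashift) if the row carries the
    block-size note, otherwise None. (Hypothetical helper for illustration.)"""
    m = BLOCK_SIZE_NOTE.search(row)
    if m is None:
        return None
    configured, native = (int(g) for g in m.groups())
    # For a power of two n, n.bit_length() - 1 equals log2(n).
    return configured.bit_length() - 1, native.bit_length() - 1


row = ("ata-ST3000DM001-1CH166_ZZZZ1111 ONLINE 0 0 0 "
       "block size: 512B configured, 4096B native")
print(ashift_mismatch(row))  # (9, 12)
```

So the existing members appear to be running with ashift=9 on disks that report 4096-byte physical sectors. Whether that has anything to do with the write errors on the new drive is something I have not confirmed.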
Edit: I see these errors in dmesg:
[19561.708059] sd 2:0:1:0: [sdb] tag#226 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=31s
[19561.708063] sd 2:0:1:0: [sdb] tag#226 Sense Key : Illegal Request [current]
[19561.708067] sd 2:0:1:0: [sdb] tag#226 Add. Sense: Unaligned write command
[19561.708070] sd 2:0:1:0: [sdb] tag#226 CDB: Write(16) 8a 00 00 00 00 00 68 f4 ff 40 00 00 00 53 00 00
[19561.708073] blk_update_request: I/O error, dev sdb, sector 1760886592 op 0x1:(WRITE) flags 0x700 phys_seg 83 prio class 0
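To make the failing command easier to read, here is a minimal Python sketch that decodes the WRITE(16) CDB from the log above (bytes 2 to 9 are the LBA, bytes 10 to 13 the transfer length in logical blocks). The 512-byte logical sector size is an assumption based on the CDB's LBA matching the kernel's 512-byte sector number; the 4096-byte physical sector size for the new drive is also an assumption, since the status output only reports it for the old drives. The function name decode_write16 is hypothetical.

```python
def decode_write16(cdb_hex: str):
    """Decode (LBA, transfer length in logical blocks) from a WRITE(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")       # 8-byte logical block address
    blocks = int.from_bytes(cdb[10:14], "big")   # 4-byte transfer length
    return lba, blocks


LOGICAL = 512    # assumed logical sector size in bytes
PHYSICAL = 4096  # assumed physical sector size in bytes

lba, blocks = decode_write16("8a 00 00 00 00 00 68 f4 ff 40 00 00 00 53 00 00")
start_aligned = (lba * LOGICAL) % PHYSICAL == 0
length_aligned = (blocks * LOGICAL) % PHYSICAL == 0
print(lba, blocks, start_aligned, length_aligned)
# 1760886592 83 True False
```

Under those assumptions the write starts on a 4096-byte boundary, but its length (83 x 512 = 42496 bytes) is not a multiple of 4096. This alone does not establish the cause of the errors.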