我有一台 Ubuntu 服务器,其中设置了一个 14 个磁盘的 ZFS raidz2 池。
大约 80% 的时间里,在重新启动时,我最终会得到一个降级池,其中两个磁盘被标记为故障。故障的驱动器并不总是相同的,但总是恰好是两个驱动器。例如:
$ sudo zpool status
pool: tank
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: resilvered 4K in 0h0m with 0 errors on Sun Sep 30 23:08:51 2018
config:
NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
sde ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
sda ONLINE 0 0 0
sdh ONLINE 0 0 0
11521322863231878081 FAULTED 0 0 0 was /dev/sdf1
15273938560620494453 FAULTED 0 0 0 was /dev/sdg1
sdb ONLINE 0 0 0
sdi ONLINE 0 0 0
sdj ONLINE 0 0 0
sdk ONLINE 0 0 0
sdl ONLINE 0 0 0
sdm ONLINE 0 0 0
sdn ONLINE 0 0 0
errors: No known data errors
我可以导出并重新导入池,磁盘不再出现故障。例如:
$ sudo zpool export tank
$ sudo zpool import tank
$ sudo zpool status
pool: tank
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: resilvered 4K in 0h0m with 0 errors on Sun Sep 30 23:08:51 2018
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
sde ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
sda ONLINE 0 0 0
sdh ONLINE 0 0 0
sdg ONLINE 0 0 1
sdf ONLINE 0 0 0
sdb ONLINE 0 0 0
sdi ONLINE 0 0 0
sdj ONLINE 0 0 0
sdk ONLINE 0 0 0
sdl ONLINE 0 0 0
sdm ONLINE 0 0 0
sdn ONLINE 0 0 0
errors: No known data errors
正在使用的 HBA 在另一台服务器上运行正常。
我还有什么办法可以避免重启时出现这些故障驱动器吗?我有另一个可以交换的 HBA。
答案1
您不应该使用 /dev/sdX 名称作为池配置。
SCSI 枚举中的任何变化(例如插入 CDROM 或 USB 驱动器)都可能导致设备名称发生变化,从而导致您遇到的错误。
您可以选择使用 /dev/disk/by-id 名称。
用和来做这zpool export tank
件事zpool import -d /dev/disk/by-id tank