I have a ZFS pool in the following state:
[root@SERVER-abc ~]# zpool status -v DATAPOOL
  pool: DATAPOOL
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 18.5M in 00:00:01 with 0 errors on Wed Jan  5 19:10:50 2022
config:

        NAME                                              STATE     READ WRITE CKSUM
        DATAPOOL                                          DEGRADED     0     0     0
          raidz2-0                                        DEGRADED     0     0     0
            gptid/14c707c6-f16c-11e8-b117-0cc47a2ba44e    DEGRADED     0     0    17  too many errors
            spare-1                                       ONLINE       0     0    17
              gptid/168342c5-f16c-11e8-b117-0cc47a2ba44e  ONLINE       0     0     0
              gptid/1bfaa607-f16c-11e8-b117-0cc47a2ba44e  ONLINE       0     0     0
            gptid/1875501a-f16c-11e8-b117-0cc47a2ba44e    ONLINE       0     0    30
            gptid/1a16d37c-f16c-11e8-b117-0cc47a2ba44e    ONLINE       0     0    29
        spares
          gptid/1bfaa607-f16c-11e8-b117-0cc47a2ba44e      INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        DATAPOOL/VMS/ubuntu_1804_LTS_ustrich-m6i87@auto-2022-01-04_11-41:<0x1>
        <0x1080a>:<0x1>
        <0x182a>:<0x1>
        DATAPOOL/VMS/ubuntu_1804_LTS_ustrich-m6i87:<0x1>
        <0x16fa>:<0x1>
This is a zpool with 4 drives + 1 spare. Something happened, and the spare suddenly paired itself with one of the other drives as spare-1, automatically.
This was unexpected for me, because:
- Why did the spare not simply replace the degraded drive?
- How can I find out why the spare jumped in and formed spare-1? (See the sketch after this list.)
- Is it possible (or even advisable) to take the spare back out and then replace the degraded drive?
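For reference on the last two points: the ZFS event log and the internal pool history usually record why a hot spare was activated, and zpool detach decides which side of a spare pairing stays in the vdev. A minimal sketch, reusing the GUIDs from the status output above; gptid/NEWDISK is a placeholder, not a real device, and none of this should be run while a resilver is still in progress:

# Why did the spare kick in? Check ZED/FMA events and the internal history:
zpool events -v | less
zpool history -i DATAPOOL | tail -n 50

# Option A: keep the spare permanently by detaching the original disk;
# the spare then becomes a regular member of raidz2-0:
zpool detach DATAPOOL gptid/168342c5-f16c-11e8-b117-0cc47a2ba44e

# Option B: return the spare to the spares list instead:
zpool detach DATAPOOL gptid/1bfaa607-f16c-11e8-b117-0cc47a2ba44e

# Replace the degraded drive with a fresh disk:
zpool replace DATAPOOL gptid/14c707c6-f16c-11e8-b117-0cc47a2ba44e gptid/NEWDISK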
The goal is to rescue the pool without having to pull large amounts of data back from backup, but essentially I want to understand what happened and why, and how to handle such situations according to "best practice".
Thanks a lot! :)
The system is: SuperMicro, TrueNAS-12.0-U4.1, zfs-2.0.4-3
Edit: changed the output from zpool status -x to zpool status -v DATAPOOL
Edit 2: so far I have learned that 168342c5 apparently showed errors first and the spare (1bfaa607) jumped in. After that, 14c707c6 degraded as well.
Edit 3, additional question: all drives (except the one that came in via spare-1) seem to have CKSUM errors. What does that point to? Cabling? The HBA? All drives failing at the same time?
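One way to tell a shared cause (cable, HBA, backplane, power) apart from several disks dying at once is to look for transport-level errors. A sketch, assuming a FreeBSD-based TrueNAS CORE box where the pool disks appear as /dev/da0 and so on; the grep patterns are rough approximations of typical CAM error messages:

# Timeouts, retries and resets at the CAM layer implicate the HBA, cabling
# or backplane rather than the disk media:
dmesg | egrep -i 'cam status|retrying|timeout|reset'

# A UDMA_CRC_Error_Count rising on several disks at once points the same
# way, since CRC errors happen on the wire, not on the platter:
smartctl -A /dev/da0 | grep -i crc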
Latest update, after zpool clear and a zpool scrub DATAPOOL: it seems clear that a lot has happened and there is no way to save the pool:
  pool: DATAPOOL
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan  6 16:18:05 2022
        1.82T scanned at 1.55G/s, 204G issued at 174M/s, 7.82T total
        40.8G resilvered, 2.55% done, 12:44:33 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        DATAPOOL                                          DEGRADED     0     0     0
          raidz2-0                                        DEGRADED     0     0     0
            gptid/14c707c6-f16c-11e8-b117-0cc47a2ba44e    DEGRADED     0     0   156  too many errors
            spare-1                                       DEGRADED     0     0     0
              gptid/168342c5-f16c-11e8-b117-0cc47a2ba44e  DEGRADED     0     0   236  too many errors
              gptid/1bfaa607-f16c-11e8-b117-0cc47a2ba44e  ONLINE       0     0     0  (resilvering)
            gptid/1875501a-f16c-11e8-b117-0cc47a2ba44e    DEGRADED     0     0   182  too many errors
            gptid/1a16d37c-f16c-11e8-b117-0cc47a2ba44e    DEGRADED     0     0   179  too many errors
        spares
          gptid/1bfaa607-f16c-11e8-b117-0cc47a2ba44e      INUSE     currently in use
I am going to check all the SMART stats now.
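For completeness, a sketch of such a check, again assuming the disks show up as /dev/da0 through /dev/da4 (adjust the device names to your system):

for d in /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4; do
    echo "== $d =="
    # Overall health verdict:
    smartctl -H $d
    # The attributes that most often predict real failure or bad cabling:
    smartctl -A $d | egrep -i 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable|UDMA_CRC'
done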
Answer 1
Is this a 4-disk RAIDZ2?
Did you choose that layout over ZFS mirrors?
Can you show the output of zpool status -v?
Please run zpool clear and track the results/progress.
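Spelled out, that would be something like the following (DATAPOOL taken from the question; the polling loop is just one way to watch the counters):

# Reset the error counters, then see whether they come back:
zpool clear DATAPOOL
zpool status -v DATAPOOL

# Optionally force a full verification pass and poll its progress:
zpool scrub DATAPOOL
while sleep 60; do zpool status -v DATAPOOL; done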