As scientists in an enterprise environment, we have been given SAN storage resources inside an Ubuntu 20.04 virtual machine (Proxmox). The SAN controller is handed directly to the VM (PCIe passthrough).
The SAN itself uses hardware RAID 60 (there is no other choice) and offers us 380 TB, which we can split into multiple LUNs. We want to benefit from ZFS compression and snapshots, so we chose 30 x 11 TB LUNs and organized them as striped RAID-Z. The setup is redundant (two servers), we have backups, and the good performance is what pushed us towards striped RAID-Z instead of the usual striped mirrors.
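For reference, a sketch of the create command for this layout: six striped 5-wide raidz1 vdevs over the multipath WWNs shown in the zpool status output below. The ashift value is illustrative only, and other create-time options are omitted.

# Six striped 5-wide raidz1 vdevs over the multipath WWNs
zpool create -o ashift=12 sanpool \
  raidz1 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b0030000002e \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b0030000002f \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000031 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000032 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000033 \
  raidz1 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000034 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000035 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000036 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000037 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000038 \
  raidz1 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000062 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000063 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000064 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000065 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000066 \
  raidz1 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b0030000006a \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b0030000006b \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b0030000006c \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b0030000006d \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b0030000006f \
  raidz1 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000070 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000071 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000072 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000073 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000074 \
  raidz1 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000075 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000076 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000077 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b00300000079 \
    /dev/disk/by-id/wwn-0x60060e8012b003005040b0030000007a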
Independent of the ZFS geometry, we noticed that high write load (> 1 GB/s) during a ZFS scrub leads to disk errors and eventually to faulted devices. By looking at the files showing errors, we could tie the problem to the scrub process trying to access data that was still sitting in the SAN cache. With moderate load, a scrub completes without any errors.
Are there configuration parameters for ZFS or multipath that we can tune inside the VM to prevent these SAN cache problems?
Output of zpool status
pool: sanpool
state: ONLINE
scan: scrub repaired 0B in 2 days 02:05:53 with 0 errors on Thu Mar 17 15:50:34 2022
config:
NAME                                        STATE     READ WRITE CKSUM
sanpool                                     ONLINE       0     0     0
  raidz1-0                                  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b0030000002e  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b0030000002f  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000031  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000032  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000033  ONLINE       0     0     0
  raidz1-1                                  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000034  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000035  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000036  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000037  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000038  ONLINE       0     0     0
  raidz1-2                                  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000062  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000063  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000064  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000065  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000066  ONLINE       0     0     0
  raidz1-3                                  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b0030000006a  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b0030000006b  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b0030000006c  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b0030000006d  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b0030000006f  ONLINE       0     0     0
  raidz1-4                                  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000070  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000071  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000072  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000073  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000074  ONLINE       0     0     0
  raidz1-5                                  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000075  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000076  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000077  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b00300000079  ONLINE       0     0     0
    wwn-0x60060e8012b003005040b0030000007a  ONLINE       0     0     0
errors: No known data errors
Output of multipath -ll
mpathr (360060e8012b003005040b00300000074) dm-18 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:25 sdz 65:144 active ready running
`- 8:0:0:25 sdbd 67:112 active ready running
mpathe (360060e8012b003005040b00300000064) dm-5 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:13 sdn 8:208 active ready running
`- 8:0:0:13 sdar 66:176 active ready running
mpathq (360060e8012b003005040b00300000073) dm-17 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:24 sdy 65:128 active ready running
`- 8:0:0:24 sdbc 67:96 active ready running
mpathd (360060e8012b003005040b00300000063) dm-4 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:12 sdm 8:192 active ready running
`- 8:0:0:12 sdaq 66:160 active ready running
mpathp (360060e8012b003005040b00300000072) dm-16 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:23 sdx 65:112 active ready running
`- 8:0:0:23 sdbb 67:80 active ready running
mpathc (360060e8012b003005040b00300000062) dm-3 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:11 sdl 8:176 active ready running
`- 8:0:0:11 sdap 66:144 active ready running
mpatho (360060e8012b003005040b00300000071) dm-15 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:22 sdw 65:96 active ready running
`- 8:0:0:22 sdba 67:64 active ready running
mpathb (360060e8012b003005040b00300000038) dm-2 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:10 sdk 8:160 active ready running
`- 8:0:0:10 sdao 66:128 active ready running
mpathn (360060e8012b003005040b00300000070) dm-14 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:21 sdv 65:80 active ready running
`- 8:0:0:21 sdaz 67:48 active ready running
mpatha (360060e8012b003005040b0030000002e) dm-1 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:1 sdb 8:16 active ready running
`- 8:0:0:1 sdaf 65:240 active ready running
mpathz (360060e8012b003005040b00300000033) dm-26 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:5 sdf 8:80 active ready running
`- 8:0:0:5 sdaj 66:48 active ready running
mpathm (360060e8012b003005040b0030000006f) dm-13 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:20 sdu 65:64 active ready running
`- 8:0:0:20 sday 67:32 active ready running
mpathy (360060e8012b003005040b00300000032) dm-25 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:4 sde 8:64 active ready running
`- 8:0:0:4 sdai 66:32 active ready running
mpathl (360060e8012b003005040b0030000002f) dm-12 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:2 sdc 8:32 active ready running
`- 8:0:0:2 sdag 66:0 active ready running
mpathx (360060e8012b003005040b0030000007a) dm-24 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:30 sdae 65:224 active ready running
`- 8:0:0:30 sdbi 67:192 active ready running
mpathad (360060e8012b003005040b00300000037) dm-30 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:9 sdj 8:144 active ready running
`- 8:0:0:9 sdan 66:112 active ready running
mpathk (360060e8012b003005040b0030000006d) dm-11 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:19 sdt 65:48 active ready running
`- 8:0:0:19 sdax 67:16 active ready running
mpathw (360060e8012b003005040b00300000031) dm-23 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:3 sdd 8:48 active ready running
`- 8:0:0:3 sdah 66:16 active ready running
mpathac (360060e8012b003005040b00300000036) dm-29 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:8 sdi 8:128 active ready running
`- 8:0:0:8 sdam 66:96 active ready running
mpathj (360060e8012b003005040b0030000006c) dm-10 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:18 sds 65:32 active ready running
`- 8:0:0:18 sdaw 67:0 active ready running
mpathv (360060e8012b003005040b00300000079) dm-22 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:29 sdad 65:208 active ready running
`- 8:0:0:29 sdbh 67:176 active ready running
mpathab (360060e8012b003005040b00300000035) dm-28 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:7 sdh 8:112 active ready running
`- 8:0:0:7 sdal 66:80 active ready running
mpathi (360060e8012b003005040b0030000006b) dm-9 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:17 sdr 65:16 active ready running
`- 8:0:0:17 sdav 66:240 active ready running
mpathu (360060e8012b003005040b00300000077) dm-21 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:28 sdac 65:192 active ready running
`- 8:0:0:28 sdbg 67:160 active ready running
mpathaa (360060e8012b003005040b00300000034) dm-27 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:6 sdg 8:96 active ready running
`- 8:0:0:6 sdak 66:64 active ready running
mpathh (360060e8012b003005040b0030000006a) dm-8 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:16 sdq 65:0 active ready running
`- 8:0:0:16 sdau 66:224 active ready running
mpatht (360060e8012b003005040b00300000076) dm-20 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:27 sdab 65:176 active ready running
`- 8:0:0:27 sdbf 67:144 active ready running
mpathg (360060e8012b003005040b00300000066) dm-7 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:15 sdp 8:240 active ready running
`- 8:0:0:15 sdat 66:208 active ready running
mpaths (360060e8012b003005040b00300000075) dm-19 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:26 sdaa 65:160 active ready running
`- 8:0:0:26 sdbe 67:128 active ready running
mpathf (360060e8012b003005040b00300000065) dm-6 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
|- 7:0:0:14 sdo 8:224 active ready running
`- 8:0:0:14 sdas 66:192 active ready running
Answer 1
You are looking in the wrong place. If your SAN fails under load, you cannot rely on it, full stop. Fix the SAN.
Answer 2
We were able to fix the setup.
We set the proper option cachefile=none to avoid importing the zpool at boot with an unstable multipath configuration. We had noticed that some of the redundant paths were not fully set up before ZFS mounted the pool. Delaying the import prevents the cascade of devices faulting under load and allowed us to look at the SAN's underlying problems separately.
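A minimal sketch of that change; the pool name comes from the zpool status output above, and importing later from a unit ordered after multipathd is our assumption, as the mechanism is not spelled out here:

# Keep the pool out of the cache file so it is not auto-imported at boot
zpool set cachefile=none sanpool
# Import manually later (or from a systemd unit ordered after multipathd.service),
# scanning the stable multipath names instead of transient sd* nodes
zpool import -d /dev/disk/by-id sanpool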
We found I/O errors in the system and SAN logs that occasionally affected only one half of the mpaths. We first replaced the cables associated with the errors, without effect: the fiber connectors turned out to be the culprits and were replaced.
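Generic commands for spotting such one-sided path errors from inside the VM (illustrative tooling, not quoted from our logs):

# Kernel I/O errors referencing the sd* path devices from multipath -ll
dmesg | grep -iE 'I/O error|blk_update_request'
# Per-path state; a failing connector shows up on only one of the two paths per LUN
multipathd show paths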
We found the optimal multipath parameters recommended by the SAN vendor and edited the corresponding file:
nano /etc/multipath.conf
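The vendor-recommended values are not reproduced here. Purely to illustrate where such settings live, a device section for these HITACHI OPEN-V LUNs could look like the following; the values are placeholders, not the vendor's recommendations:

defaults {
    user_friendly_names yes
    find_multipaths     yes
}
devices {
    device {
        vendor               "HITACHI"
        product              "OPEN-.*"
        path_grouping_policy "multibus"
        path_checker         "tur"
        failback             "immediate"
        no_path_retry        10    # queue I/O briefly instead of failing the path up to ZFS
    }
}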
Finally, we updated the initial RAM disk:
update-initramfs -u -k all
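As a sanity check (our suggestion, not part of the original steps), you can confirm the new multipath configuration actually landed in the initramfs:

# List the initramfs contents and look for the multipath config and tools
lsinitramfs /boot/initrd.img-$(uname -r) | grep multipath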
All of the load problems described above are now resolved: multipath -ll no longer shows any failed paths during a scrub, and ZFS has stopped reporting errors.
Answer 3
This really belongs in the hands of professional services, given the peculiarities of the setup and the strange SAN configuration.
It can certainly be tuned for better behavior and performance.
- But why are you scrubbing at all?
- Which tunables did you adjust for scrubbing this pool geometry? (A sketch of such tunables follows below.)
- Please post your /etc/modprobe.d/zfs.conf
- Please post your Proxmox /etc/sysctl.conf
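For context on the tunables question: a hypothetical /etc/modprobe.d/zfs.conf that throttles scrub I/O might look like this. The parameter names are real OpenZFS module options, but the values are examples, their availability depends on the ZFS version, and this is not the asker's file:

# /etc/modprobe.d/zfs.conf -- example values only
# Limit concurrent scrub I/Os issued per device
options zfs zfs_vdev_scrub_max_active=1
# Cap in-flight scrub/resilver data per vdev, in bytes
options zfs zfs_scan_vdev_limit=4194304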