ZFS and SAN: problems with data scrubbing

As scientists in an enterprise environment, we were given SAN storage attached to an Ubuntu 20.04 virtual machine (Proxmox). The SAN controller is passed directly through to the VM (PCIe passthrough).

The SAN itself uses hardware RAID 60 (there was no other choice) and presents us with 380 TB, which we can split into multiple LUNs. We wanted to benefit from ZFS compression and snapshots, so we chose 30 x 11 TB LUNs and organized them as striped RAID-Z. The setup is redundant (two servers), we have backups, and performance is good, which made us favor striped RAID-Z over the usual striped mirrors.

Independent of the ZFS geometry, we noticed that high write loads (> 1 GB/s) during a ZFS scrub lead to disk errors and eventually to faulted devices. By looking at the files showing errors, we could link the problem to the scrub process trying to access data still residing in the SAN cache. With moderate load during the scrub, the process completes without any errors.

Are there configuration parameters for ZFS or multipath that we can tune inside the VM to prevent this SAN cache problem?

Output of zpool status:

  pool: sanpool
 state: ONLINE
  scan: scrub repaired 0B in 2 days 02:05:53 with 0 errors on Thu Mar 17 15:50:34 2022
config:

    NAME                                        STATE     READ WRITE CKSUM
    sanpool                                     ONLINE       0     0     0
      raidz1-0                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000002e  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000002f  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000031  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000032  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000033  ONLINE       0     0     0
      raidz1-1                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000034  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000035  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000036  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000037  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000038  ONLINE       0     0     0
      raidz1-2                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000062  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000063  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000064  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000065  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000066  ONLINE       0     0     0
      raidz1-3                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000006a  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000006b  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000006c  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000006d  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000006f  ONLINE       0     0     0
      raidz1-4                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000070  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000071  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000072  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000073  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000074  ONLINE       0     0     0
      raidz1-5                                  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000075  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000076  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000077  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b00300000079  ONLINE       0     0     0
        wwn-0x60060e8012b003005040b0030000007a  ONLINE       0     0     0

errors: No known data errors

Output of multipath -ll:

mpathr (360060e8012b003005040b00300000074) dm-18 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:25 sdz  65:144 active ready running
  `- 8:0:0:25 sdbd 67:112 active ready running
mpathe (360060e8012b003005040b00300000064) dm-5 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:13 sdn  8:208  active ready running
  `- 8:0:0:13 sdar 66:176 active ready running
mpathq (360060e8012b003005040b00300000073) dm-17 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:24 sdy  65:128 active ready running
  `- 8:0:0:24 sdbc 67:96  active ready running
mpathd (360060e8012b003005040b00300000063) dm-4 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:12 sdm  8:192  active ready running
  `- 8:0:0:12 sdaq 66:160 active ready running
mpathp (360060e8012b003005040b00300000072) dm-16 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:23 sdx  65:112 active ready running
  `- 8:0:0:23 sdbb 67:80  active ready running
mpathc (360060e8012b003005040b00300000062) dm-3 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:11 sdl  8:176  active ready running
  `- 8:0:0:11 sdap 66:144 active ready running
mpatho (360060e8012b003005040b00300000071) dm-15 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:22 sdw  65:96  active ready running
  `- 8:0:0:22 sdba 67:64  active ready running
mpathb (360060e8012b003005040b00300000038) dm-2 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:10 sdk  8:160  active ready running
  `- 8:0:0:10 sdao 66:128 active ready running
mpathn (360060e8012b003005040b00300000070) dm-14 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:21 sdv  65:80  active ready running
  `- 8:0:0:21 sdaz 67:48  active ready running
mpatha (360060e8012b003005040b0030000002e) dm-1 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:1  sdb  8:16   active ready running
  `- 8:0:0:1  sdaf 65:240 active ready running
mpathz (360060e8012b003005040b00300000033) dm-26 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:5  sdf  8:80   active ready running
  `- 8:0:0:5  sdaj 66:48  active ready running
mpathm (360060e8012b003005040b0030000006f) dm-13 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:20 sdu  65:64  active ready running
  `- 8:0:0:20 sday 67:32  active ready running
mpathy (360060e8012b003005040b00300000032) dm-25 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:4  sde  8:64   active ready running
  `- 8:0:0:4  sdai 66:32  active ready running
mpathl (360060e8012b003005040b0030000002f) dm-12 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:2  sdc  8:32   active ready running
  `- 8:0:0:2  sdag 66:0   active ready running
mpathx (360060e8012b003005040b0030000007a) dm-24 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:30 sdae 65:224 active ready running
  `- 8:0:0:30 sdbi 67:192 active ready running
mpathad (360060e8012b003005040b00300000037) dm-30 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:9  sdj  8:144  active ready running
  `- 8:0:0:9  sdan 66:112 active ready running
mpathk (360060e8012b003005040b0030000006d) dm-11 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:19 sdt  65:48  active ready running
  `- 8:0:0:19 sdax 67:16  active ready running
mpathw (360060e8012b003005040b00300000031) dm-23 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:3  sdd  8:48   active ready running
  `- 8:0:0:3  sdah 66:16  active ready running
mpathac (360060e8012b003005040b00300000036) dm-29 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:8  sdi  8:128  active ready running
  `- 8:0:0:8  sdam 66:96  active ready running
mpathj (360060e8012b003005040b0030000006c) dm-10 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:18 sds  65:32  active ready running
  `- 8:0:0:18 sdaw 67:0   active ready running
mpathv (360060e8012b003005040b00300000079) dm-22 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:29 sdad 65:208 active ready running
  `- 8:0:0:29 sdbh 67:176 active ready running
mpathab (360060e8012b003005040b00300000035) dm-28 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:7  sdh  8:112  active ready running
  `- 8:0:0:7  sdal 66:80  active ready running
mpathi (360060e8012b003005040b0030000006b) dm-9 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:17 sdr  65:16  active ready running
  `- 8:0:0:17 sdav 66:240 active ready running
mpathu (360060e8012b003005040b00300000077) dm-21 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:28 sdac 65:192 active ready running
  `- 8:0:0:28 sdbg 67:160 active ready running
mpathaa (360060e8012b003005040b00300000034) dm-27 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:6  sdg  8:96   active ready running
  `- 8:0:0:6  sdak 66:64  active ready running
mpathh (360060e8012b003005040b0030000006a) dm-8 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:16 sdq  65:0   active ready running
  `- 8:0:0:16 sdau 66:224 active ready running
mpatht (360060e8012b003005040b00300000076) dm-20 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:27 sdab 65:176 active ready running
  `- 8:0:0:27 sdbf 67:144 active ready running
mpathg (360060e8012b003005040b00300000066) dm-7 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:15 sdp  8:240  active ready running
  `- 8:0:0:15 sdat 66:208 active ready running
mpaths (360060e8012b003005040b00300000075) dm-19 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:26 sdaa 65:160 active ready running
  `- 8:0:0:26 sdbe 67:128 active ready running
mpathf (360060e8012b003005040b00300000065) dm-6 HITACHI,OPEN-V
size=11T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 7:0:0:14 sdo  8:224  active ready running
  `- 8:0:0:14 sdas 66:192 active ready running

Answer 1

You are looking in the wrong place. If your SAN faults under load, then you cannot rely on it, period. Fix the SAN.

Answer 2

We were able to fix the setup.

  1. We set the option cachefile=none to avoid importing the zpool at boot with an unstable multipath configuration. We had noticed that some redundant paths were not fully set up before ZFS mounted the pool. Delaying the import prevents the cascade of devices faulting under load and let us look at the SAN's underlying problems in isolation.

  2. We found I/O errors in the system and SAN logs that occasionally affected only one half of an mpath pair. We first replaced the cables associated with the errors, with no effect; the fiber connectors turned out to be the culprit and were replaced.

  3. We looked up the optimal multipath parameters recommended by the SAN vendor and edited the corresponding file:

    nano /etc/multipath.conf
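The vendor-recommended values differ by array model and firmware, so the stanza below only illustrates the shape of such a device entry in /etc/multipath.conf; the specific values are placeholders, not Hitachi's official recommendation:

```
devices {
    device {
        # Matches the HITACHI,OPEN-V LUNs shown in multipath -ll
        vendor                "HITACHI"
        product               "OPEN-.*"
        # Placeholder values -- take the real ones from the vendor's docs
        path_grouping_policy  "multibus"
        path_selector         "service-time 0"
        path_checker          "tur"
        failback              "immediate"
        no_path_retry         5
        fast_io_fail_tmo      10
        dev_loss_tmo          30
    }
}
```

After editing, reload the daemon with `systemctl reload multipathd`, and rebuild the initramfs (as the answer goes on to do) so the same settings apply during early boot.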

Finally, we updated the initial RAM disk:

    update-initramfs -u -k all

All of the load problems described above are now fixed: multipath -ll no longer shows any failed paths during a scrub, and ZFS has stopped reporting errors.
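Step 1 above maps to the pool's cachefile property. A minimal sketch, assuming the pool name sanpool from the zpool status output:

```shell
# Stop recording the pool in /etc/zfs/zpool.cache so it is not
# auto-imported at boot before multipathd has set up all paths
zpool set cachefile=none sanpool

# Import manually later (or from a unit ordered after multipathd),
# scanning the multipath devices rather than the raw sd* paths
zpool import -d /dev/mapper sanpool
```

Importing by /dev/mapper device ensures ZFS opens the multipath devices instead of grabbing individual sd* paths underneath them.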

Answer 3

Given the peculiarities of the setup and the unusual SAN configuration, this really belongs with professional services.

That said, this can be tuned for better behavior and performance.

  • But why are you scrubbing at all?
  • Which tunables did you adjust for scrubbing a pool of this geometry?
  • Please post your /etc/modprobe.d/zfs.conf
  • Please post your Proxmox /etc/sysctl.conf
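As an example of the kind of tunables the last two questions refer to, an /etc/modprobe.d/zfs.conf that throttles scrub I/O might look like this (OpenZFS 0.8 parameter names, as shipped with Ubuntu 20.04; the values are illustrative, not recommendations for this array):

```
# /etc/modprobe.d/zfs.conf -- illustrative values, not a recommendation
# Fewer concurrent scrub I/Os per top-level vdev (0.8 defaults: 1 / 2)
options zfs zfs_vdev_scrub_min_active=1
options zfs zfs_vdev_scrub_max_active=1
# Cap the scan (scrub/resilver) I/O queued per vdev, in bytes (default 4 MiB)
options zfs zfs_scan_vdev_limit=1048576
```

Settings here take effect when the zfs module loads; the same parameters can also be changed at runtime through /sys/module/zfs/parameters/.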
