XCP-NG 10Gbit 专用存储网络速度慢(其工作方式与 1Gbit 相同)

XCP-NG 10Gbit 专用存储网络速度慢(其工作方式与 1Gbit 相同)

搜索了一段时间,找不到答案,甚至找不到前进的方向。

所以。XCP-NG 集群由三台服务器组成:HP DL360p G8、MSA 2060 iSCSI NAS(配备 12 个 SAS 10K 驱动器)、QNAP TS-1273U-RP、Mikrotik CRS317 交换机。存储网络位于 mikrotik 的专用网桥中。所有设备都通过 3 米长的铜缆连接。所有设备都显示链路为 10G。我甚至将所有设备的 MTU 配置为 9000。每台服务器都有带两个接口的以太网卡。一个仅用于存储网络(三台服务器上的 eth1)。存储网络和管理网络的子网不同。Xen 网络后端是 openvswitch。

巨型帧正在运行:

ping -M do -s 8972 -c 2 10.100.200.10 -- QNAP
PING 10.100.200.10 (10.100.200.10) 8972(9000) bytes of data.
8980 bytes from 10.100.200.10: icmp_seq=1 ttl=64 time=1.01 ms
8980 bytes from 10.100.200.10: icmp_seq=2 ttl=64 time=0.349 ms

--- 10.100.200.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.349/0.682/1.015/0.333 ms

ping -M do -s 8972 -c 2 10.100.200.8 -- MSA 2060
PING 10.100.200.8 (10.100.200.8) 8972(9000) bytes of data.
8980 bytes from 10.100.200.8: icmp_seq=1 ttl=64 time=9.83 ms
8980 bytes from 10.100.200.8: icmp_seq=2 ttl=64 time=0.215 ms

--- 10.100.200.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.215/5.023/9.832/4.809 ms

问题:当我将虚拟机从一个存储(QNAP)复制到另一个存储(MSA)时,写入速度约为 45MB/s。当我将一些大文件从 QNAP(例如:iso 安装)复制到服务器本地存储时,速度约为 100MB/s,并且该服务器htop显示一个核心负载为 100%

可以清楚的看到,网络的运行速度就像1G网络一样。

一些有关硬件的信息。

ethtool -i eth1
driver: ixgbe
version: 5.5.2
firmware-version: 0x18b30001
expansion-rom-version:
bus-info: 0000:07:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

ethtool eth1
Settings for eth1:
        Supported ports: [ FIBRE ]
        Supported link modes:   10000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  10000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 10000Mb/s
        Duplex: Full
        Port: Direct Attach Copper
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

lspci | grep net
07:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

然后我在此主机上运行 iper3 服务器:iperf3 -s -4 服务器主机上的结果:

[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec  5.48 GBytes  4.69 Gbits/sec                  receiver
[  7]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  7]   0.00-10.04  sec  5.44 GBytes  4.66 Gbits/sec                  receiver
[SUM]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[SUM]   0.00-10.04  sec  10.9 GBytes  9.35 Gbits/sec                  receiver

另一台主机上的客户端:iperf3 -c 10.100.200.20 -P 2 -t 10 -4 客户端主机上的结果:

[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  5.49 GBytes  4.72 Gbits/sec  112             sender
[  4]   0.00-10.00  sec  5.48 GBytes  4.71 Gbits/sec                  receiver
[  6]   0.00-10.00  sec  5.45 GBytes  4.68 Gbits/sec  178             sender
[  6]   0.00-10.00  sec  5.44 GBytes  4.67 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  10.9 GBytes  9.40 Gbits/sec  290             sender
[SUM]   0.00-10.00  sec  10.9 GBytes  9.38 Gbits/sec                  receiver

下一步要测试什么或如何找到瓶颈?

iperf3 显示链接以 10Gbit 的速度运行,还是我对结果的解释不正确?

软件版本:

xe host-list params=software-version
software-version (MRO)    : product_version: 8.2.0; product_version_text: 8.2; product_version_text_short: 8.2; platform_name: XCP; platform_version: 3.2.0; product_brand: XCP-ng; build_number: release/stockholm/master/7; hostname: localhost; date: 2021-05-20; dbv: 0.0.1; xapi: 1.20; xen: 4.13.1-9.11.1; linux: 4.19.0+1; xencenter_min: 2.16; xencenter_max: 2.16; network_backend: openvswitch; db_schema: 5.602


software-version (MRO)    : product_version: 8.2.0; product_version_text: 8.2; product_version_text_short: 8.2; platform_name: XCP; platform_version: 3.2.0; product_brand: XCP-ng; build_number: release/stockholm/master/7; hostname: localhost; date: 2021-05-20; dbv: 0.0.1; xapi: 1.20; xen: 4.13.1-9.11.1; linux: 4.19.0+1; xencenter_min: 2.16; xencenter_max: 2.16; network_backend: openvswitch; db_schema: 5.602


software-version (MRO)    : product_version: 8.2.0; product_version_text: 8.2; product_version_text_short: 8.2; platform_name: XCP; platform_version: 3.2.0; product_brand: XCP-ng; build_number: release/stockholm/master/7; hostname: localhost; date: 2021-05-20; dbv: 0.0.1; xapi: 1.20; xen: 4.13.1-9.11.1; linux: 4.19.0+1; xencenter_min: 2.16; xencenter_max: 2.16; network_backend: openvswitch; db_schema: 5.602

另外两台服务器有 HP 530FLR-SFP+ 卡:

    lspci | grep net
    03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
    03:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)

ethtool -i eth1
driver: bnx2x
version: 1.714.24 storm 7.13.11.0
firmware-version: bc 7.10.10
expansion-rom-version:
bus-info: 0000:03:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

ethtool eth1
Settings for eth1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  10000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 10000Mb/s
        Duplex: Full
        Port: Direct Attach Copper
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: off
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x00000000 (0)

        Link detected: yes

编辑1:本地存储测试:

dmesg | grep sda
[   13.093002] sd 0:1:0:0: [sda] 860051248 512-byte logical blocks: (440 GB/410 GiB)
[   13.093077] sd 0:1:0:0: [sda] Write Protect is off
[   13.093080] sd 0:1:0:0: [sda] Mode Sense: 73 00 00 08
[   13.093232] sd 0:1:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   13.112781]  sda: sda1 sda2 sda3 sda4 sda5 sda6
[   13.114348] sd 0:1:0:0: [sda] Attached SCSI disk
[   15.267456] EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
[   15.268750] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[   17.597243] EXT4-fs (sda1): re-mounted. Opts: (null)
[   18.991998] Adding 1048572k swap on /dev/sda6.  Priority:-2 extents:1 across:1048572k
[   19.279706] EXT4-fs (sda5): mounting ext3 file system using the ext4 subsystem
[   19.281346] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)

dd if=/dev/sda of=/dev/null bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 11.1072 s, 92.2 MB/s

这很奇怪,因为服务器有带 2GB 缓存的智能阵列 P420i 控制器,6 个 146GB 15k SAS 驱动器的硬件 raid10。iLo 显示存储一切正常。在另一台服务器上,结果类似1024000000 bytes (1.0 GB) copied, 11.8031 s, 86.8 MB/s

编辑2(共享存储测试):
Qnap(SSD Raid10):

dd if=/run/sr-mount/23d45731-c005-8ad6-a596-bab2d12ec6b5/01ce9f2e-c5b1-4ba8-b783-d3a5c1ac54f0.vhd of=/dev/null bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 11.2902 s, 90.7 MB/s

MSA(HP MSA-DP + 突袭):

dd if=/dev/mapper/3600c0ff000647bc2259a2f6101000000 of=/dev/null bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 11.3974 s, 89.8 MB/s

网络不超过 1 千兆位...因此,如果我在共享存储之间传输 VM 映像,则不涉及本地存储。openvswitch 会成为瓶颈吗?

编辑 3(更多磁盘测试):
sda = 6 x 146GB 15k sas 的 Raid10,sdb = raid0 中的一个 146GB 15k SAS

dd if=/dev/sdb of=/dev/null bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 16.5326 s, 61.9 MB/s
[14:35 xcp-ng-em ssh]# dd if=/dev/sdb of=/dev/null bs=512k count=1000
1000+0 records in
1000+0 records out
524288000 bytes (524 MB) copied, 8.48061 s, 61.8 MB/s
[14:36 xcp-ng-em ssh]# dd if=/dev/sdb of=/dev/null bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 84.9631 s, 61.7 MB/s
[14:37 xcp-ng-em ssh]# dd if=/dev/sda of=/dev/null bs=512k count=10000
10000+0 records in
10000+0 records out
5242880000 bytes (5.2 GB) copied, 7.03023 s, 746 MB/s

相关内容