Too many NIC interrupts causing performance problems on a haproxy server

2024-6-1

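I have an Ubuntu 18.04.1 server with an 8-core CPU and 8 GB of RAM; it is a KVM-based cloud server. I use haproxy 1.8.8 on it for load balancing. The problem is that when I load-test the server with ab or wrk, only one CPU core (core 7) is pegged at 100%, and that is because of excessive si (softirq) time. So I checked /proc/interrupts: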

      CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         30          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  1:          0          9          0          0          0          0          0          0   IO-APIC   1-edge      i8042
  6:          0          0          0          3          0          0          0          0   IO-APIC   6-edge      floppy
  8:          0          0          1          0          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 10:          0          0        102          0          0          0          0   24228261   IO-APIC  10-fasteoi   virtio0, eth1, eth0
 11:          0          0          0          0          0          0          0         32   IO-APIC  11-fasteoi   uhci_hcd:usb1
 12:         15          0          0          0          0          0          0          0   IO-APIC  12-edge      i8042
 14:          0          0          0          0          0          0          0          0   IO-APIC  14-edge      ata_piix
 15:          0          0          0          0          0          0    1453248          0   IO-APIC  15-edge      ata_piix
 24:          0          0          0          0          0          0          0          0   PCI-MSI 131072-edge      virtio1-config
 25:          0          0          0          0         15          0          0          0   PCI-MSI 131073-edge      virtio1-virtqueues
 26:          0          0          0          0       5791    2805745          0          0   PCI-MSI 114688-edge      ahci[0000:00:07.0]
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:   14654751    6657243    5811366    5270649   14966993    4797078    5687129    8545399   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          1   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:    2572806    2980772    2435576    2151656    1887449    2366833    2404309    1967901   Rescheduling interrupts
CAL:     638862     508650     531191     579853     596146     636037     652622     655700   Function call interrupts
TLB:      62859      43397      20200       6237       4423      11681      18652       4408   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:       4706       4706       4706       4706       4706       4706       4706       4706   Machine check polls
HYP:          0          0          0          0          0          0          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event
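To put a number on how lopsided IRQ 10 is, here is a minimal Python sketch that parses /proc/interrupts-style text and reports which CPU handled the largest share of a given IRQ. It embeds a short excerpt of the output above as SAMPLE; the function names are hypothetical, and it assumes the header row has one CPUn column per CPU, as it does here.

```python
"""Summarise how one IRQ is spread across CPUs, from /proc/interrupts text."""

# Excerpt of the /proc/interrupts output above: header row, IRQ 10 and IRQ 15.
SAMPLE = """\
      CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
 10:          0          0        102          0          0          0          0   24228261   IO-APIC  10-fasteoi   virtio0, eth1, eth0
 15:          0          0          0          0          0          0    1453248          0   IO-APIC  15-edge      ata_piix
"""


def parse_interrupts(text):
    """Return {label: (per_cpu_counts, description)} for each row."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: one "CPUn" column per CPU
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # first non-numeric field starts the description
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def busiest_cpu(table, irq):
    """Return (cpu_index, share_of_total, description), or None if the IRQ never fired."""
    counts, description = table[irq]
    total = sum(counts)
    if total == 0:
        return None
    cpu = max(range(len(counts)), key=counts.__getitem__)
    return cpu, counts[cpu] / total, description


if __name__ == "__main__":
    # On the server itself, read the live file instead:
    #   with open("/proc/interrupts") as f: text = f.read()
    table = parse_interrupts(SAMPLE)
    cpu, share, description = busiest_cpu(table, "10")
    print(f"IRQ 10 ({description}): CPU{cpu} handled {share:.4%} of interrupts")
```

For the excerpt above this points at CPU7 for IRQ 10, with only the 102 interrupts on CPU2 landing anywhere else.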

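This shows that a huge number of interrupts are coming from the NIC (IRQ 10, which is shared by virtio0, eth1 and eth0, has fired about 24 million times, almost all of them on CPU7), so I tried tuning the NIC to reduce the number of interrupts.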

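Things I tried that made no difference:

Disabling irqbalance
Using smp_affinity to spread IRQ 10 across several cores (this did not work at all: no matter how I changed smp_affinity, IRQ 10 stayed on a single core; I have also read articles saying that in this situation spreading it would not improve performance anyway)
Increasing the MTU to 9000
Increasing the RX ring buffer size to 2048
Lots of sysctl tweaks!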
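For reference, /proc/irq/<N>/smp_affinity holds a hexadecimal CPU bitmask in which bit i selects CPU i. The sketch below (hypothetical helper names) builds such a mask from a list of CPUs and decodes one back; it only computes the string and does not change any setting. It assumes the kernel's convention of writing masks for more than 32 CPUs as comma-separated 32-bit groups.

```python
"""Build and decode the hex CPU bitmask used by /proc/irq/<N>/smp_affinity."""


def affinity_mask(cpus, ncpu):
    """Hex mask with bit i set for each CPU i in `cpus`, in comma-separated 32-bit groups."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < ncpu:
            raise ValueError(f"CPU {cpu} is outside 0..{ncpu - 1}")
        mask |= 1 << cpu
    ngroups = (ncpu + 31) // 32
    top_bits = ncpu - 32 * (ngroups - 1)  # bits that live in the most significant group
    parts = []
    for i in reversed(range(ngroups)):
        chunk = (mask >> (32 * i)) & 0xFFFFFFFF
        # The top group only needs enough hex digits for its bits; lower groups use 8.
        width = (top_bits + 3) // 4 if i == ngroups - 1 else 8
        parts.append(f"{chunk:0{width}x}")
    return ",".join(parts)


def cpus_from_mask(text):
    """Inverse of affinity_mask: list the CPUs whose bits are set."""
    value = int(text.strip().replace(",", ""), 16)
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


if __name__ == "__main__":
    ncpu = 8
    print(affinity_mask(range(ncpu), ncpu))  # → ff  (all eight CPUs)
    print(affinity_mask([7], ncpu))          # → 80  (CPU7 only)
    print(cpus_from_mask("80"))              # → [7]
```

On this 8-CPU guest, ff selects every CPU and 80 selects CPU7 alone; writing either value to /proc/irq/10/smp_affinity requires root.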

I also noticed a large number of RX errors on eth0:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
    inet 185.8.174.227  netmask 255.255.255.0  broadcast 185.8.174.255
    inet6 fe80::84f9:91ff:fe5e:c862  prefixlen 64  scopeid 0x20<link>
    ether 86:f9:91:5e:c8:62  txqueuelen 1000  (Ethernet)
    RX packets 19862876  bytes 8071862301 (8.0 GB)
    RX errors 1746656  dropped 0  overruns 0  frame 1746656
    TX packets 22127410  bytes 13038619281 (13.0 GB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
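To gauge how serious those RX errors are, the sketch below pulls the counters out of the ifconfig text and computes errors as a fraction of all receive attempts. It assumes that "RX packets" counts only good packets (the Linux kernel documents rx_packets that way), so errored frames are added to the denominator; the helper names are hypothetical.

```python
"""Estimate the RX error rate from net-tools `ifconfig` output."""
import re

# RX lines copied from the eth0 output above.
IFCONFIG_ETH0 = """\
    RX packets 19862876  bytes 8071862301 (8.0 GB)
    RX errors 1746656  dropped 0  overruns 0  frame 1746656
"""


def rx_counters(text):
    """Extract the RX counters printed by ifconfig into a dict of ints (0 if absent)."""
    patterns = {
        "packets": r"RX packets (\d+)",
        "errors": r"RX errors (\d+)",
        "dropped": r"dropped (\d+)",    # first match is the RX line, which precedes TX
        "overruns": r"overruns (\d+)",
        "frame": r"frame (\d+)",
    }
    result = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        result[name] = int(match.group(1)) if match else 0
    return result


def rx_error_rate(counters):
    """Errors as a fraction of all receive attempts (good packets + errored frames).

    Assumes "RX packets" counts only good packets.
    """
    attempts = counters["packets"] + counters["errors"]
    return counters["errors"] / attempts if attempts else 0.0


if __name__ == "__main__":
    c = rx_counters(IFCONFIG_ETH0)
    print(f"RX error rate: {rx_error_rate(c):.2%}")
    print("all RX errors are frame errors:", c["frame"] == c["errors"])
```

With the eth0 counters shown above this comes out at roughly 8%, and the frame counter equals the error counter, so every RX error here is a frame error.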

And here is the global section of my haproxy.cfg:

global
    nbproc 2

    #log /dev/log   local0
    #log /dev/log   local1 notice
    chroot /var/lib/haproxy
    #stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    #stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 10000
    # tune.ssl.default-dh-param 2048

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    #  https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    #ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    #ssl-default-bind-options no-sslv3
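Much of this global section is commented out. The sketch below lists only the directives that are actually in effect and, as an illustration, multiplies nbproc by maxconn: the HAProxy documentation describes the global maxconn as a per-process limit, so with nbproc 2 the overall ceiling is roughly twice the configured value. The parser is a simplification with hypothetical helper names: it treats everything after # as a comment, recognizes only the global/defaults/frontend/backend/listen section keywords, and keeps only the last occurrence of a repeated keyword.

```python
"""List the active (uncommented) directives of one haproxy.cfg section."""

# Excerpt of the global section above, comments kept as in the original.
CFG = """\
global
nbproc 2
#log /dev/log   local0
chroot /var/lib/haproxy
#stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
user haproxy
group haproxy
daemon
maxconn 10000
   # tune.ssl.default-dh-param 2048
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
#ssl-default-bind-options no-sslv3
"""

# Simplification: only these keywords are treated as section headers.
SECTION_HEADERS = {"global", "defaults", "frontend", "backend", "listen"}


def section_directives(cfg_text, section="global"):
    """Map keyword -> argument string for the uncommented lines of `section`."""
    directives = {}
    current = None
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive: does not handle escaped '#'
        if not line:
            continue
        keyword, _, args = line.partition(" ")
        if keyword in SECTION_HEADERS:
            current = keyword
        elif current == section:
            directives[keyword] = args.strip()  # a repeated keyword keeps its last value
    return directives


def connection_ceiling(directives):
    """Rough upper bound on concurrent connections, treating maxconn as per process."""
    nbproc = int(directives.get("nbproc", "1"))
    return nbproc * int(directives["maxconn"])


if __name__ == "__main__":
    active = section_directives(CFG)
    for keyword, args in active.items():
        print(keyword, args)
    print("approximate connection ceiling:", connection_ceiling(active))
```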

So what is wrong with my server? Any help is appreciated.
