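I have an Ubuntu 18.04.1 server with an 8-core processor and 8 GB of RAM. It is a cloud server based on KVM virtualization. I use HAProxy 1.8.8 on it for load balancing. The problem is that when I run load tests against the server with ab or wrk, only one CPU core (core 7) is pinned at 100%, because it spends too much time in si (softirq, software interrupts). So I checked the /proc/interrupts file: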
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 30 0 0 0 0 0 0 0 IO-APIC 2-edge timer
1: 0 9 0 0 0 0 0 0 IO-APIC 1-edge i8042
6: 0 0 0 3 0 0 0 0 IO-APIC 6-edge floppy
8: 0 0 1 0 0 0 0 0 IO-APIC 8-edge rtc0
9: 0 0 0 0 0 0 0 0 IO-APIC 9-fasteoi acpi
10: 0 0 102 0 0 0 0 24228261 IO-APIC 10-fasteoi virtio0, eth1, eth0
11: 0 0 0 0 0 0 0 32 IO-APIC 11-fasteoi uhci_hcd:usb1
12: 15 0 0 0 0 0 0 0 IO-APIC 12-edge i8042
14: 0 0 0 0 0 0 0 0 IO-APIC 14-edge ata_piix
15: 0 0 0 0 0 0 1453248 0 IO-APIC 15-edge ata_piix
24: 0 0 0 0 0 0 0 0 PCI-MSI 131072-edge virtio1-config
25: 0 0 0 0 15 0 0 0 PCI-MSI 131073-edge virtio1-virtqueues
26: 0 0 0 0 5791 2805745 0 0 PCI-MSI 114688-edge ahci[0000:00:07.0]
NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts
LOC: 14654751 6657243 5811366 5270649 14966993 4797078 5687129 8545399 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 0 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 1 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 2572806 2980772 2435576 2151656 1887449 2366833 2404309 1967901 Rescheduling interrupts
CAL: 638862 508650 531191 579853 596146 636037 652622 655700 Function call interrupts
TLB: 62859 43397 20200 6237 4423 11681 18652 4408 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 4706 4706 4706 4706 4706 4706 4706 4706 Machine check polls
HYP: 0 0 0 0 0 0 0 0 Hypervisor callback interrupts
ERR: 0
MIS: 0
PIN: 0 0 0 0 0 0 0 0 Posted-interrupt notification event
NPI: 0 0 0 0 0 0 0 0 Nested posted-interrupt event
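The imbalance for IRQ 10 can be quantified straight from the dump. Below is a minimal Python sketch; the function name `irq_distribution` is hypothetical, and it assumes the 8 CPU columns shown in the header above. It parses one /proc/interrupts row and prints each CPU's share of that IRQ:

```python
def irq_distribution(line, ncpus=8):
    """Parse one /proc/interrupts row into (irq, per-CPU counts, description)."""
    fields = line.split()
    irq = fields[0].rstrip(":")
    # The next ncpus fields are the per-CPU counters (CPU0 ... CPU7 here).
    counts = [int(x) for x in fields[1:1 + ncpus]]
    # Whatever follows is the controller, trigger type and device names.
    desc = " ".join(fields[1 + ncpus:])
    return irq, counts, desc


line = "10: 0 0 102 0 0 0 0 24228261 IO-APIC 10-fasteoi virtio0, eth1, eth0"
irq, counts, desc = irq_distribution(line)
total = sum(counts)
for cpu, n in enumerate(counts):
    if n:
        print(f"IRQ {irq} on CPU{cpu}: {n} ({n / total:.4%})")
```

For the row above, this shows CPU7 handling more than 99.99% of the interrupts on that line. The description also shows that virtio0, eth1 and eth0 share a single IO-APIC line here rather than having separate PCI-MSI vectors.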
This shows that a lot of interrupts are coming from the NIC, so I tried tuning the NIC to reduce them.
Options I tried that had no effect:
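- Disabling irqbalance.
- Using smp_affinity to spread IRQ 10 across multiple cores. This did not work at all: no matter how I changed smp_affinity, IRQ 10 stayed on a single core. That said, I have read some articles saying that in this situation spreading it would not improve performance anyway.
- Increasing the MTU to 9000.
- Increasing the RX ring buffer size to 2048.
- Lots of sysctl tweaks!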
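For reference, /proc/irq/&lt;N&gt;/smp_affinity takes a hexadecimal CPU bitmask in which bit i stands for CPU i. The kernel also exposes /proc/irq/&lt;N&gt;/smp_affinity_list, which accepts ranges such as `0-3`. Below is a minimal Python sketch with hypothetical helper names that builds and decodes such masks, for example to allow IRQ 10 on CPU0 to CPU3:

```python
def cpus_to_affinity_mask(cpus):
    """Build the hex bitmask string for /proc/irq/<N>/smp_affinity (bit i = CPU i)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def mask_to_cpus(mask_hex):
    """Decode a smp_affinity hex string (comma-separated groups allowed) into CPU numbers."""
    value = int(mask_hex.replace(",", ""), 16)
    return [i for i in range(value.bit_length()) if (value >> i) & 1]
```

For example, `cpus_to_affinity_mask([0, 1, 2, 3])` gives `"f"`, which could be written as root with `echo f > /proc/irq/10/smp_affinity`. A mask that allows several CPUs only restricts where the interrupt may be delivered. Whether a single IO-APIC line is actually rotated across those CPUs depends on the interrupt controller and kernel version, so it may still land on one core.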
I also noticed a lot of RX errors on eth0:
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 185.8.174.227 netmask 255.255.255.0 broadcast 185.8.174.255
inet6 fe80::84f9:91ff:fe5e:c862 prefixlen 64 scopeid 0x20<link>
ether 86:f9:91:5e:c8:62 txqueuelen 1000 (Ethernet)
RX packets 19862876 bytes 8071862301 (8.0 GB)
RX errors 1746656 dropped 0 overruns 0 frame 1746656
TX packets 22127410 bytes 13038619281 (13.0 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
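To put the RX error count in proportion, a short Python sketch (the function name is hypothetical, and it assumes the net-tools `ifconfig` output format shown above) computes errors as a fraction of received packets:

```python
import re


def rx_error_ratio(ifconfig_text):
    """Return RX errors / RX packets parsed from net-tools `ifconfig` output."""
    packets = int(re.search(r"RX packets (\d+)", ifconfig_text).group(1))
    errors = int(re.search(r"RX errors (\d+)", ifconfig_text).group(1))
    return errors / packets


sample = """RX packets 19862876 bytes 8071862301 (8.0 GB)
RX errors 1746656 dropped 0 overruns 0 frame 1746656"""
print(f"RX error rate: {rx_error_ratio(sample):.2%}")
```

With the counters above, this comes to roughly 8.8% of received packets, and every one of them is counted as a frame error. The same counters are also available in /sys/class/net/eth0/statistics/rx_packets and rx_errors.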
Here is also the global section of my haproxy.cfg:
global
nbproc 2
#log /dev/log local0
#log /dev/log local1 notice
chroot /var/lib/haproxy
#stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
#stats timeout 30s
user haproxy
group haproxy
daemon
maxconn 10000
# tune.ssl.default-dh-param 2048
# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
# Default ciphers to use on SSL-enabled listening sockets.
# For more information, see ciphers(1SSL). This list is from:
# https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
# An alternative list with additional directives can be obtained from
# https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
#ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
#ssl-default-bind-options no-sslv3
So what is wrong with my server? Any help is appreciated.