We recently hit a traffic spike. It wasn't huge, but it maxed out one CPU core for haproxy (and the server became unresponsive). I'm guessing I've done something inefficient in the config, so I'd like to ask all the haproxy experts out there if they'd be willing to critique my config file below (mostly from a performance standpoint).
The config is intended to distribute between a set of http application servers, a set of servers handling websockets connections (with multiple separate processes on different ports), and one static-file web server. Performance aside, it works well. (Some details have been removed.)
Any pointers you can give would be much appreciated!
HAProxy v1.4.8
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
daemon
maxconn 100000
log 127.0.0.1 local0 notice
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
log global
mode http
option httplog
option httpclose #http://serverfault.com/a/104782/52811
timeout connect 5000ms
timeout client 50000ms
timeout server 5h #long timeouts to stop WS drops - when v1.5 is stable, use 'timeout tunnel';
#---------------------------------------------------------------------
# FRONTEND
#---------------------------------------------------------------------
frontend public
bind *:80
maxconn 100000
reqidel ^X-Forwarded-For:.* #Remove any x-forwarded-for headers
option forwardfor #Set the forwarded for header (needs option httpclose)
default_backend app
redirect prefix http://xxxxxxxxxxxxxxxxx code 301 if { hdr(host) -i www.xxxxxxxxxxxxxxxxxxx }
timeout client 5h #long timeouts to stop WS drops - when v1.5 is stable, use 'timeout tunnel';
# ACLs
##########
acl static_request hdr_beg(host) -i i.
acl static_request hdr_beg(host) -i static.
acl static_request path_beg /favicon.ico /robots.txt
acl test_request hdr_beg(host) -i test.
acl ws_request hdr_beg(host) -i ws
# ws11
acl ws11x1_request hdr_beg(host) -i ws11x1
acl ws11x2_request hdr_beg(host) -i ws11x2
acl ws11x3_request hdr_beg(host) -i ws11x3
acl ws11x4_request hdr_beg(host) -i ws11x4
acl ws11x5_request hdr_beg(host) -i ws11x5
acl ws11x6_request hdr_beg(host) -i ws11x6
# ws12
acl ws12x1_request hdr_beg(host) -i ws12x1
acl ws12x2_request hdr_beg(host) -i ws12x2
acl ws12x3_request hdr_beg(host) -i ws12x3
acl ws12x4_request hdr_beg(host) -i ws12x4
acl ws12x5_request hdr_beg(host) -i ws12x5
acl ws12x6_request hdr_beg(host) -i ws12x6
# Which backend....
###################
use_backend static if static_request
#ws11
use_backend ws11x1 if ws11x1_request
use_backend ws11x2 if ws11x2_request
use_backend ws11x3 if ws11x3_request
use_backend ws11x4 if ws11x4_request
use_backend ws11x5 if ws11x5_request
use_backend ws11x6 if ws11x6_request
#ws12
use_backend ws12x1 if ws12x1_request
use_backend ws12x2 if ws12x2_request
use_backend ws12x3 if ws12x3_request
use_backend ws12x4 if ws12x4_request
use_backend ws12x5 if ws12x5_request
use_backend ws12x6 if ws12x6_request
#---------------------------------------------------------------------
# BACKEND - APP
#---------------------------------------------------------------------
backend app
timeout server 50000ms #To counter the WS default
mode http
balance roundrobin
option httpchk HEAD /upchk.txt
server app1 app1:8000 maxconn 100000 check
server app2 app2:8000 maxconn 100000 check
server app3 app3:8000 maxconn 100000 check
server app4 app4:8000 maxconn 100000 check
#---------------------------------------------------------------------
# BACKENDs - WS
#---------------------------------------------------------------------
#Server ws11
backend ws11x1
server ws11 ws11:8001 maxconn 100000
backend ws11x2
server ws11 ws11:8002 maxconn 100000
backend ws11x3
server ws11 ws11:8003 maxconn 100000
backend ws11x4
server ws11 ws11:8004 maxconn 100000
backend ws11x5
server ws11 ws11:8005 maxconn 100000
backend ws11x6
server ws11 ws11:8006 maxconn 100000
#Server ws12
backend ws12x1
server ws12 ws12:8001 maxconn 100000
backend ws12x2
server ws12 ws12:8002 maxconn 100000
backend ws12x3
server ws12 ws12:8003 maxconn 100000
backend ws12x4
server ws12 ws12:8004 maxconn 100000
backend ws12x5
server ws12 ws12:8005 maxconn 100000
backend ws12x6
server ws12 ws12:8006 maxconn 100000
#---------------------------------------------------------------------
# BACKEND - STATIC
#---------------------------------------------------------------------
backend static
server static1 static1:80 maxconn 40000
Answer 1
100,000 connections is a lot... Are you pushing that many? If so... maybe split the frontend so it binds on one IP for static content and another IP for app content, then run the static and app variants as separate haproxy processes (assuming you have a second core/CPU on the server)...
If nothing else, it will narrow usage down to either app or static traffic...
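A minimal sketch of that split, assuming two hypothetical IPs (192.0.2.10 for static, 192.0.2.11 for app) and reusing the backend definitions from the question; each file would be started as its own haproxy process:

```haproxy
# static.cfg, run as its own haproxy process
frontend public_static
    bind 192.0.2.10:80
    default_backend static

# app.cfg, a second, separate haproxy process
frontend public_app
    bind 192.0.2.11:80
    default_backend app
```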
If I remember my Networking 101 correctly... HAProxy shouldn't be able to make 100,000 connections to ws12:8001, or to any other backend host:port, because the local port range on most systems (cat /proc/sys/net/ipv4/ip_local_port_range) only allows about 28232 source ports out of the ~65536 total. You may be exhausting local ports, which in turn could be causing the CPU to hang while it waits for ports to free up.
Perhaps lowering the max connections to each backend to something closer to 28,000 would alleviate the problem? Or changing the local port range to something more inclusive?
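To see how many source ports are actually available per backend ip:port on a Linux box, a quick sketch (the sysctl name is standard Linux; the widened range shown is only an example value):

```shell
# Read the ephemeral port range the kernel uses for outgoing connections
read low high < /proc/sys/net/ipv4/ip_local_port_range
echo "usable source ports per backend ip:port: $((high - low + 1))"

# To widen the range (needs root), something like:
#   sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```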
Answer 2
Have a look at the nbproc setting and see whether utilizing more than one core helps. With most hardware load balancers, the amount of traffic you can handle is limited by the CPU/memory of the load balancer.
1.5) Increasing the overall processing power
--------------------------------------------
On multi-processor systems, it may seem to be a shame to use only one processor,
eventhough the load needed to saturate a recent processor is far above common
usage. Anyway, for very specific needs, the proxy can start several processes
between which the operating system will spread the incoming connections. The
number of processes is controlled by the 'nbproc' parameter in the 'global'
section. It defaults to 1, and obviously works only in 'daemon' mode. One
typical usage of this parameter has been to workaround the default per-process
file-descriptor limit that Solaris imposes to user processes.
Example :
---------
global
daemon
quiet
nbproc 2
Answer 3
Beyond haproxy's own configuration, it also helps to do some network tuning.
One thing that may help is making sure your network interfaces are not pinned to a single CPU (assuming you use multiple interfaces). If you run haproxy on Linux, you can check the balance like this:
egrep CPU\|eth /proc/interrupts
For example, this shows interrupts for eth0 and eth1 being handled by different CPUs:
$ egrep CPU\|eth /proc/interrupts
CPU0 CPU1 CPU2 CPU3
103: 3515635238 0 0 0 IR-PCI-MSI-edge eth0
104: 0 1976927064 0 0 IR-PCI-MSI-edge eth1
Whereas this shows they are being handled by the same CPU:
$ egrep CPU\|eth /proc/interrupts
CPU0 CPU1 CPU2 CPU3
272: 1526254507 0 0 0 Dynamic-irq eth0
273: 4877925 0 0 0 Dynamic-irq eth1
You will want to enable smp affinity for those interfaces. For the example above, you would do:
echo 010 > /proc/irq/272/smp_affinity
echo 010 > /proc/irq/273/smp_affinity
Answer 4
I would suggest activating "multithreading mode" by putting the nbthread option into the global section. From the man page:
This setting is only available when support for threads was built in. It creates threads for each created process. This means that if HAProxy is started in foreground, it only creates threads for the first process.
We activated "multithreading mode" and our site started working 15x faster. You can read more about the multiprocessing and multithreading options here: https://www.haproxy.com/blog/multithreading-in-haproxy/
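For reference, a minimal global section using nbthread might look like the sketch below. Note this is an assumption about a newer setup: nbthread requires HAProxy 1.8+, so it would not work on the 1.4.8 in the question without upgrading, and the thread count of 4 is just an example to match a 4-core machine:

```haproxy
global
    daemon
    maxconn 100000
    # one process, four threads sharing the listening socket (1.8+ only)
    nbthread 4
```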