I have haproxy set up in front of a high-performance web application. At around 11k requests/second the setup seems to hit a limit or bottleneck at the haproxy level, while the underlying server can handle more traffic and responds with almost zero latency.
To illustrate this, I ran curl against a simple hello-world test endpoint. According to time, hitting the application directly on port 8080 responds in about 17 ms; given how crude this benchmark setup is, I suspect the real response time is well under 1 ms. Going through haproxy on port 80 takes over 5 seconds. I assume some queue/backlog limit needs tuning:
root@01:/usr/share# time curl http://localhost/hello
hi there!
real 0m5.097s
user 0m0.008s
sys 0m0.012s
root@01:/usr/share# time curl http://localhost:8080/hello
hi there!
real 0m0.017s
user 0m0.012s
sys 0m0.000s
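(For a finer-grained breakdown than time gives, curl's -w write-out timings could be used, for example:)
curl -s -o /dev/null -w 'connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n' http://localhost:8080/hello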
Running curl "http://localhost:9000/haproxy_stats;csv" > stats.csv, I get:
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
http-in,FRONTEND,,,30000,30000,30000,244107765,170146478342,37776067415,0,0,49255,,,,,OPEN,,,,,,,,,1,2,0,,,,0,10606,0,25873,,,,0,409162321,0,51319,245288,4921,,12052,26301,409479925,,,0,0,0,0,,,,,,,,
servers,server1,0,0,16076,29017,50000,410260473,170146023052,37766853705,,0,,244621,4917,829803,0,no check,1,1,0,,,,,,1,3,1,,409430670,,2,12052,,34968,,,,0,409162321,0,1794,667,0,0,,,,4520,4917,,,,,0,,,0,690,646,1992,
servers,BACKEND,0,0,16076,29964,3000,409430670,170146023052,37766853705,0,0,,244621,4917,829803,0,UP,1,1,0,,0,45940,0,,1,3,0,,409430670,,1,12052,,26301,,,,0,409162321,0,2064,245288,4921,,,,,4520,4917,0,0,0,0,0,,,0,690,646,1992,
stats,FRONTEND,,,2,2,2000,772,86236,1141909,0,0,1,,,,,OPEN,,,,,,,,,1,4,0,,,,0,1,0,1,,,,0,770,0,1,0,0,,1,2,772,,,0,0,0,0,,,,,,,,
stats,BACKEND,0,0,0,0,200,0,86236,1141909,0,0,,0,0,0,0,UP,0,0,0,,0,45940,0,,1,4,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,0,,,3663,0,2,317,
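(Side note: since the config below already opens the admin socket, the same CSV counters can also be pulled there instead of going through the HTTP stats page; a sketch, assuming socat is installed:)
echo "show stat" | socat /var/run/haproxy.sock stdio > stats.csv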
Reformatted as more readable tabular text (update: fixed the formatting):
# pxname svname qcur qmax scur smax slim stot bin bout dreq dresp ereq econ eresp wretr wredis status weight act bck chkfail chkdown lastchg downtime qlimit pid iid sid throttle lbtot tracked type rate rate_lim rate_max check_status check_code check_duration hrsp_1xx hrsp_2xx hrsp_3xx hrsp_4xx hrsp_5xx hrsp_other hanafail req_rate req_rate_max req_tot cli_abrt srv_abrt comp_in comp_out comp_byp comp_rsp lastsess last_chk last_agt qtime ctime rtime ttime
http-in FRONTEND 29930 30000 30000 260607375 176871559870 39278101192 0 0 50446 OPEN 1 2 0 0 11565 0 25873 0 425740885 0 52573 286642 5198 10112 26301 426097360 0 0 0 0
servers server1 0 0 12061 29020 50000 427063733 176871103092 39268664952 0 285733 5194 1017043 0 no check 1 1 0 1 3 1 426046914 2 10110 34968 0 425740885 0 1858 909 0 0 5158 5194 0 0 589 664 1931
servers BACKEND 0 0 12061 29964 3000 426046914 176871103092 39268664952 0 0 285733 5194 1017043 0 UP 1 1 0 0 47413 0 1 3 0 426046914 1 10110 26301 0 425740885 0 2128 286642 5198 5158 5194 0 0 0 0 0 0 589 664 1931
stats FRONTEND 1 2 2000 798 89114 1181075 0 0 1 OPEN 1 4 0 0 1 0 1 0 796 0 1 0 0 1 2 798 0 0 0 0
stats BACKEND 0 0 0 0 200 0 89114 1181075 0 0 0 0 0 0 UP 0 0 0 0 47413 0 1 4 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3481 0 2 334
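(Rather than reformatting by hand, a readable view of just the queue/session columns can be pulled straight from the CSV dump; a sketch, assuming awk is available:)
awk -F, '{printf "%-12s %-10s %8s %8s %8s %8s %8s\n", $1, $2, $3, $4, $5, $6, $7}' stats.csv
This prints only the pxname, svname, qcur, qmax, scur, smax and slim columns.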
One value that looks suspicious is the slim of 3000 on the servers/BACKEND line. How do I adjust that in the configuration? The servers/server1 line shows a slim of 50000, which is comfortably above its scur and smax values.
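(If that 3000 is the backend's fullconn, whose documented default is 10% of the maxconn of the frontends that can route to the backend, i.e. 10% of my frontend's 30000, then I assume the place to override it would be the backend section, something like the following; I have not confirmed that this is the actual bottleneck:)
backend servers
    fullconn 30000
    server server1 127.0.0.1:8080 maxconn 50000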
My haproxy.cfg file:
global
    daemon
    maxconn 300000
    # See running maxconn with:
    # echo "show info" | socat /var/run/haproxy.sock stdio
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m

defaults
    # This doesn't help.
    # maxconn 25000
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    maxconn 30000
    default_backend servers

backend servers
    # This doesn't help.
    # maxconn 25000
    server server1 127.0.0.1:8080 maxconn 50000
    stats enable

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /haproxy_stats
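To double-check which limits are actually in effect after a change, the running values can be read back over the admin socket mentioned in the comment above (assuming socat is installed):
echo "show info" | socat /var/run/haproxy.sock stdio | grep -i maxconn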