I have haproxy set up in front of a high-performance web application. At around 11k requests/second the setup seems to hit a limit or bottleneck at the haproxy level, while the underlying server can handle more traffic and responds with almost zero latency.
To illustrate this, I ran curl against a simple hello-world test endpoint. According to time, hitting the application directly on port 8080 responds in about 17 ms; given how crude this benchmark setup is, I suspect the real response time is well under 1 ms. Going through haproxy on port 80 takes over 5 seconds. I assume some queue/backlog limit needs tuning:
root@01:/usr/share# time curl http://localhost/hello
hi there!
real 0m5.097s
user 0m0.008s
sys 0m0.012s
root@01:/usr/share# time curl http://localhost:8080/hello
hi there!
real 0m0.017s
user 0m0.012s
sys 0m0.000s
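(For a finer-grained breakdown than time gives, curl's -w write-out timings could be used, for example:)
curl -s -o /dev/null -w 'connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n' http://localhost:8080/hello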
Running curl "http://localhost:9000/haproxy_stats;csv" > stats.csv, I get:
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
http-in,FRONTEND,,,30000,30000,30000,244107765,170146478342,37776067415,0,0,49255,,,,,OPEN,,,,,,,,,1,2,0,,,,0,10606,0,25873,,,,0,409162321,0,51319,245288,4921,,12052,26301,409479925,,,0,0,0,0,,,,,,,,
servers,server1,0,0,16076,29017,50000,410260473,170146023052,37766853705,,0,,244621,4917,829803,0,no check,1,1,0,,,,,,1,3,1,,409430670,,2,12052,,34968,,,,0,409162321,0,1794,667,0,0,,,,4520,4917,,,,,0,,,0,690,646,1992,
servers,BACKEND,0,0,16076,29964,3000,409430670,170146023052,37766853705,0,0,,244621,4917,829803,0,UP,1,1,0,,0,45940,0,,1,3,0,,409430670,,1,12052,,26301,,,,0,409162321,0,2064,245288,4921,,,,,4520,4917,0,0,0,0,0,,,0,690,646,1992,
stats,FRONTEND,,,2,2,2000,772,86236,1141909,0,0,1,,,,,OPEN,,,,,,,,,1,4,0,,,,0,1,0,1,,,,0,770,0,1,0,0,,1,2,772,,,0,0,0,0,,,,,,,,
stats,BACKEND,0,0,0,0,200,0,86236,1141909,0,0,,0,0,0,0,UP,0,0,0,,0,45940,0,,1,4,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,0,,,3663,0,2,317,
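(Side note: since the config below already opens the admin socket, the same CSV counters can also be pulled there instead of going through the HTTP stats page; a sketch, assuming socat is installed:)
echo "show stat" | socat /var/run/haproxy.sock stdio > stats.csv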
Reformatted as more readable tabular text (update: fixed the formatting):
# pxname svname qcur qmax scur smax slim stot bin bout dreq dresp ereq econ eresp wretr wredis status weight act bck chkfail chkdown lastchg downtime qlimit pid iid sid throttle lbtot tracked type rate rate_lim rate_max check_status check_code check_duration hrsp_1xx hrsp_2xx hrsp_3xx hrsp_4xx hrsp_5xx hrsp_other hanafail req_rate req_rate_max req_tot cli_abrt srv_abrt comp_in comp_out comp_byp comp_rsp lastsess last_chk last_agt qtime ctime rtime ttime
http-in FRONTEND 29930 30000 30000 260607375 176871559870 39278101192 0 0 50446 OPEN 1 2 0 0 11565 0 25873 0 425740885 0 52573 286642 5198 10112 26301 426097360 0 0 0 0
servers server1 0 0 12061 29020 50000 427063733 176871103092 39268664952 0 285733 5194 1017043 0 no check 1 1 0 1 3 1 426046914 2 10110 34968 0 425740885 0 1858 909 0 0 5158 5194 0 0 589 664 1931
servers BACKEND 0 0 12061 29964 3000 426046914 176871103092 39268664952 0 0 285733 5194 1017043 0 UP 1 1 0 0 47413 0 1 3 0 426046914 1 10110 26301 0 425740885 0 2128 286642 5198 5158 5194 0 0 0 0 0 0 589 664 1931
stats FRONTEND 1 2 2000 798 89114 1181075 0 0 1 OPEN 1 4 0 0 1 0 1 0 796 0 1 0 0 1 2 798 0 0 0 0
stats BACKEND 0 0 0 0 200 0 89114 1181075 0 0 0 0 0 0 UP 0 0 0 0 47413 0 1 4 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3481 0 2 334
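(Rather than reformatting by hand, a readable view of just the queue/session columns can be pulled straight from the CSV dump; a sketch, assuming awk is available:)
awk -F, '{printf "%-12s %-10s %8s %8s %8s %8s %8s\n", $1, $2, $3, $4, $5, $6, $7}' stats.csv
This prints only the pxname, svname, qcur, qmax, scur, smax and slim columns.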
One value that looks suspicious is the slim of 3000 on the servers/BACKEND line. How do I adjust that in the configuration? The servers/server1 line shows a slim of 50000, which is comfortably above its scur and smax values.
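(If that 3000 is the backend's fullconn, whose documented default is 10% of the maxconn of the frontends that can route to the backend, i.e. 10% of my frontend's 30000, then I assume the place to override it would be the backend section, something like the following; I have not confirmed that this is the actual bottleneck:)
backend servers
    fullconn 30000
    server server1 127.0.0.1:8080 maxconn 50000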
My haproxy.cfg file:
global
    daemon
    maxconn 300000
    # See running maxconn with:
    # echo "show info" | socat /var/run/haproxy.sock stdio
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m

defaults
    # This doesn't help.
    # maxconn 25000
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    maxconn 30000
    default_backend servers

backend servers
    # This doesn't help.
    # maxconn 25000
    server server1 127.0.0.1:8080 maxconn 50000
    stats enable

listen stats
    bind :9000
    mode http
    stats enable
    stats uri /haproxy_stats
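To double-check which limits are actually in effect after a change, the running values can be read back over the admin socket mentioned in the comment above (assuming socat is installed):
echo "show info" | socat /var/run/haproxy.sock stdio | grep -i maxconn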