存在 CLOSE_WAIT 连接时 HAProxy CPU 使用率高

存在 CLOSE_WAIT 连接时 HAProxy CPU 使用率高

我在用着https://github.com/mesosphere/marathon-lb配置 HAProxy。由于它会随着我们的微服务实例的任何变化而更改配置,因此它会创建大量 haproxy 进程。例如:

# pgrep haproxy -a
690 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 70155
3108 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 72735
7540 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 77012
8297 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 77058
9651 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 78452
10690 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 79475
15639 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 84966
15760 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 85082
16574 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 85637
16923 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 86235
17022 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 86278
17672 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 86375
18060 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 87011
18620 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 87398
19470 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 87809
20350 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 88653
52146 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 20253
52339 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 20744
53367 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 20934
53468 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 21957
53710 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22058
54324 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22295
54967 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 23482
55476 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 23537
55796 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 24162
55987 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 24199
56180 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 24546
56246 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 24582
56519 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 24879
57201 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 25317
57546 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 25568
57774 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 26080
60398 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 30723
60783 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 31118
89062 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 56479
89509 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 57252
89784 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 57531
90949 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 58144
91675 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 59389
93436 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 59422
93677 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 61172

所有连接完成后,haproxy 进程才会结束。但有时一个进程只有处于 CLOSE_WAIT 状态的连接,就会消耗 100% 的 CPU。

# lsof -i | awk '{if($1 == "haproxy") print $2 " " $10}' | sort -u
10690 (ESTABLISHED)
15760 (ESTABLISHED)
16574 (ESTABLISHED)
16923 (ESTABLISHED)
17022 (ESTABLISHED)
17672 (ESTABLISHED)
18060 (ESTABLISHED)
18620 (ESTABLISHED)
19470 (ESTABLISHED)
20350 (CLOSE_WAIT)
20350 (ESTABLISHED)
20350 (LISTEN)
3108 (ESTABLISHED)
52146 (ESTABLISHED)
52339 (ESTABLISHED)
53367 (ESTABLISHED)
53468 (ESTABLISHED)
53710 (ESTABLISHED)
54324 (ESTABLISHED)
54967 (ESTABLISHED)
55476 (ESTABLISHED)
55796 (ESTABLISHED)
55987 (ESTABLISHED)
56180 (ESTABLISHED)
56246 (ESTABLISHED)
56519 (ESTABLISHED)
57201 (ESTABLISHED)
57546 (ESTABLISHED)
57774 (ESTABLISHED)
60398 (CLOSE_WAIT)
60783 (CLOSE_WAIT)
690 (ESTABLISHED)
7540 (ESTABLISHED)
8297 (ESTABLISHED)
89062 (ESTABLISHED)
89509 (ESTABLISHED)
89784 (ESTABLISHED)
90949 (ESTABLISHED)
91675 (ESTABLISHED)
93436 (ESTABLISHED)
93677 (ESTABLISHED)

这里进程 60398 和 60783 只有 CLOSE_WAIT 状态。

# lsof -i | awk '{if($2 == "60398") print $2 " " $9 " " $10}'
60398 mesos-lb-3.mydomain:35819->leia-5.mydomain:31302 (CLOSE_WAIT)
# lsof -i | awk '{if($2 == "60783") print $2 " " $9 " " $10}'
60783 mesos-lb-3.mydomain:37419->leia-8.mydomain:31682 (CLOSE_WAIT)

展位的 strace 显示相同的结果:

poll(0x7fe774e2e010, 0, 0)              = 0 (Timeout)
poll(0x7fe774e2e010, 0, 0)              = 0 (Timeout)
poll(0x7fe774e2e010, 0, 0)              = 0 (Timeout)
poll(0x7fe774e2e010, 0, 0)              = 0 (Timeout)
poll(0x7fe774e2e010, 0, 0)              = 0 (Timeout)
poll(0x7fe774e2e010, 0, 0)              = 0 (Timeout)

什么tcpdump -vv port 35819都没显示。端口也一样37419

关于环境:

# haproxy -vv
HA-Proxy version 1.6.9 2016/08/30
Copyright 2000-2016 Willy Tarreau <[email protected]>

Build options :
  TARGET  = custom
  CPU     = x86_64
  CC      = gcc
  CFLAGS  = -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1 USE_LIBCRYPT=1 USE_ZLIB=1 USE_POLL=default USE_DL=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.2j  26 Sep 2016
Running on OpenSSL version : OpenSSL 1.0.2j  26 Sep 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.39 2016-06-14
PCRE library supports JIT : yes
Built with Lua version : Lua 5.3.3
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 2 (2 usable), will use poll.

它使用 docker 运行,并且镜像基于debian:stretchimage。docker 主机信息为:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:    14.04
Codename:   trusty

# uname -a
Linux mesos-lb-4.mydomain 3.13.0-98-generic #145-Ubuntu SMP Sat Oct 8 20:13:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# docker version
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

知道发生了什么吗?谢谢关注

答案1

只需将 haproxy 改为使用 epoll 就足以解决问题。

已解决版本问题https://github.com/mesosphere/marathon-lb/releases/tag/v1.4.2

相关内容