php-fpm child processes keep running without finishing requests, and pm.max_children is reached quickly

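I have a single server hosting 8 websites; 2 of them get most of the traffic. All sites sit behind the Cloudflare CDN with the security level set to Low.

Each site has its own php-fpm pool.

Only one of them has a strange problem.

Sometimes it works fine with no issues, then suddenly all child processes in its php-fpm pool hang and keep running without finishing their requests. The pool quickly reaches pm.max_children, and no request to the site completes.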

;;;;;;;;;;;;;;;;;;;;
; FPM Configuration ;
;;;;;;;;;;;;;;;;;;;;;


pid = /usr/var/run/php-fpm.pid
error_log = /log/php-fpm.log
log_level = notice
emergency_restart_threshold = 20
emergency_restart_interval = 2m
process_control_timeout = 10s
daemonize = yes
rlimit_files = 300
rlimit_core = unlimited

[www]
listen = 127.0.0.1:9000
listen.owner = root
listen.group = root
listen.allowed_clients = 127.0.0.1
user = nobody
group = nobody 
pm = ondemand
pm.max_children = 120 
pm.start_servers = 25
pm.min_spare_servers = 5
pm.max_spare_servers = 45
pm.max_requests = 500
pm.status_path = /fpm_status.php
ping.path = /ping
request_terminate_timeout = 15s
catch_workers_output = yes
env[HOSTNAME] = $HOSTNAME
env[PATH] = /usr/local/bin:/usr/bin:/bin
env[TMP] = /tmp
env[TMPDIR] = /tmp
env[TEMP] = /tmp
env[TEMP] = /tmp
env[OSTYPE] = $OSTYPE
env[MALLOC_CHECK_] = 2
env[MACHTYPE] = $MACHTYPE
php_admin_value[sendmail_path] = /usr/sbin/sendmail -t -i
php_admin_value[memory_limit] = 512M
php_admin_value[mysql.connect_timeout] = 30
php_admin_value[default_socket_timeout] = 30  

[site1.com]
user = site1
group = site1
listen = /var/run/php5-fpm-site1.sock
listen.owner = site1
listen.group = site1
pm = ondemand
pm.max_children = 120
pm.start_servers = 25
pm.min_spare_servers = 5
pm.max_spare_servers = 45
pm.max_requests = 500
chdir = /home/site1/public_html

[sub1.site1.com]
user = sub1site1
group = sub1site1
listen = /var/run/php5-fpm-sub1site1.sock
listen.owner = sub1site1
listen.group = sub1site1
pm = ondemand 
pm.max_children = 25
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
pm.max_requests = 1000
pm.status_path = /status
chdir = /home/sub1site1/public_html

site1.com and sub1.site1.com get almost the same traffic, but site1.com runs very smoothly while sub1.site1.com has this problem.

From sub1.site1.com/status:

pool    sub1.site1.com
process manager ondemand
start time  23/Jun/2017:07:50:37 
start since 992
accepted conn   987
listen queue    0
max listen queue    0
listen queue len    0
idle processes  0
active processes    25
total processes 25
max active processes    25
max children reached    1
slow requests   0

All child processes are in the Running state, the requests never finish, and sometimes a 502 error is returned.
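
To catch this state without reading the status page by hand, the counters can be checked from a script. The following is only a minimal sketch, not part of the original setup: `pool_saturation` is a hypothetical helper name, and the usage comment assumes the status page is reachable with curl at the pm.status_path configured for this pool (/status). It reads only the idle, active and total counters shown in the output above.

```shell
# Hypothetical helper: reads php-fpm status page text on stdin and reports
# whether every worker in the pool is busy. It accepts the keys with or
# without a trailing colon, because only the last field is used as the value.
pool_saturation() {
  awk '
    /^idle processes/   { idle   = $NF }
    /^active processes/ { active = $NF }
    /^total processes/  { total  = $NF }
    END {
      if (total == "") { print "no status data"; exit 1 }
      if (idle + 0 == 0 && active + 0 == total + 0)
        printf "SATURATED active=%d total=%d\n", active, total
      else
        printf "OK active=%d idle=%d total=%d\n", active, idle, total
    }'
}

# Usage (assumed URL, taken from the status path above):
#   curl -s 'http://sub1.site1.com/status' | pool_saturation
```

Appending ?full to the status URL makes php-fpm list each worker with its current request URI and request duration, which may help identify the script the workers are stuck in.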

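I tried raising pm.max_children to 200, but the same problem still occurs.

If I restart the server, sub1.site1 works fine for 3-5 hours and then this strange error appears again.

Connections on the server when everything is fine: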

# netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
1 established)
1 Foreign
3 SYN_RECV
32 LISTEN
205 TIME_WAIT
1142 ESTABLISHED

/var/log/messages when the problem occurs:

Jun 23 06:14:26 server smartd[5739]: Device: /dev/sda [SAT], 3 Currently unreadable (pending) sectors 
Jun 23 06:14:26 server smartd[5739]: Device: /dev/sda [SAT], 3 Offline uncorrectable sectors 
Jun 23 06:14:26 server smartd[5739]: Device: /dev/sdb [SAT], 9 Currently unreadable (pending) sectors 
Jun 23 06:14:26 server smartd[5739]: Device: /dev/sdb [SAT], 9 Offline uncorrectable sectors 
Jun 23 07:28:23 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681c:1402 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=59 FLOWLBL=639020 PROTO=TCP SPT=443 DPT=5908 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:25 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=60 FLOWLBL=902306 PROTO=TCP SPT=443 DPT=1331 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:27 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=59 FLOWLBL=453074 PROTO=TCP SPT=443 DPT=1301 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:29 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=59 FLOWLBL=945434 PROTO=TCP SPT=443 DPT=1262 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:31 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=59 FLOWLBL=789301 PROTO=TCP SPT=443 DPT=1265 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:33 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=60 FLOWLBL=87019 PROTO=TCP SPT=443 DPT=1314 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:35 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2a03:2880:f027:0013:face:b00c:0000:0002 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=80 TC=140 HOPLIMIT=54 FLOWLBL=0 PROTO=TCP SPT=443 DPT=2348 WINDOW=27960 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:37 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=59 FLOWLBL=846096 PROTO=TCP SPT=443 DPT=1274 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:39 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=60 FLOWLBL=164043 PROTO=TCP SPT=443 DPT=1338 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:41 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=59 FLOWLBL=486686 PROTO=TCP SPT=443 DPT=1277 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:43 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=60 FLOWLBL=63456 PROTO=TCP SPT=443 DPT=1344 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:45 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=60 FLOWLBL=502407 PROTO=TCP SPT=443 DPT=1341 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:47 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681c:1402 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=59 FLOWLBL=286658 PROTO=TCP SPT=443 DPT=5908 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:49 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=59 FLOWLBL=896418 PROTO=TCP SPT=443 DPT=1350 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:51 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=60 FLOWLBL=441585 PROTO=TCP SPT=443 DPT=1341 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:53 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=60 FLOWLBL=312546 PROTO=TCP SPT=443 DPT=1356 WINDOW=24400 RES=0x00 ACK SYN URGP=0
Jun 23 07:28:55 server kernel: Firewall: *TCP6IN Blocked* IN=eth0 OUT= MAC=54:04:a6:b8:6c:0e:84:c1:c1:76:a8:d5:86:dd SRC=2400:cb00:2048:0001:0000:0000:681f:4591 DST=2a01:04f8:0160:1311:0000:0000:0000:0002 LEN=72 TC=0 HOPLIMIT=59 FLOWLBL=462099 PROTO=TCP SPT=443 DPT=1310 WINDOW=24400 RES=0x00 ACK SYN URGP=0

Connections on the server when the problem occurs:

# netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
1 established)
1 Foreign
3 SYN_RECV
32 LISTEN
365 TIME_WAIT
1905 ESTABLISHED
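
The `1 established)` and `1 Foreign` rows in both listings above come from the two netstat header lines, because the pipeline counts the sixth field of every line. A small sketch that counts only real TCP rows makes the before/after comparison cleaner (ESTABLISHED goes from 1142 to 1905 here). `tcp_state_counts` is a hypothetical helper name, not part of the original commands.

```shell
# Count TCP connections per state, reading `netstat -ant` output on stdin.
# Only lines whose first field starts with "tcp" (tcp, tcp6) are counted,
# which drops the two header lines.
tcp_state_counts() {
  awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print n[s], s }' | sort -n
}

# Usage:
#   netstat -ant | tcp_state_counts
```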

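The load average always stays between 0.5 and 3.0.

Memory usage is between 2 GB and 5 GB used, out of 16 GB total.

Also, when I check MySQL with SHOW PROCESSLIST, I see 25 connections from the sub1.site1 database user, all with the Sleep command.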
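
A quick way to see how those connections are distributed is to count sleeping connections per user. The sketch below assumes the tab-separated batch output that the mysql client produces when its output is piped (header row first, User in the 2nd column, Command in the 5th); `sleeping_by_user` is a hypothetical helper name.

```shell
# Count connections in the Sleep command state per MySQL user.
# Reads tab-separated SHOW PROCESSLIST output on stdin and skips the header.
sleeping_by_user() {
  awk -F'\t' 'NR > 1 && $5 == "Sleep" { n[$2]++ }
              END { for (u in n) print n[u], u }' | sort -rn
}

# Usage (credentials omitted):
#   mysql -e 'SHOW PROCESSLIST' | sleeping_by_user
```

In MySQL, Sleep means the connection is open and idle, waiting for the client to send its next statement. Seeing 25 sleeping connections alongside 25 busy workers therefore suggests the workers are not waiting on a running query.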

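Why do PHP requests sometimes complete without any delay, yet about 5 hours after a restart they stall badly and more children are spawned without completing their requests?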

Answer 1

One of your problems is that the hard disks in your server are failing:

Jun 23 06:14:26 server smartd[5739]: Device: /dev/sda [SAT], 3 Currently unreadable (pending) sectors 
Jun 23 06:14:26 server smartd[5739]: Device: /dev/sda [SAT], 3 Offline uncorrectable sectors 
Jun 23 06:14:26 server smartd[5739]: Device: /dev/sdb [SAT], 9 Currently unreadable (pending) sectors 
Jun 23 06:14:26 server smartd[5739]: Device: /dev/sdb [SAT], 9 Offline uncorrectable sectors 

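Both /dev/sda and /dev/sdb report unreadable (pending) and uncorrectable sectors. This is very likely the cause of the errors you are seeing. Replace the failing drives and the problem should go away.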
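
To pull these counters out of the log without scrolling past the firewall entries, the smartd lines can be summarized per device. This is a rough sketch that assumes the exact line layout shown above (device name in the 7th field, sector count in the 9th); `smartd_bad_sectors` is a hypothetical helper name.

```shell
# Summarize smartd sector warnings from syslog text on stdin.
# Prints one line per warning: device, counter kind, sector count.
smartd_bad_sectors() {
  awk '$5 ~ /^smartd\[/ && /(unreadable|uncorrectable)/ {
    kind = ($0 ~ /unreadable/) ? "pending" : "uncorrectable"
    print $7, kind, $9
  }'
}

# Usage:
#   smartd_bad_sectors < /var/log/messages
```

On the live system, `smartctl -A /dev/sda` from smartmontools reports the corresponding Current_Pending_Sector and Offline_Uncorrectable attributes directly.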
