我们最近在服务器上达到了最大客户端数 (300),并且注意到很多连接都卡在 R 读取状态。以下是服务器状态转储的摘录:
Current Time: Monday, 29-Apr-2013 11:46:00 PDT
Restart Time: Monday, 29-Apr-2013 11:03:48 PDT
Parent Server Generation: 0
Server uptime: 42 minutes 12 seconds
Total accesses: 144170 - Total Traffic: 29.8 MB
CPU Usage: u188.25 s345.65 cu2601.11 cs0 - 124% CPU load
56.9 requests/sec - 12.0 kB/second - 216 B/request
244 requests currently being processed, 56 idle workers
KRR_RRRRRRRRRRRR_RRRKRRRR.......................................
K_RRK_RK_RKR_RRR_RRRR_R_R.......................................
RRRRRRRRRRRRRC_CR___RCRR_.......................................
R_RRRCRRRRRRRRCRRRCRRKRCR.......................................
R_K_RKR_CRRRRRRKC_RR_KR_R.......................................
RRRRKRR__RRRRKRRCR_RRRRRR.......................................
_RCRRRRRRRRRKR__RR_RRCRRR.......................................
RRRRRRRRCRRRRRRKRRRR_RCRR.......................................
R___RRCR__RR__RRRKC___RK_.......................................
_R_KCR__RR__R_R_RRRRRR_RK.......................................
RRR_RR___RCRRK_R_RWRR_RRC.......................................
RRKRCRRRRRRRRKRRRRRRRRCRR.......................................
................................................................
................................................................
................................................................
................................................................
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
Srv PID Acc M CPU SS Req Conn Child Slot Client VHost Request
0-0 10320 1/718/718 K 399.14 0 42 0.0 0.21 0.21 69.171.178.24 mysite.com GET /home.php
0-0 10320 0/434/434 R 398.21 8 40 0.0 0.07 0.07 ? ? ..reading..
0-0 10320 0/548/548 R 331.52 558 45 0.0 0.12 0.12 ? ? ..reading..
0-0 10320 0/460/460 _ 399.05 0 42 0.0 0.08 0.08 98.129.101.123 mysite.com GET /home.php
0-0 10320 0/473/473 R 364.89 301 48 0.0 0.09 0.09 ? ? ..reading..
0-0 10320 0/616/616 R 396.18 26 46 0.0 0.11 0.11 ? ? ..reading..
0-0 10320 0/494/494 R 378.84 127 48 0.0 0.10 0.10 ? ? ..reading..
0-0 10320 0/231/231 R 311.62 687 43 0.0 0.04 0.04 ? ? ..reading..
0-0 10320 0/376/376 R 391.17 76 53 0.0 0.08 0.08 ? ? ..reading..
0-0 10320 0/377/377 R 380.72 176 53 0.0 0.07 0.07 ? ? ..reading..
0-0 10320 0/589/589 R 355.07 382 47 0.0 0.10 0.10 ? ? ..reading..
0-0 10320 0/587/587 R 393.16 56 50 0.0 0.13 0.13 ? ? ..reading..
0-0 10320 0/561/561 R 393.89 49 40 0.0 0.10 0.10 ? ? ..reading..
0-0 10320 0/616/616 R 389.88 81 41 0.0 0.22 0.22 ? ? ..reading..
0-0 10320 0/354/354 R 392.12 59 42 0.0 0.06 0.06 ? ? ..reading..
0-0 10320 0/346/346 R 377.51 192 44 0.0 0.06 0.06 ? ? ..reading..
为什么有这么多处于 R 状态?SS 列真的意味着他们已经读取了 558、301 秒等吗?
以下是我的一些服务器信息和设置:
Server version: Apache/2.2.24 (Unix)
Server built: Mar 22 2013 10:57:57
Cpanel::Easy::Apache v3.18.9 rev9999
Server's Module Magic Number: 20051115:31 Server loaded: APR 1.4.6, APR-Util 1.4.1 Compiled using: APR 1.4.6, APR-Util 1.4.1
Architecture: 64-bit
Server MPM: Worker
threaded: yes (fixed thread count)
forked: yes (variable process count)
Server compiled with....
-D APACHE_MPM_DIR="server/mpm/worker"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled) -D APR_USE_SYSVSEM_SERIALIZE -D APR_USE_PTHREAD_SERIALIZE -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT -D APR_HAS_OTHER_CHILD -D AP_HAVE_RELIABLE_PIPED_LOGS -D DYNAMIC_MODULE_LIMIT=128 -D HTTPD_ROOT="/usr/local/apache"
-D SUEXEC_BIN="/usr/local/apache/bin/suexec"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_ERRORLOG="logs/error_log"
-D AP_TYPES_CONFIG_FILE="conf/mime.types"
-D SERVER_CONFIG_FILE="conf/httpd.conf"
StartServers 5
<IfModule prefork.c>
MinSpareServers 5
MaxSpareServers 10
</IfModule>
ServerLimit 300
MaxClients 300
MaxRequestsPerChild 10000
KeepAlive On
KeepAliveTimeout 1
MaxKeepAliveRequests 100
Here are compiled in modules:
core.c
mod_authn_file.c
mod_authn_dbm.c
mod_authn_dbd.c
mod_authn_default.c
mod_authn_alias.c
mod_authz_host.c
mod_authz_groupfile.c
mod_authz_user.c
mod_authz_dbm.c
mod_authz_default.c
mod_auth_basic.c
mod_auth_digest.c
mod_file_cache.c
mod_cache.c
mod_disk_cache.c
mod_include.c
mod_filter.c
mod_deflate.c
mod_log_config.c
mod_log_forensic.c
mod_logio.c
mod_env.c
mod_mime_magic.c
mod_expires.c
mod_headers.c
mod_unique_id.c
mod_setenvif.c
mod_version.c
mod_proxy.c
mod_proxy_connect.c
mod_proxy_ftp.c
mod_proxy_http.c
mod_proxy_scgi.c
mod_proxy_ajp.c
mod_proxy_balancer.c
mod_ssl.c
worker.c
http_core.c
mod_mime.c
mod_dav.c
mod_status.c
mod_autoindex.c
mod_asis.c
mod_info.c
mod_suexec.c
mod_cgid.c
mod_dav_fs.c
mod_dav_lock.c
mod_negotiation.c
mod_dir.c
mod_actions.c
mod_speling.c
mod_userdir.c
mod_alias.c
mod_rewrite.c
mod_so.c
当我们重新启动时,阅读请求每次都会慢慢开始重新增加。提前感谢您的帮助!
更新:这是 netstat -pn:
tcp 0 0 198.57.162.52:80 54.235.31.205:41209 TIME_WAIT -
tcp 0 0 198.57.162.52:80 166.147.121.170:63246 TIME_WAIT -
tcp 0 0 198.57.162.52:80 50.128.171.238:50751 FIN_WAIT2 -
tcp 0 0 198.57.162.52:80 89.241.135.84:24825 TIME_WAIT -
tcp 0 0 198.57.162.52:80 205.142.167.225:33256 TIME_WAIT -
tcp 0 0 198.57.162.52:80 76.127.87.244:56493 TIME_WAIT -
tcp 0 0 198.57.162.52:80 205.153.156.222:21571 TIME_WAIT -
tcp 0 0 198.57.162.52:80 108.12.147.14:61135 TIME_WAIT -
tcp 0 0 198.57.162.52:80 166.217.157.32:1630 TIME_WAIT -
tcp 0 0 198.57.162.52:443 71.22.102.71:4252 TIME_WAIT -
lsof 输出示例:
httpd 4092 nobody 25u IPv4 711278095 0t0 TCP 198-57-162-52.unifiedlayer.com:http->4.sub-70-193-66.myvzw.com:12471 (ESTABLISHED)
httpd 4092 nobody 26u IPv4 711350284 0t0 TCP 198-57-162-52.unifiedlayer.com:http->c75-111-15-253.amrlcmta01.tx.dh.suddenlink.net:51298 (ESTABLISHED)
httpd 4092 nobody 27u IPv4 711400163 0t0 TCP 198-57-162-52.unifiedlayer.com:http->108-235-145-129.lightspeed.irvnca.sbcglobal.net:56037 (FIN_WAIT2)
httpd 4092 nobody 28u IPv4 711390092 0t0 TCP 198-57-162-52.unifiedlayer.com:http->c-98-239-167-37.hsd1.pa.comcast.net:57988 (ESTABLISHED)
httpd 4092 nobody 29u IPv4 711206563 0t0 TCP 198-57-162-52.unifiedlayer.com:http->252.sub-70-199-129.myvzw.com:6179 (ESTABLISHED)
httpd 4092 nobody 30u IPv4 711271036 0t0 TCP 198-57-162-52.unifiedlayer.com:https->5.71.20.78:62114 (ESTABLISHED)
httpd 4092 nobody 31u IPv4 711385050 0t0 TCP 198-57-162-52.unifiedlayer.com:http->64.128.135.115:17547 (ESTABLISHED)
httpd 4092 nobody 32u IPv4 711400374 0t0 TCP 198-57-162-52.unifiedlayer.com:https->mbb5336d0.tmodns.net:36277 (ESTABLISHED)
httpd 4092 nobody 33u IPv4 711378965 0t0 TCP 198-57-162-52.unifiedlayer.com:http->us-chi-exchange.brightstarcorp.com:47707 (ESTABLISHED)
httpd 4092 nobody 34u IPv4 711394997 0t0 TCP 198-57-162-52.unifiedlayer.com:https->209-152-137-170.dsl.mcmo.net:64370 (ESTABLISHED)
httpd 4092 nobody 35u IPv4 711391122 0t0 TCP 198-57-162-52.unifiedlayer.com:http->c-76-125-136-230.hsd1.oh.comcast.net:57093 (ESTABLISHED)
httpd 4092 nobody 36u IPv4 711398994 0t0 TCP 198-57-162-52.unifiedlayer.com:http->lookup.eatright.org:60021 (ESTABLISHED)
httpd 4092 nobody 37u IPv4 711400972 0t0 TCP 198-57-162-52.unifiedlayer.com:http->modemcable126.224-130-66.mc.videotron.ca:55970 (ESTABLISHED)
httpd 4092 nobody 38u IPv4 711379727 0t0 TCP 198-57-162-52.unifiedlayer.com:http->c-75-65-146-83.hsd1.ms.comcast.net:50309 (ESTABLISHED)
httpd 4092 nobody 39u IPv4 711390427 0t0 TCP 198-57-162-52.unifiedlayer.com:https->66-87-130-177.pools.spcsdns.net:41127 (ESTABLISHED)
httpd 4092 nobody 40u IPv4 711401179 0t0 TCP 198-57-162-52.unifiedlayer.com:http->8.225.196.132:20608 (ESTABLISHED)
httpd 4092 nobody 41u IPv4 711142701 0t0 TCP 198-57-162-52.unifiedlayer.com:http->c-24-15-33-77.hsd1.il.comcast.net:54631 (ESTABLISHED)
答案1
@ShaneN 您的服务器正达到资源瓶颈。也许您没有临时端口范围,或者没有系统文件描述符,这会限制您的服务器可用于响应请求的套接字数量。读取状态请求意味着套接字已打开,但请求行从未到达服务器,这是一种已知的 DoS 形式,称为 SlowLoris。为了缓解这种 DoS,我首先要:
/etc/sysctl.conf
将以下条目添加到:
net.ipv4.tcp_fin_timeout = 10
将套接字的超时设置FIN
为 10 秒。
net.ipv4.ip_local_port_range = "15000 61000"
将临时端口范围设置为每个 IP 大约 46000 个可能的开放端口。
net.core.somaxconn = 1024
- 默认值 128 - 该net.core.somaxconn
值起着重要作用。它限制了排队到侦听套接字的最大请求数。
net.core.netdev_max_backlog = 2000
- 默认值为 1000 - 增加了传入请求的 TCP 队列。
将这些行添加到文件后,发出sysctl -p
在当前运行时加载内核参数并检查您的高峰时间以查看问题是否仍然存在。
答案2
这可能是一种拒绝服务攻击,目的是占用监听套接字并占用您的服务器。如果您能确定是这种情况,我建议尝试使用前端代理,例如nginx减轻 Apache 的压力。此类代理可以处理大量连接,至少可以让您有喘息的空间来弄清楚发生了什么。