nginx监控负载下无法加载状态页面

nginx监控负载下无法加载状态页面

Nginx 监控脚本所谓中天无法加载 nginx 测试页面(主要是在 nginx 最高负载下大约 2000 rps,用作代理),导致 zabbix 上出现“nginx 已关闭”等错误,一秒钟后,一切似乎都正常。

 [NginxStatus] 2015-12-16 20:24:55,289 - ERROR: failed to load test page
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/ztc/nginx/__init__.py", line 56, in _read_status
        u = urllib2.urlopen(url, None, 1)
      File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib64/python2.6/urllib2.py", line 391, in open
        response = self._open(req, data)
      File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
        '_open', req)
      File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
        result = func(*args)
      File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
        raise URLError(err)
    URLError: <urlopen error timed out>

由于它仅在最高负载(约 2000 rps)下发生,我将其与导致这种情况的一些内核参数联系起来。

这是 nginx 配置:

user nginx;
worker_processes  4;
timer_resolution 100ms;
worker_priority -15;
worker_rlimit_nofile 200000;

error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;

events {
  worker_connections  65536;
  use epoll;
  multi_accept on;
}
http {

  include       /etc/nginx/mime.types;
  default_type  application/octet-stream;

  server_tokens off;

  access_log    /var/log/nginx/access.log;

  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;

#  keepalive_requests 120;
#  keepalive_timeout  65;


  gzip  on;
  gzip_http_version 1.0;
  gzip_comp_level 2;
  gzip_proxied any;
  gzip_vary off;
  gzip_types text/plain text/css application/x-javascript text/xml application/xml application/rss+xml application/atom+xml text/javascript application/javas$
ript application/json text/mathml;
  gzip_min_length  1000;
  gzip_disable     "MSIE [1-6]\.";


  variables_hash_max_size 1024;
  variables_hash_bucket_size 64;
  server_names_hash_bucket_size 64;
  types_hash_max_size 2048;
  types_hash_bucket_size 64;



  include /etc/nginx/conf.d/*.conf;
  include /etc/nginx/sites-enabled/*;
}

这是 sysctl.conf

net.ipv4.conf.all.secure_redirects=0
net.ipv4.conf.all.send_redirects=0
net.ipv4.tcp_max_syn_backlog=20480
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.netfilter.nf_conntrack_max=1048576
net.nf_conntrack_max=1048576
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_tw_reuse=1
net.core.somaxconn=15000
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=5
net.ipv4.tcp_max_tw_buckets=720000
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_fin_timeout=30

netstat 输出:

netstat -an | grep -e :80 -e :443 |awk '/^tcp/ {A[$(NF)]++} END {for (I in A) {printf "%5d %s\n", A[I], I}}'

18525 TIME_WAIT
    1 CLOSE_WAIT
  499 FIN_WAIT1
 1544 FIN_WAIT2
33311 ESTABLISHED
  563 SYN_RECV
    7 CLOSING
  294 LAST_ACK
    3 LISTEN

造成这种情况的根本原因是什么? 2000rps 的 netstat 指标是否异常? 我的 sysctl.conf 中是否存在错误,从而导致我的问题?

相关内容