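For some reason, nginx is unable to cope with standard file traffic. The only workaround we have found so far is to add more nodes, currently up to 23 VMs. However, none of the VMs is busy, and nginx still cannot fully cope.
Our http and https services have no JavaScript, just petabytes of files, open data for omics content, accessible from anywhere in the world.
We have no money to buy licences; everything is open source, and so is the data. I came here hoping someone can tell us that we have made one huge mistake, so that we can fix it relatively quickly and stop throwing more servers at this http service.
I will show our configuration with the load distributed across 23 servers, then I concentrate the load on a single server, and then I set it back to 23.
Our configuration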
[root@hlvlpxfer-http-ebi-002 nginx]# uname -a
Linux hlvlpxfer-http-ebi-002.ebi.ac.uk 5.4.17-2136.300.7.el8uek.x86_64 #2 SMP Fri Oct 8 16:23:01 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@hlvlpxfer-http-ebi-002 nginx]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: OracleServer
Description: Oracle Linux Server release 8.5
Release: 8.5
Codename: n/a
[root@hlvlpxfer-http-ebi-002 nginx]# nginx -V
nginx version: nginx/1.14.1
built by gcc 8.2.1 20180905 (Red Hat 8.2.1-3.0.1) (GCC)
built with OpenSSL 1.1.1 FIPS 11 Sep 2018 (running with OpenSSL 1.1.1k FIPS 25 Mar 2021)
TLS SNI support enabled
configure arguments: --prefix=/usr/share/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --http-client-body-temp-path=/var/lib/nginx/tmp/client_body --http-proxy-temp-path=/var/lib/nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/nginx/tmp/scgi --pid-path=/run/nginx.pid --lock-path=/run/lock/subsys/nginx --user=nginx --group=nginx --with-file-aio --with-ipv6 --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module=dynamic --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_slice_module --with-http_stub_status_module --with-http_perl_module=dynamic --with-http_auth_request_module --with-mail=dynamic --with-mail_ssl_module --with-pcre --with-pcre-jit --with-stream=dynamic --with-stream_ssl_module --with-debug --with-cc-opt='-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-E'
[root@hlvlpxfer-http-ebi-002 nginx]#
nginx.conf and conf.d/common*
cat /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
worker_rlimit_nofile 999999;
#error_log /var/log/nginx/error-debug.log debug;
pid /run/nginx.pid;
# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;
events {
worker_connections 32768;
use epoll;
multi_accept on;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
log_format compression '$http_x_real_ip - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$gzip_ratio"';
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
log_format ebi-logs '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" ';
access_log /var/log/nginx/access.log main;
chunked_transfer_encoding off;
client_body_buffer_size 32k;
client_body_timeout 11;
# Gzip
gunzip on;
gzip on;
gzip_buffers 16 8k;
gzip_comp_level 6;
gzip_disable "msie6";
gzip_http_version 1.0;
gzip_min_length 10240;
gzip_proxied any;
gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript application/vnd.ms-fontobject application/x-font-ttf font/opentype image/svg+xml image/x-icon;
gzip_vary on;
keepalive_requests 100000;
keepalive_timeout 30;
open_file_cache max=200000 inactive=20s;
open_file_cache_errors on;
open_file_cache_min_uses 2;
open_file_cache_valid 30s;
reset_timedout_connection on;
send_timeout 11;
sendfile off;
sendfile_max_chunk 512k;
server_names_hash_bucket_size 256;
server_tokens off;
tcp_nodelay on;
server {
# SRA
listen 80;
listen 443 ssl;
server_name ftp.sra.ebi.ac.uk;
access_log /var/log/nginx/xfer-ftp.sra.ebi.ac.uk.log ebi-logs;
include /etc/nginx/conf.d/common-server.conf;
}
server {
# Default
listen 80 default_server;
listen 443 default_server ssl;
server_name localhost;
access_log /var/log/nginx/xfer-ftp.ebi.ac.uk.log ebi-logs;
include /etc/nginx/conf.d/common-server.conf;
}
}
[root@hlvlpxfer-http-ebi-002 nginx]# cat /etc/nginx/conf.d/common-server.conf
root /xfer/public/;
ssl_certificate /etc/pki/tls/certs/ftp.ebi.ac.uk.crt;
ssl_certificate_key /etc/pki/tls/private/ftp.ebi.ac.uk.key;
ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers ALL:!aNULL:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1;
deny all;
}
location ~* "\.(json|txt)z$" {
add_header Content-Encoding gzip;
gzip off;
types {
application/json jsonz;
}
}
location / {
root /xfer/public/;
autoindex on;
max_ranges 30;
sendfile_max_chunk 512k;
sendfile on;
add_header 'Access-Control-Allow-Origin' '*';
if ($request_method = 'OPTIONS') {
add_header 'Access-Control-Allow-Origin' '*';
add_header 'Access-Control-Allow-Methods' 'GET,OPTIONS';
#
# Custom headers and headers various browsers *should* be OK with but aren't
#
add_header 'Access-Control-Allow-Headers' 'Authorization,Origin,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Accept';
#
# Tell client that this pre-flight info is valid for 20 days
#
add_header 'Access-Control-Max-Age' 1728000;
return 200;
}
}
Load (or lack of it)
[root@hlvlpxfer-http-ebi-002 nginx]# uptime
20:54:26 up 3:56, 1 user, load average: 0.04, 0.10, 0.09
[root@hlvlpxfer-http-ebi-002 nginx]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
BIOS Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
BIOS Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Stepping: 0
CPU MHz: 2297.339
BogoMIPS: 4594.67
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep arat md_clear flush_l1d arch_capabilities
[root@hlvlpxfer-http-ebi-002 nginx]# ps axu|grep nginx
root 9122 0.0 0.0 104148 2344 ? Ss 17:37 0:00 nginx: master process /usr/sbin/nginx
nginx 9123 0.1 0.0 151264 26948 ? S 17:37 0:12 nginx: worker process
nginx 9124 0.2 0.0 150544 26456 ? S 17:37 0:24 nginx: worker process
nginx 9125 0.1 0.0 151388 27272 ? S 17:37 0:21 nginx: worker process
nginx 9126 0.1 0.0 150656 26540 ? S 17:37 0:12 nginx: worker process
root 39099 0.0 0.0 221924 1136 pts/0 R+ 20:54 0:00 grep --color=auto nginx
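As a rough sanity check on the configuration above, the theoretical connection ceiling can be worked out from `worker_processes` and `worker_connections`. This is only an illustrative sketch: it assumes that `worker_processes auto` resolved to the 4 workers visible in the `ps` output, and the function name is made up for this example.

```python
# Values copied from the nginx.conf and the `ps` output shown above.
WORKER_PROCESSES = 4            # assumption: "auto" resolved to 4 on this 4-CPU VM
WORKER_CONNECTIONS = 32768      # from the events {} block
WORKER_RLIMIT_NOFILE = 999999   # per-worker open file limit


def max_client_connections(workers: int, per_worker: int) -> int:
    """Upper bound on simultaneous connections across all workers."""
    return workers * per_worker


ceiling = max_client_connections(WORKER_PROCESSES, WORKER_CONNECTIONS)
print(f"configured connection ceiling: {ceiling}")

# Each worker's connection limit should fit under its file descriptor limit.
print("fits under worker_rlimit_nofile:", WORKER_CONNECTIONS <= WORKER_RLIMIT_NOFILE)
```

If that arithmetic holds, the configured limits themselves leave a lot of headroom on a single VM.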
nginx_status
[root@hlvlpxfer-http-ebi-002 nginx]# curl 127.0.0.1/nginx_status
Active connections: 14
server accepts handled requests
11524 11524 33976
Reading: 0 Writing: 9 Waiting: 5
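To compare these status snapshots more easily, the `stub_status` text can be turned into numbers. The following is a minimal sketch, not the `nginx_status.py` script that appears later in the process list (its contents are not shown here); the function names are hypothetical.

```python
import re
from typing import Dict

# Sample copied from the curl output above.
SAMPLE = """Active connections: 14
server accepts handled requests
11524 11524 33976
Reading: 0 Writing: 9 Waiting: 5
"""


def parse_stub_status(text: str) -> Dict[str, int]:
    """Parse the plain-text output of nginx's stub_status page."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    counters = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    states = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if not (active and counters and states):
        raise ValueError("input does not look like stub_status output")
    accepts, handled, requests = (int(v) for v in counters.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


def requests_per_connection(status: Dict[str, int]) -> float:
    """Average number of requests served per handled connection."""
    return status["requests"] / status["handled"]


status = parse_stub_status(SAMPLE)
print(status)
print(f"requests per connection: {requests_per_connection(status):.2f}")
```

In this snapshot `accepts` equals `handled`, so no connections were dropped at that point, and most of the 14 active connections are in the Writing state.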
uptime and free
[root@hlvlpxfer-http-ebi-002 nginx]# uptime
20:54:57 up 3:57, 1 user, load average: 0.02, 0.09, 0.08
[root@hlvlpxfer-http-ebi-002 nginx]# free
total used free shared buff/cache available
Mem: 32560712 4267664 236280 9160 28056768 27854040
Swap: 2097148 12044 2085104
[root@hlvlpxfer-http-ebi-002 nginx]# free -m
total used free shared buff/cache available
Mem: 31797 4175 250 8 27371 27192
Swap: 2047 11 2036
[root@hlvlpxfer-http-ebi-002 nginx]#
Before applying load
[root@hlvlpxfer-http-ebi-002 nginx]# # No load
[root@hlvlpxfer-http-ebi-002 nginx]# echo "START"; date ; curl 127.0.0.1/nginx_status ; uptime ; free -m ; ps axu|grep nginx ; echo "END " ; date
START
Fri Jul 29 21:05:36 BST 2022
Active connections: 52
server accepts handled requests
12161 12161 37508
Reading: 0 Writing: 16 Waiting: 36
21:05:38 up 4:08, 1 user, load average: 0.16, 0.20, 0.13
total used free shared buff/cache available
Mem: 31797 4136 228 8 27432 27231
Swap: 2047 11 2036
root 9122 0.0 0.0 104148 2344 ? Ss 17:37 0:00 nginx: master process /usr/sbin/nginx
nginx 9123 0.1 0.0 151264 26948 ? S 17:37 0:13 nginx: worker process
nginx 9124 0.2 0.0 150544 26456 ? S 17:37 0:26 nginx: worker process
nginx 9125 0.1 0.0 151388 27272 ? S 17:37 0:21 nginx: worker process
nginx 9126 0.1 0.0 150656 26540 ? S 17:37 0:13 nginx: worker process
root 40755 0.0 0.0 221924 1152 pts/0 S+ 21:05 0:00 grep --color=auto nginx
END
Fri Jul 29 21:05:38 BST 2022
[root@hlvlpxfer-http-ebi-002 nginx]#
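5 minutes after applying load (load balancer pointing at a single server)
After a few minutes, the load balancer starts to see the server as "not responding" and starts sending "connection reset" to clients. The load balancer is standard hardware and checks the robots.txt file; because nginx cannot return the file fast enough, the check times out.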
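The health check described above can be approximated from the server itself by timing a single fetch of robots.txt. This is a hedged sketch, not the load balancer's actual monitor: the `probe` function, the 5 second timeout and the URL in the usage note are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Tuple


def probe(url: str, timeout: float = 5.0) -> Tuple[bool, float]:
    """Fetch url once and return (ok, seconds_elapsed).

    ok is True only for a 2xx response received within the timeout.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, HTTP errors and timeouts.
        ok = False
    return ok, time.monotonic() - start
```

Calling `probe("http://127.0.0.1/robots.txt")` in a loop while the load is concentrated on one server would show whether the local response time grows in the same way the load balancer observes.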
[root@hlvlpxfer-http-ebi-002 nginx]# sleep 240 ; echo "START"; date ; curl 127.0.0.1/nginx_status ; uptime ; free -m ; ps axu|grep nginx ; echo "END " ; date
START
Fri Jul 29 21:11:45 BST 2022
Active connections: 1588
server accepts handled requests
15069 15069 38953
Reading: 0 Writing: 109 Waiting: 1479
21:13:13 up 4:15, 1 user, load average: 0.15, 0.11, 0.09
total used free shared buff/cache available
Mem: 31797 4264 232 8 27300 27102
Swap: 2047 11 2036
root 9122 0.0 0.0 104148 2344 ? Ss 17:37 0:00 nginx: master process /usr/sbin/nginx
nginx 9123 0.1 0.0 151264 26948 ? S 17:37 0:13 nginx: worker process
nginx 9124 0.2 0.0 150976 26712 ? S 17:37 0:26 nginx: worker process
nginx 9125 0.1 0.0 151388 27272 ? S 17:37 0:22 nginx: worker process
nginx 9126 0.1 0.0 150656 26540 ? S 17:37 0:13 nginx: worker process
root 41198 0.0 0.0 301520 16796 ? S 21:08 0:00 python3 nginx_status.py
root 41343 0.0 0.0 301520 16860 ? S 21:09 0:00 python3 nginx_status.py
root 41492 0.0 0.0 301520 16704 ? S 21:10 0:00 python3 nginx_status.py
root 41927 0.0 0.0 301520 16740 ? S 21:13 0:00 python3 nginx_status.py
root 41933 0.0 0.0 221924 1128 pts/0 S+ 21:13 0:00 grep --color=auto nginx
END
Fri Jul 29 21:13:13 BST 2022
[root@hlvlpxfer-http-ebi-002 nginx]#
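The two snapshots above can be compared numerically. The sketch below only reuses the timestamps and counters printed in the transcripts; the helper names are hypothetical, and the timezone token is dropped before parsing because both samples are in BST.

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"


def parse_date(line: str) -> datetime:
    """Parse `date` output such as 'Fri Jul 29 21:05:36 BST 2022'."""
    parts = line.split()
    # Drop the timezone token (index 4); both samples share the same zone.
    return datetime.strptime(" ".join(parts[:4] + parts[5:]), FMT)


# Values copied from the "before" and "after" transcripts above.
before = {
    "start": parse_date("Fri Jul 29 21:05:36 BST 2022"),
    "end": parse_date("Fri Jul 29 21:05:38 BST 2022"),
    "active": 52, "accepts": 12161, "requests": 37508,
}
after = {
    "start": parse_date("Fri Jul 29 21:11:45 BST 2022"),
    "end": parse_date("Fri Jul 29 21:13:13 BST 2022"),
    "active": 1588, "accepts": 15069, "requests": 38953,
}


def snapshot_seconds(sample: dict) -> int:
    """Wall-clock time taken by the whole START..END command line."""
    return int((sample["end"] - sample["start"]).total_seconds())


def counter_delta(a: dict, b: dict, key: str) -> int:
    """Change in one counter between two samples."""
    return b[key] - a[key]


print("snapshot duration before load:", snapshot_seconds(before), "s")
print("snapshot duration under load: ", snapshot_seconds(after), "s")
for key in ("active", "accepts", "requests"):
    print(f"delta {key}: {counter_delta(before, after, key)}")
```

Under load the snapshot command itself takes far longer than before, and since the `uptime` line already reads 21:13:13, most of that time appears to have been spent waiting for the `curl` to /nginx_status, while the load average stays close to zero.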
Thank you for your time.
As requested, the VMs are set up as follows:
- 20 VMs running Kubernetes, traefik > nginx
- 3 VMs running nginx directly (OEL 8.5)
- All nginx instances have the same configuration everywhere.
- The load balancer is an F5, doing round-robin distribution across all of them.
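The 23 VMs run on 3 VMware hypervisors; the 3 VMs with OEL 8.5 have affinity rules so that one of them runs on each hypervisor.
The VMware cluster is not busy:
Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
72 cores
The hypervisors' network does not seem to be the problem: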