Nginx load balancing: bad configuration or bad behaviour?

I am currently using nginx as a load balancer to spread web traffic across 3 nodes, each running a NodeJS API.

The nginx instance runs on node1, so every request arrives at node1 first. I see roughly 700k requests over a 2-hour window, and nginx is configured to rotate between node1, node2 and node3 in round-robin fashion. Here is conf.d/deva.conf:

upstream deva_api {
    server 10.8.0.30:5555 fail_timeout=5s max_fails=3;
    server 10.8.0.40:5555 fail_timeout=5s max_fails=3;
    server localhost:5555;
    keepalive 300;
}

server {

        listen 8000;

        location /log_pages {

                proxy_redirect off;
                proxy_set_header Host $host;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

                proxy_http_version 1.1;
                proxy_set_header Connection "";

                add_header 'Access-Control-Allow-Origin' '*';
                add_header 'Access-Control-Allow-Methods' 'GET, POST, PATCH, PUT, DELETE, OPTIONS';
                add_header 'Access-Control-Allow-Headers' 'Authorization,Content-Type,Origin,X-Auth-Token';
                add_header 'Access-Control-Allow-Credentials' 'true';

                if ($request_method = OPTIONS ) {
                        return 200;
                }

                proxy_pass http://deva_api;
                proxy_set_header Connection "Keep-Alive";
                proxy_set_header Proxy-Connection "Keep-Alive";

                auth_basic "Restricted";                                #For Basic Auth
                auth_basic_user_file /etc/nginx/.htpasswd;  #For Basic Auth
        }
}

And here is the nginx.conf configuration:

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

worker_rlimit_nofile 65535;
events {
        worker_connections 65535;
        use epoll;
        multi_accept on;
}

http {

        ##
        # Basic Settings
        ##

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 120;
        send_timeout 120;
        types_hash_max_size 2048;
        server_tokens off;

        client_max_body_size 100m;
        client_body_buffer_size  5m;
        client_header_buffer_size 5m;
        large_client_header_buffers 4 1m;

        open_file_cache max=200000 inactive=20s;
        open_file_cache_valid 30s;
        open_file_cache_min_uses 2;
        open_file_cache_errors on;

        reset_timedout_connection on;

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ##
        # SSL Settings
        ##

        ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
        ssl_prefer_server_ciphers on;

        ##
        # Logging Settings
        ##

        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        ##
        # Gzip Settings
        ##

        gzip on;
        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;
}

The problem is that with this configuration I get hundreds of errors like the following in error.log:

upstream prematurely closed connection while reading response header from upstream

but only for node2 and node3. I have already tried the following tests:

  1. increasing the number of concurrent API processes on each node (I actually use PM2 as the in-node balancer)
  2. removing one node to make nginx's job easier
  3. applying weights to the nginx upstream (see the sketch after this list)
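
For reference, applying the weights only meant adding a weight to each server in the upstream block; a minimal sketch (the values here are purely illustrative, not the exact ones tested):

upstream deva_api {
    # illustrative weights only: favour the local node over the VPN-connected ones
    server 10.8.0.30:5555 weight=1 fail_timeout=5s max_fails=3;
    server 10.8.0.40:5555 weight=1 fail_timeout=5s max_fails=3;
    server localhost:5555 weight=3;
    keepalive 300;
}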

None of this made the results any better. During these tests I noticed that the errors only involved the 2 remote nodes (node2 and node3), so I tried taking them out of the equation. The result was that I no longer got those errors, but I started getting 2 different ones:

recv() failed (104: Connection reset by peer) while reading response header from upstream

writev() failed (32: Broken pipe) while sending request to upstream

The problem seems to come from a shortage of API processes on node1: the APIs probably cannot answer all of the inbound traffic before the client times out (that is my guess). With that in mind I increased the number of concurrent API processes on node1, and the results were better than before, but I still get the latter 2 errors and I cannot raise the number of concurrent APIs on node1 any further.
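
If that guess is right, the relevant nginx-side knobs are the proxy timeouts and the retry policy. A minimal sketch of directives that could sit next to the existing proxy_* settings in the location block (the values are assumptions, not a verified fix):

proxy_connect_timeout 10s;   # max time to establish the connection to the upstream
proxy_send_timeout    60s;   # max time between two successive writes to the upstream
proxy_read_timeout    60s;   # max time between two successive reads from the upstream
proxy_next_upstream error timeout;   # which failures may be retried on another upstream server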

So, the question is: why can't I use nginx as a load balancer across all the nodes? Am I making a mistake in the nginx configuration? Is there some other problem I haven't noticed?

EDIT: I ran some network tests between the 3 nodes. The nodes talk to each other over OpenVPN:

ping:

node1->node3
PING 10.8.0.40 (10.8.0.40) 56(84) bytes of data.
64 bytes from 10.8.0.40: icmp_seq=1 ttl=64 time=2.85 ms
64 bytes from 10.8.0.40: icmp_seq=2 ttl=64 time=1.85 ms
64 bytes from 10.8.0.40: icmp_seq=3 ttl=64 time=3.17 ms
64 bytes from 10.8.0.40: icmp_seq=4 ttl=64 time=3.21 ms
64 bytes from 10.8.0.40: icmp_seq=5 ttl=64 time=2.68 ms

node1->node2
PING 10.8.0.30 (10.8.0.30) 56(84) bytes of data.
64 bytes from 10.8.0.30: icmp_seq=1 ttl=64 time=2.16 ms
64 bytes from 10.8.0.30: icmp_seq=2 ttl=64 time=3.08 ms
64 bytes from 10.8.0.30: icmp_seq=3 ttl=64 time=10.9 ms
64 bytes from 10.8.0.30: icmp_seq=4 ttl=64 time=3.11 ms
64 bytes from 10.8.0.30: icmp_seq=5 ttl=64 time=3.25 ms

node2->node1
PING 10.8.0.12 (10.8.0.12) 56(84) bytes of data.
64 bytes from 10.8.0.12: icmp_seq=1 ttl=64 time=2.30 ms
64 bytes from 10.8.0.12: icmp_seq=2 ttl=64 time=8.30 ms
64 bytes from 10.8.0.12: icmp_seq=3 ttl=64 time=2.37 ms
64 bytes from 10.8.0.12: icmp_seq=4 ttl=64 time=2.42 ms
64 bytes from 10.8.0.12: icmp_seq=5 ttl=64 time=3.37 ms

node2->node3
PING 10.8.0.40 (10.8.0.40) 56(84) bytes of data.
64 bytes from 10.8.0.40: icmp_seq=1 ttl=64 time=2.86 ms
64 bytes from 10.8.0.40: icmp_seq=2 ttl=64 time=4.01 ms
64 bytes from 10.8.0.40: icmp_seq=3 ttl=64 time=5.37 ms
64 bytes from 10.8.0.40: icmp_seq=4 ttl=64 time=2.78 ms
64 bytes from 10.8.0.40: icmp_seq=5 ttl=64 time=2.87 ms

node3->node1
PING 10.8.0.12 (10.8.0.12) 56(84) bytes of data.
64 bytes from 10.8.0.12: icmp_seq=1 ttl=64 time=8.24 ms
64 bytes from 10.8.0.12: icmp_seq=2 ttl=64 time=2.72 ms
64 bytes from 10.8.0.12: icmp_seq=3 ttl=64 time=2.63 ms
64 bytes from 10.8.0.12: icmp_seq=4 ttl=64 time=2.91 ms
64 bytes from 10.8.0.12: icmp_seq=5 ttl=64 time=3.14 ms

node3->node2
PING 10.8.0.30 (10.8.0.30) 56(84) bytes of data.
64 bytes from 10.8.0.30: icmp_seq=1 ttl=64 time=2.73 ms
64 bytes from 10.8.0.30: icmp_seq=2 ttl=64 time=2.38 ms
64 bytes from 10.8.0.30: icmp_seq=3 ttl=64 time=3.22 ms
64 bytes from 10.8.0.30: icmp_seq=4 ttl=64 time=2.76 ms
64 bytes from 10.8.0.30: icmp_seq=5 ttl=64 time=2.97 ms

Bandwidth checked with iperf:

node1 -> node2
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   229 MBytes   192 Mbits/sec

node2->node1
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   182 MBytes   152 Mbits/sec

node3->node1
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   160 MBytes   134 Mbits/sec

node3->node2
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   260 MBytes   218 Mbits/sec

node2->node3
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   241 MBytes   202 Mbits/sec

node1->node3
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   187 MBytes   156 Mbits/sec

It looks like the OpenVPN tunnel is the bottleneck, since the same test over eth reaches roughly 1 Gbit/s. That said, I have followed this guide from community.openvpn.net, but the bandwidth I got was only about twice what I measured before.

I would like to keep OpenVPN in place, so are there other tweaks I could make to increase the network bandwidth, or other changes to the nginx configuration, to get this working properly?
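
One nginx-side tweak that might help is making the cached upstream connections live longer, so fewer TCP handshakes have to cross the tunnel. A minimal sketch, assuming nginx 1.15.3 or newer (where keepalive_requests and keepalive_timeout are allowed in the upstream context; the values are illustrative guesses):

upstream deva_api {
    server 10.8.0.30:5555 fail_timeout=5s max_fails=3;
    server 10.8.0.40:5555 fail_timeout=5s max_fails=3;
    server localhost:5555;
    keepalive 300;
    keepalive_requests 10000;   # serve many requests on each cached connection before closing it
    keepalive_timeout  120s;    # keep idle upstream connections open longer
}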

Answer 1

The problem was caused by the slow OpenVPN network. By adding authentication on each of the servers and routing the requests over the internet instead, we reduced the errors to 1-2 per day, and those are now probably caused by some other issue.
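
In practice that means pointing the upstream at the public addresses of the nodes instead of their VPN addresses, and letting each node enforce its own authentication. A minimal sketch with hypothetical placeholder IPs (the real addresses, ports and authentication setup will differ):

upstream deva_api {
    # 203.0.113.x are documentation/placeholder addresses, not the real ones
    server 203.0.113.20:5555 fail_timeout=5s max_fails=3;   # node2 reached over the internet
    server 203.0.113.30:5555 fail_timeout=5s max_fails=3;   # node3 reached over the internet
    server localhost:5555;
    keepalive 300;
}
# each remote node must then authenticate incoming requests itself,
# since the traffic no longer travels inside the VPN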
