Moving SSL termination to the edge increases the rate of 502 errors

2024-5-31


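We use nginx as a load balancer/failover in front of a pair of upstream servers. We started on 11/1.

This chart shows the situation. Where there is no dot, there were no 502s that day:

In the first few days, the logs show a small number of 502 response codes, probably caused by tweaks or other activity while we were stabilizing the nginx configuration. We then ran for 12 days with no 502 errors (apart from one blip on 11/13, probably another tweak).

On November 20, we moved SSL termination from the upstream servers to the edge. Since then we have seen 502 errors every day, and the number appears to be growing (as a percentage of all requests).

Yesterday we started receiving customer complaints for the first time since November 1.

Although the 502s are only a small fraction of all traffic (never as much as 1% of roughly 500,000 requests per day), they tend to arrive in clusters lasting about 10-15 seconds. During those periods, many users see degraded functionality or cannot reach the site at all.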

nginx.conf

worker_processes  auto;

events {
    worker_connections  1024;
    use epoll;
    multi_accept on;
}

http {
  include             mime.types;
  default_type        application/octet-stream;
  sendfile            on;
  keepalive_timeout   70;
  keepalive_requests  100000;
  tcp_nopush          on;
  tcp_nodelay         on;

  open_file_cache max=1000 inactive=20s;
  open_file_cache_valid 30s;
  open_file_cache_min_uses 5;
  open_file_cache_errors off;

  gzip on;
  gzip_min_length 1000;
  gzip_types application/x-javascript text/css application/javascript text/javascript text/plain text/xml application/json application/vnd.ms-fontobject application/x-font-opentype application/x-font-truetype application/x-font-ttf application/xml font/eot font/opentype font/otf image/svg+xml image/vnd.microsoft.icon;
  gzip_disable "MSIE [1-6]\.";

  log_format main '$time_iso8601\t$status\t$remote_addr\t$upstream_addr\t$upstream_status\t$scheme\t$request\t$request_time\t$upstream_response_time\t$body_bytes_sent';
  access_log   /var/log/nginx/access.log  main;
  error_log   /var/log/nginx/error.log  error;
  # error_log   /var/log/nginx/error_debug.log

  upstream example {
    server 192.168.1.40:80;
    server 192.168.1.41:80;
  }

  server {
    listen              80;
    listen              443 default ssl;
    server_name         example.com;

#    ssl on;
    ssl_certificate         ssl/example.com.crt;
    ssl_certificate_key     ssl/example.com.key;
    ssl_trusted_certificate ssl/example.com.pem;

    location / {
      proxy_read_timeout      180;
      proxy_pass              http://example;
      proxy_next_upstream     error timeout invalid_header http_500 http_502 http_503 http_504;

      proxy_set_header        Host            $host;
      proxy_set_header        X-Real-IP       $remote_addr;
      proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    }
  }
}

When a 502 error occurs, the access log records the following values:

$status: 502
$upstream_addr: 192.168.1.40, example
$upstream_status: 500, 502

or

$status: 502
$upstream_addr: example
$upstream_status: 502

or similar variations.

The error log shows:

[error] 21293#21293: *2441745 no live upstreams while connecting to upstream

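Installation details:

Ubuntu Server 16.04.3 LTS
nginx version: nginx/1.12.2
2 CPU cores @ 3.00GHz
8 GB RAM
2 x 10 GbE NICs
500 GB hard disk

My questions:

How does moving the certificate to the edge increase the incidence of 502 errors, and how can we fix it?
Why is the rate increasing? The actual load is fairly flat. Is this some kind of leak?

Edited to add:

Adding keepalive (thanks @Owen Garret) did not eliminate the 502s. We will check tonight whether it has reduced them, and then we can tune the keepalive value accordingly.
In the meantime, we have reverted to terminating SSL on the web servers (pass-through). So far, that has eliminated the 502s.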

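Answer 1

NGINX generates a 502 error because it cannot establish an HTTP connection to the upstream (your "proxy_pass http://example;" configuration) when it needs one.

The first thing to check is the upstream servers. Look through the server error logs and the system logs for anything that could explain the failures.

Did the problem get worse when you changed from proxying the SSL connections with TCP (stream) load balancing to terminating SSL and making HTTP connections to the upstream? If so, one effect of that change is that the upstream may now be handling far more frequent TCP connections:

With TCP (stream) load balancing of SSL connections, every request on a client connection is sent to the upstream over the same proxied connection.
When NGINX terminates the connection and makes new requests to the upstream, it opens a new TCP connection for each request by default.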
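
For comparison, the TCP (stream) pass-through arrangement described above might look roughly like the sketch below. It assumes nginx was built with the stream module and that the upstream web servers terminate SSL themselves on port 443; the group name example_tls and the upstream port are assumptions for illustration, not details taken from the question.

```nginx
# Sketch only: TCP pass-through of SSL traffic, no termination at the edge.
# The stream block sits at the top level of nginx.conf, next to http { }.
stream {
    # Hypothetical group name; port 443 on the upstreams is an assumption.
    upstream example_tls {
        server 192.168.1.40:443;
        server 192.168.1.41:443;
    }

    server {
        # The http server must not also listen on 443, or the two will conflict.
        listen      443;
        # Each client TCP connection is relayed as one upstream connection,
        # so all requests on it reach the same upstream server.
        proxy_pass  example_tls;
    }
}
```

In this mode nginx never sees the HTTP requests, so directives such as proxy_set_header and the X-Forwarded-For handling from the question do not apply to this traffic.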

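You can configure NGINX to use keepalive connections to the upstream, as described below. This encourages NGINX to hold TCP connections open and reuse them for later requests, and it may reduce the number of 502 errors.

Add the following to your location block, alongside the proxy_pass directive: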

    proxy_http_version 1.1;
    proxy_set_header Connection "";

Add the following to the upstream group configuration:

    keepalive 20;

See here for more details: https://www.nginx.com/blog/load-balancing-with-nginx-plus-part2/#keepalive
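
Applied to the upstream group and location block from the question, the two changes would sit roughly as follows. This is a sketch of placement only: the value 20 is the example figure from this answer rather than a tuned number, and everything else is copied from the question's nginx.conf.

```nginx
upstream example {
    server 192.168.1.40:80;
    server 192.168.1.41:80;

    # Maximum number of idle keepalive connections to the upstream servers
    # that each worker process keeps cached. 20 is an example value.
    keepalive 20;
}

server {
    # listen, server_name and ssl_* directives unchanged from the question

    location / {
        proxy_read_timeout      180;
        proxy_pass              http://example;
        proxy_next_upstream     error timeout invalid_header http_500 http_502 http_503 http_504;

        # Upstream keepalive needs HTTP/1.1 and a cleared Connection header;
        # otherwise nginx sends "Connection: close" to the upstream.
        proxy_http_version      1.1;
        proxy_set_header        Connection      "";

        proxy_set_header        Host            $host;
        proxy_set_header        X-Real-IP       $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

After editing, running "nginx -t" checks the syntax before a reload. Whether reuse is actually happening can then be judged from the number of new connections arriving at the upstream servers.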
