HAProxy: server timeout and retry resulting in a 502

2024-5-30

We ran into a strange HAProxy problem. We worked around it, but we do not understand why it happened.

The setup is very simple: one load balancer and one Apache backend. The problem occurs with a single HTTP request that runs for 100 minutes on Apache before it returns any output.

In the first setup, timeout server was set to 30 minutes. In that case HAProxy returned a 504 error, with sH-- in the log. According to the documentation:

The "timeout server" stroke before the server could return its response headers.

So we extended the server timeout to 60 minutes. This time we got a 502 error (after 100 minutes), with SH-- in the log:

The server aborted before sending its full HTTP response headers, or it crashed while processing the request.
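For reference, the first two characters of the termination state can be decoded mechanically. Below is a minimal Python sketch covering only the two codes seen in this question. The function name and lookup tables are made up for illustration, and the descriptions paraphrase the HAProxy logging documentation rather than quote it.

```python
# Hypothetical helper: decodes the first two characters of an HAProxy
# termination state such as "sH--". Only the codes that appear in this
# question are listed; the complete table is in the HAProxy documentation.
FIRST_CHAR = {
    "s": "server-side timeout expired while waiting for the server",
    "S": "session aborted by the server, or the server refused it",
}
SECOND_CHAR = {
    "H": "proxy was waiting for complete response headers from the server",
}


def decode_termination_state(state):
    """Return (cause, session_state) for a 4-character termination state."""
    if len(state) != 4:
        raise ValueError("expected 4 characters, e.g. 'sH--'")
    cause = FIRST_CHAR.get(state[0], "not covered by this sketch")
    phase = SECOND_CHAR.get(state[1], "not covered by this sketch")
    return cause, phase


for code in ("sH--", "SH--"):
    print(code, decode_termination_state(code))
```

In both cases the second character is H, so HAProxy was still waiting for response headers; only the cause differs (a timeout on the HAProxy side versus an abort seen from the server side).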

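At first we suspected that Apache had crashed, but then we noticed that after 60 minutes Apache received a second request from HAProxy, one that does not show up in any log file. Did the following happen?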

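1. The request goes to HAProxy and from there to Apache.
2. After 60 minutes, HAProxy considers the connection bad and resends the request (from a different source port, I guess). Apache starts processing the second request.
3. 40 minutes later, Apache finishes the first attempt and sends the result back to HAProxy. HAProxy is confused because this is the response to the first attempt, so it aborts with a 502 and logs a retry count of 0 (as we see in the log).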
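The timing in that guess can be written out as a small sketch. It only models the hypothesis above for the 60-minute case; it is not confirmed HAProxy behaviour, and it does not explain the 30-minute setup, which ended in a 504 instead. The function name and the event wording are invented for illustration.

```python
def hypothesised_timeline(timeout_server_min, request_duration_min):
    """List (minute, event) pairs under the retry hypothesis.

    Assumption: when timeout server expires before Apache answers,
    HAProxy sends a second attempt. This is the guess being asked
    about, not documented behaviour.
    """
    events = [(0, "HAProxy forwards the request to Apache (attempt 1)")]
    if timeout_server_min < request_duration_min:
        events.append(
            (timeout_server_min, "timeout server expires, attempt 2 is sent (hypothesis)")
        )
    events.append((request_duration_min, "Apache finishes attempt 1"))
    return events


for minute, event in hypothesised_timeline(60, 100):
    print(f"{minute:>3} min  {event}")
```

With a 60-minute timeout and a 100-minute request, the gap between the suspected retry and the first response is 100 - 60 = 40 minutes, which matches the timing we saw. With a 120-minute timeout the timeout branch is never reached.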

By the way, only after we extended timeout server to 2 hours was the request processed without an error (and without a retry).

Stripped-down HAProxy 1.5.9 configuration:

global
   maxconn         4000     # Sets the maximum per-process number of concurrent connections.
   maxsslconn      1000     # Sets the maximum per-process number of concurrent SSL connections.
   maxcompcpuusage 95       # Sets the maximum CPU usage HAProxy can reach before stopping or decreasing the compression level.

defaults HTTP
   mode            http
   option          http-server-close        # Preserve client persistent connections while handling every incoming request individually, dispatching them one after another to servers, in HTTP close mode
   option          httplog
   option          forwardfor
   timeout connect 4s       
   timeout client  20s      
   timeout server  100s
   timeout http-request 20s  # Set the maximum allowed time to wait for a complete HTTP request
   maxconn         200
   default-server  inter 2s fall 6 rise 2 port 80

frontend dc2--fe
   bind            8.8.8.8:80 mss 1422
   default_backend dc2--active

backend dc2--active
   timeout server  7200s
   server          app37 10.0.0.97:80 check
