HAProxy always returns 502 Bad Gateway, but the backend does respond

I have an HAProxy 2.8.4 instance that proxies several HTTPS services on multiple URL paths across several different backends, plus a PostgreSQL cluster over TCP. Here is the full haproxy -vvv output:

HAProxy version 2.8.4-a4ebf9d 2023/11/17 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2028.
Known bugs: http://www.haproxy.org/bugs/bugs-2.8.4.html
Running on: Linux 4.18.0-513.5.1.el8_9.x86_64 #1 SMP Fri Sep 29 05:21:10 EDT 2023 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_THREAD=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1 USE_PCRE=1 USE_PCRE_JIT=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT +PCRE -PCRE2 -PCRE2_JIT +PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN -SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL +ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=2).
Built with OpenSSL version : OpenSSL 1.1.1k  FIPS 25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k  FIPS 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.4.4
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.42 2018-03-20
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 8.5.0 20210514 (Red Hat 8.5.0-20)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : none

Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

Here is my configuration, with the irrelevant Postgres parts and the other HTTPS URL paths omitted:

global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/haproxy.sock mode 600 level admin
    pidfile /var/run/haproxy.pid
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  dontlognull
    retries 2
    timeout connect 4s
    timeout client  30m
    timeout server  30m
    timeout check   5s

# Temporary detailed logging
   log-format "Client IP:port = [%ci:%cp], Start Time = [%tr], Frontend Name = [%ft], Backend Name = [%b], Backend Server = [%s], Time to receive full request = [%TR ms], Response time = [%Tr ms], Status Code = [%ST], Bytes Read = [%B], Request = [%{+Q}r], Request Body = [%[capture.req.hdr(0)]]"

    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend main
    timeout client 86400000
    bind :443 ssl crt /etc/haproxy/haproxy.crt

    option http-buffer-request
    declare capture request len 40000
    http-request capture req.body id 0

    capture request header origin len 128

    # Many other URL mappings and other use_backend directives omitted here

    acl url_apigee path_beg -i /apigee-connector
    use_backend voda-apigee-conn-be if url_apigee

    default_backend deny_be

# Many other backend definitions omitted here

backend voda-apigee-conn-be
    balance roundrobin

    option httpchk
    http-check send meth GET uri /actuator/health

    server api1 x.x.x.x:8002 check inter 10s fall 3 rise 2 ssl verify none
    server api2 y.y.y.y:8002 check inter 10s fall 3 rise 2 ssl verify none

backend deny_be
    http-request deny

When I call the backend directly with cURL, I get an HTTP 200 response, and the backend also performs the expected operation internally:

curl -vvv -k -w "@curl-format.txt" -X POST -H "X-API-Key: my-api-key-1" -H "Content-Type: application/json" -d @apigee-conn-email3.json https://x.x.x.x:8002/apigee-connector/outbound-communication
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying x.x.x.x...
* TCP_NODELAY set
* Connected to x.x.x.x (x.x.x.x) port 8002 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: <omitted>
*  start date: Dec  8 07:51:33 2023 GMT
*  expire date: Dec  7 07:51:32 2025 GMT
*  issuer: <omitted>
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> POST /apigee-connector/outbound-communication HTTP/1.1
> Host: x.x.x.x:8002
> User-Agent: curl/7.61.1
> Accept: */*
> X-API-Key: my-api-key-1
> Content-Type: application/json
> Content-Length: 344
>
* upload completely sent off: 344 out of 344 bytes
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/1.1 200
< Server: nginx
< Date: Wed, 28 Feb 2024 16:38:26 GMT
< Transfer-Encoding: chunked
< Connection: keep-alive
< Expires: 0
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Set-Cookie: JSESSIONID=YWLvqrZAAMHBgbOLK9Q-nz2vrxhs9mKw_Yt92lQ5.cgwcsmfuatapp1; path=/; secure; HttpOnly
< X-XSS-Protection: 1; mode=block
< Pragma: no-cache
< X-Frame-Options: DENY
< X-Content-Type-Options: nosniff
< Strict-Transport-Security: max-age=31536000 ; includeSubDomains
< RequestId: 59d32aad-4294-4811-a385-a4e65d68065f
< Quota-Reset: 1709139600925
< Quota-Allowed: 10000
< Quota-Available: 9998
< Content-Type: application/json
< Transfer-Encoding: chunked
<
* TLSv1.3 (IN), TLS app data, [no content] (0):
* Connection #0 to host x.x.x.x left intact
{"id":"94440760","attachment":{"id":{"value":"DOC-20240228-173826-WNHPC"}}}

     time_namelookup:  0.000076s
        time_connect:  0.000414s
     time_appconnect:  0.026246s
    time_pretransfer:  0.026375s
       time_redirect:  0.000000s
  time_starttransfer:  2.132503s
                     ----------
          time_total:  2.133572s

When I call the same function through HAProxy:

curl -vvv -k -w "@curl-format.txt" -X POST -H "X-API-Key: my-api-key-1" -H "Content-Type: application/json" -d @apigee-conn-email3.json https://z.z.z.z/apigee-connector/outbound-communication
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying z.z.z.z...
* TCP_NODELAY set
* Connected to z.z.z.z (z.z.z.z) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: <omitted>
*  start date: Jan 31 20:50:09 2024 GMT
*  expire date: Jan 30 20:50:08 2026 GMT
*  issuer: <omitted>
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> POST /apigee-connector/outbound-communication HTTP/1.1
> Host: z.z.z.z
> User-Agent: curl/7.61.1
> Accept: */*
> X-API-Key: my-api-key-1
> Content-Type: application/json
> Content-Length: 344
>
* upload completely sent off: 344 out of 344 bytes
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* HTTP 1.0, assume close after body
< HTTP/1.0 502 Bad Gateway
< cache-control: no-cache
< content-type: text/html
<
<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>

* TLSv1.3 (IN), TLS alert, [no content] (0):
* TLSv1.3 (IN), TLS alert, close notify (256):
* Closing connection 0
* TLSv1.3 (OUT), TLS alert, [no content] (0):
* TLSv1.3 (OUT), TLS alert, close notify (256):

     time_namelookup:  0.000073s
        time_connect:  0.001454s
     time_appconnect:  0.023841s
    time_pretransfer:  0.023926s
       time_redirect:  0.000000s
  time_starttransfer:  2.589714s
                     ----------
          time_total:  2.589899s

HAProxy log:

Feb 28 18:01:38 localhost haproxy[3223599]: Client IP:port = [a.a.a.a:34660], Start Time = [28/Feb/2024:18:01:36.068], Frontend Name = [main~], Backend Name = [voda-apigee-conn-be], Backend Server = [api2], Time to receive full request = [0 ms], Response time = [-1 ms], Status Code = [502], Bytes Read = [208], Request = ["POST https://z.z.z.z/apigee-connector/outbound-communication HTTP/2.0"], Request Body = [<JSON body omitted>]

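I always get 502 Bad Gateway. However, the backend still executes the request fully as expected, and it says it generated a response... at least I see exactly the same log messages at the application level in both cases.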

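I noticed that HAProxy switches to HTTP/2. I tried forcing HTTP 1.1 with the alpn directive in the HAProxy configuration and with the --http1.1 flag on the cURL command line. The exchange then used HTTP 1.1, but the result was still 502 Bad Gateway.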

What could be the problem?

- - - UPDATE - - -

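After AlexD's comment, I modified my logging, basically adding every %Tx parameter I could find and giving each one a human-readable name, because I will never remember what Tr, TR, Th and so on mean: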

log-format "Client IP:port = [%ci:%cp], Start Time = [%tr], Frontend Name = [%ft], Backend Name = [%b], Backend Server = [%s], Active time of the request = [%Ta ms], Time to establish TCP connection to the server = [%Tc ms], SSL handshake time = [%Th ms], Idle time before the HTTP request = [%Ti ms], Time to get the client's request = [%Tq ms], Time to receive full request = [%TR ms], Response time = [%Tr ms], Total session duration time = [%Tt ms], Status Code = [%ST], Bytes Read = [%B], Termination state = [%ts], Request = [%{+Q}r], Request Body = [%[capture.req.hdr(0)]]"

The new log after this change:

Feb 29 08:46:33 localhost haproxy[3388761]: Client IP:port = [10.215.30.29:37666], Start Time = [29/Feb/2024:08:46:31.067], Frontend Name = [main~], Backend Name = [voda-apigee-conn-be], Backend Server = [api1], Active time of the request = [2092 ms], Time to establish TCP connection to the server = [8 ms], SSL handshake time = [20 ms], Idle time before the HTTP request = [0 ms], Time to get the client's request = [20 ms], Time to receive full request = [0 ms], Response time = [-1 ms], Total session duration time = [2112 ms], Status Code = [502], Bytes Read = [208], Termination state = [PH], Request = ["POST /apigee-connector/outbound-communication HTTP/1.1"], Request Body = [<omitted>]
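
As a reading aid, the `Name = [value]` pairs produced by the custom log-format above can be pulled apart programmatically. This is only a minimal Python sketch and not part of the original setup: the function name `parse_fields` is made up for illustration, the sample is a shortened copy of the log line above, and the regular expression assumes that values contain no `]` character (a JSON request body with arrays would break it).

```python
import re

# Matches "Some Name = [value]" pairs as written by the custom log-format.
# Assumption: the value itself never contains a closing bracket.
FIELD_RE = re.compile(r"([A-Za-z][^=\[\],]*?) = \[([^\]]*)\]")


def parse_fields(line: str) -> dict:
    """Return a dict mapping each human-readable field name to its value."""
    return {name.strip(): value for name, value in FIELD_RE.findall(line)}


# Shortened copy of the log line above (only some fields kept).
sample = (
    "Feb 29 08:46:33 localhost haproxy[3388761]: "
    "Client IP:port = [10.215.30.29:37666], "
    "Backend Server = [api1], "
    "Response time = [-1 ms], "
    "Status Code = [502], "
    "Termination state = [PH]"
)

fields = parse_fields(sample)
for key in ("Backend Server", "Response time", "Status Code", "Termination state"):
    print(f"{key}: {fields[key]}")
```

The two fields that stand out are the response time of -1 ms and the termination state, which are the ones discussed next.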

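This looks like useful information. The termination state is "PH", which seems to mean that the response was blocked during header processing. That is strange: I do not think our backend application returns any invalid header, as you can see in the response to the direct cURL request above. I tried turning off http-response strict-mode in the frontend, but that did not change the behavior. I also did not find any more information about this "PH" termination state.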
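
For reference, the two characters of the termination state are documented separately in the "Session state at disconnection" section of the HAProxy configuration manual: the first says which side ended the session, the second says what HAProxy was doing at that moment. The Python sketch below paraphrases a subset of those codes; the dictionary names and the `describe` helper are invented for illustration, and the wording is a paraphrase rather than the manual's exact text.

```python
# Paraphrased subset of the termination state codes from the HAProxy
# configuration manual ("Session state at disconnection"). Not exhaustive.
FIRST_CHAR = {
    "C": "the session was aborted by the client",
    "S": "the session was aborted or refused by the server",
    "P": "the session was aborted by the proxy itself "
         "(for example after a security check on the server response)",
    "c": "the client-side timeout expired",
    "s": "the server-side timeout expired",
    "-": "the session completed normally",
}

SECOND_CHAR = {
    "R": "while waiting for a complete, valid request from the client",
    "C": "while waiting for the connection to the server to establish",
    "H": "while waiting for complete, valid response headers from the server",
    "D": "during the data phase",
    "-": "after the end of the data transfer",
}


def describe(state: str) -> str:
    """Explain a two-character termination state such as 'PH'."""
    if len(state) != 2:
        raise ValueError("expected exactly two characters, e.g. 'PH'")
    first = FIRST_CHAR.get(state[0], "unknown first code")
    second = SECOND_CHAR.get(state[1], "unknown second code")
    return f"{state}: {first}, {second}"


print(describe("PH"))
```

The manual describes the "PH" combination as the proxy blocking a server response that it considers invalid, incomplete or dangerous, in which case a 502 is sent to the client. That matches the status code in the log line above.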

----- UPDATE 2 -----

I managed to enable the admin socket by running systemctl reload haproxy; I am not sure why that was necessary. After querying "show errors", it showed the following:

[admin@ccaas1t-postgres-t1 ~]$ echo "show errors" | sudo socat stdio /run/haproxy/haproxy.sock
Total events captured on [29/Feb/2024:11:24:00.722] : 1

[29/Feb/2024:11:23:53.236] backend voda-apigee-conn-be (#13): invalid response
  frontend main (#5), server api1 (#1), event #0, src x.x.x.x:60930
  buffer starts at 0 (including 0 out), 15627 free,
  len 757, wraps at 16336, error at position 665
  H1 connection flags 0x80000000, H1 stream flags 0x00004810
  H1 msg state MSG_HDR_L2_LWS(24), H1 msg flags 0x00011654
  H1 chunk len 0 bytes, H1 body len 0 bytes :

  00000  HTTP/1.1 200 \r\n
  00015  Server: nginx\r\n
  00030  Date: Thu, 29 Feb 2024 10:23:53 GMT\r\n
  00067  Transfer-Encoding: chunked\r\n
  00095  Connection: keep-alive\r\n
  00119  Expires: 0\r\n
  00131  Cache-Control: no-cache, no-store, max-age=0, must-revalidate\r\n
  00194  Set-Cookie: JSESSIONID=nGfILjPozmcVahj831vf21a6BXIxgqElGlE8zxqA.cgwcsm
  00264+ fuatapp1; path=/; secure; HttpOnly\r\n
  00300  X-XSS-Protection: 1; mode=block\r\n
  00333  Pragma: no-cache\r\n
  00351  X-Frame-Options: DENY\r\n
  00374  X-Content-Type-Options: nosniff\r\n
  00407  Strict-Transport-Security: max-age=31536000 ; includeSubDomains\r\n
  00472  VFHU-RequestId: 20681405-102a-4eed-8eee-00ed8f3c37c8\r\n
  00526  VFHU-Quota-Reset: 1709204400244\r\n
  00559  VFHU-Quota-Allowed: 10000\r\n
  00586  VFHU-Quota-Available: 9999\r\n
  00614  Content-Type: application/json\r\n
  00646  Transfer-Encoding: chunked\r\n
  00674  \r\n
  00676  4b\r\n
  00680  {"id":"94464170","attachment":{"id":{"value":"DOC-20240229-112353-OTXV
  00750+ J"}}}\r\n

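"Error at position 665". Position 665 is the colon (:) in the "Transfer-Encoding: chunked" header. This looks fine to me: the chunk size is sent as 4b in hexadecimal, which is 75 in decimal, and that is the size of the JSON. Also, we do not build this HTTP response by hand; a Java library does it, so I am fairly sure it should be fine.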
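
The chunk-size arithmetic can be checked against the buffer dump. The following Python sketch simply copies the chunk-size line and the JSON body from the "show errors" output above; the variable names are made up for illustration.

```python
# Values copied from the "show errors" dump above.
chunk_size_line = "4b"  # chunk size as sent on the wire (hexadecimal)
body = '{"id":"94464170","attachment":{"id":{"value":"DOC-20240229-112353-OTXVJ"}}}'
body_offset = 680       # offset of the JSON body in the dump
buffer_len = 757        # "len 757" reported by HAProxy

declared = int(chunk_size_line, 16)   # size announced by the chunk header
actual = len(body.encode("utf-8"))    # size of the JSON body in bytes

print(declared, actual)               # 75 75
# Body plus its trailing CRLF should end exactly at the reported buffer length.
print(body_offset + actual + 2 == buffer_len)
```

Both numbers agree and the body ends exactly at the reported buffer length, so the chunk framing itself is consistent. Note that this check only covers the body; it says nothing about whether the header block is valid.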

Answer 1

There are two Transfer-Encoding: chunked headers, one at position 00067 and another at 00646. HAProxy expects only one.
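
A quick way to spot this kind of problem is to count header names in the raw response. The Python sketch below is only an illustration: the function name `duplicate_headers` is made up, and the sample is an abbreviated copy of the response from the "show errors" dump in the question. Some headers, such as Set-Cookie, may legitimately appear more than once, so a repeated name is a hint to look closer, not proof of an error.

```python
from collections import Counter


def duplicate_headers(raw_head: str) -> dict:
    """Return the header names (lower-cased) that occur more than once."""
    counts = Counter()
    for line in raw_head.split("\r\n")[1:]:  # skip the status line
        if not line:
            break                            # a blank line ends the header block
        name, sep, _value = line.partition(":")
        if sep:
            counts[name.strip().lower()] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Abbreviated copy of the backend response captured by "show errors".
sample = (
    "HTTP/1.1 200 \r\n"
    "Server: nginx\r\n"
    "Transfer-Encoding: chunked\r\n"
    "Connection: keep-alive\r\n"
    "Content-Type: application/json\r\n"
    "Transfer-Encoding: chunked\r\n"
    "\r\n"
)

print(duplicate_headers(sample))
```

The direct cURL output in the question shows the same header twice as well, so the duplicate is already present in what the backend sends; cURL simply tolerates it.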
