Nginx and PHP-FPM 504 Gateway Timeout with multiple curl requests

I have been going crazy over this for the last 5 hours or so.

One of our servers started misbehaving and suddenly returns 504 Gateway Timeout errors.

First, the versions:

  • Debian 10 running on Proxmox 7
  • Nginx 1.14
  • PHP 7.2.34 (39+0~20230609.84+debian10~1.gbpf63844)
  • curl 7.64.0

Basically, I have the following file on one virtual host, let's call it VH 1 ( example.com/sd.php ):

<?php
echo 'Response';

And on another virtual host ( VH 2, example1.com/sd.php ):

$host= 'https://example.com/sd.php';

for ( $i = 0; $i<2; $i++) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $host);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $season_data = curl_exec($ch);

    if (curl_errno($ch)) {
        print_r(curl_error($ch));
        print "Error: " . curl_error($ch);
        exit();
    }

    curl_close($ch);
    echo "Responded" . $i."<br />";
}

Both virtual hosts are on the same machine.

Now, if I open example1.com/sd.php in the browser and keep hitting refresh, roughly every [5-9] refreshes I get a 504 Gateway Timeout.

This is what I get in the VH1 logs:

2023/07/29 21:16:04 [error] 27138#27138: *851 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example.com, request: "GET /sd.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "example.com"

PHP-FPM looks completely fine and never showed a problem (I initially thought it was the culprit):

[29-Jul-2023 21:26:32.662531] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1
[29-Jul-2023 21:26:33.663691] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1
[29-Jul-2023 21:26:34.664941] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1

The notable php-fpm settings:

listen = /run/php/php7.2-fpm.sock
request_terminate_timeout = 30s

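Previously it listened on a TCP socket, but I changed that this afternoon and it made no difference.

I enabled the slow log, and it basically complains about the sd.php file: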

[29-Jul-2023 21:23:24.315874]  [pool www] pid 15840
script_filename = /var/www/html/vh2/public/sd.php
[0x00007fb8fd21e110] curl_exec() (...)/vh1/public/sd.php:18

In the nginx logs for VH2 I have nothing: the error log is enabled at debug level, and nothing is logged when I get the 504 Gateway Timeout.

This is my nginx configuration:

client_body_timeout   10;
client_header_timeout 10;
keepalive_timeout     30;
send_timeout          10;

proxy_connect_timeout  600s;
proxy_send_timeout  600s;
proxy_read_timeout  600s;
fastcgi_send_timeout 600s;
fastcgi_read_timeout 600s;
fastcgi_temp_file_write_size 256k;

I also ran some tests with ab, both from my own computer and from inside the server, and everything was perfect (calling example.com/sd.php ):

tio ~  $ ab -n 5000 -c 100 https://example.com/sd.php          
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking example.com (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests


Server Software:        pp7-2
Server Hostname:        example.com
Server Port:            443
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        example.com

Document Path:          /sd.php
Document Length:        5 bytes

Concurrency Level:      100
Time taken for tests:   9.910 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      1940000 bytes
HTML transferred:       25000 bytes
Requests per second:    504.54 [#/sec] (mean)
Time per request:       198.200 [ms] (mean)
Time per request:       1.982 [ms] (mean, across all concurrent requests)
Transfer rate:          191.17 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      113  136  23.5    133     452
Processing:    41   57  20.3     52     425
Waiting:       41   56  19.5     52     424
Total:        161  192  34.0    186     594

Percentage of the requests served within a certain time (ms)
  50%    186
  66%    192
  75%    196
  80%    199
  90%    208
  95%    219
  98%    261
  99%    334
 100%    594 (longest request)

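I also checked dmesg and /var/log/messages just in case, but found nothing.

Another example (something is off here): I looked at this thread, https://groups.google.com/g/highload-php-en/c/qGu3Eaifj9s, where someone posted this PHP script as a simple test case: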

<?php

function connect($host, $port, $timeout = 1) {
    $conn_str = "tcp://{$host}:{$port}";
    $opts = STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT | STREAM_CLIENT_PERSISTENT;
    $sock = stream_socket_client($conn_str, $errno, $errstr, $timeout, $opts);
    return $sock;
}

$sock = connect("google.com", 80);

$req = "GET / HTTP/1.0\r\nHost: www.google.com\r\nAccept: */*\r\n\r\n";

$len = fwrite($sock, $req);

$data = stream_get_contents($sock);

echo $data;

After refreshing the page a few times, the same thing happens again:

2023/07/29 21:52:34 [error] 27138#27138: *35680 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"
2023/07/29 21:53:12 [info] 27138#27138: *35925 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too (104: Connection reset by peer) while sending request to upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"
2023/07/29 21:54:14 [error] 27138#27138: *36093 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"

And the php-fpm log that goes with it:

[29-Jul-2023 21:53:54.440363] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 2 active children, 27 spare children, 29 running children. Spawning rate 1
[29-Jul-2023 21:53:54.510612] WARNING: pid 15827, fpm_request_check_timed_out(), line 278: [pool www] child 16562, script '/var/www/html/vh2//public/sd2.php' (request: "GET /sd2.php") execution timed out (42.699082 sec), terminating
[29-Jul-2023 21:53:54.512034] DEBUG: pid 15827, fpm_got_signal(), line 75: received SIGCHLD
[29-Jul-2023 21:53:54.512087] WARNING: pid 15827, fpm_children_bury(), line 256: [pool www] child 16562 exited on signal 15 (SIGTERM) after 702.886545 seconds from start
[29-Jul-2023 21:53:54.512861] NOTICE: pid 15827, fpm_children_make(), line 425: [pool www] child 17005 started
[29-Jul-2023 21:53:54.512895] DEBUG: pid 15827, fpm_event_loop(), line 423: event module triggered 1 events
[29-Jul-2023 21:53:55.441977] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 1 active children, 28 spare children, 29 running children. Spawning rate 1

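I don't understand what is going on; it feels like a weird bug somewhere between curl, php-fpm and nginx. Thanks, everyone!

Answer 1

I finally found the problem. It was DNS resolution.

I am using Symfony, and Symfony uses curl_multi_exec. For some reason it tries to resolve both IPv6 and IPv4, and in our case the IPv6 resolution was what caused the problem.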
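One way to check whether name resolution is where a request stalls is to look at the timers that curl_getinfo() reports. The following is only a minimal sketch: the helper names timing_breakdown and probe, and the 20 second timeout, are assumptions for illustration and are not part of the original setup.

```php
<?php
// Hypothetical helper: split curl's cumulative timers into phases.
// The keys read here are the ones curl_getinfo() returns for a handle.
function timing_breakdown(array $info): array
{
    $dns     = (float) ($info['namelookup_time'] ?? 0.0);
    $connect = (float) ($info['connect_time'] ?? 0.0);
    $total   = (float) ($info['total_time'] ?? 0.0);

    return [
        'dns'     => $dns,                        // time until the name was resolved
        'connect' => max(0.0, $connect - $dns),   // TCP connect after DNS
        'rest'    => max(0.0, $total - $connect), // TLS, request and response
    ];
}

// Hypothetical helper: run one request and report where the time went.
function probe(string $url, int $timeoutSeconds = 20): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Assumed value, chosen to stay below request_terminate_timeout = 30s.
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);

    return timing_breakdown($info);
}
```

Calling probe('https://example.com/sd.php') from VH 2 a few times and printing the result should show a large 'dns' value on the slow requests if resolution is the bottleneck.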

There are two ways to fix this. The first and simplest is to disable IPv6 resolution in curl, like this:

curl_setopt($c, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
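Applied to the loop from the question, this could look roughly like the sketch below. The function names and the two timeout values are assumptions added for illustration; the timeouts are meant to make a stalled request fail inside PHP before PHP-FPM terminates the worker.

```php
<?php
// Hypothetical helper: the question's curl options plus IPv4-only resolution.
function ipv4_only_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_IPRESOLVE      => CURL_IPRESOLVE_V4, // do not resolve IPv6 addresses
        CURLOPT_CONNECTTIMEOUT => 5,                 // assumed value, in seconds
        CURLOPT_TIMEOUT        => 10,                // assumed value, in seconds
    ];
}

// Hypothetical helper: fetch the same URL several times, as sd.php on VH 2 does.
function fetch_repeatedly(string $url, int $times = 2): array
{
    $bodies = [];
    for ($i = 0; $i < $times; $i++) {
        $ch = curl_init();
        curl_setopt_array($ch, ipv4_only_options($url));
        $body = curl_exec($ch);
        if ($body === false) {
            $error = curl_error($ch);
            curl_close($ch);
            throw new RuntimeException("Request $i failed: $error");
        }
        curl_close($ch);
        $bodies[] = $body;
    }
    return $bodies;
}
```

Returning the options as an array keeps them in one place, so the same set can be passed to curl_setopt_array() wherever a handle is created.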

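The other way is to change the DNS server (to one that cannot resolve IPv6 and returns immediately).

I went with the second option, because the version of Symfony I am currently using cannot yet pass options to curl.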
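To get a rough idea of whether the configured DNS server answers IPv6 queries slowly, the A and AAAA lookups can be timed separately. This is only a sketch under assumptions: time_lookup and compare_lookups are hypothetical names, and dns_get_record() does not necessarily go through the same resolver path as curl, so the numbers are an indication rather than proof.

```php
<?php
// Hypothetical helper: time one DNS query. A custom resolver can be injected;
// by default PHP's dns_get_record() is used (warnings suppressed on failure).
function time_lookup(string $host, int $type, ?callable $resolver = null): array
{
    $resolver = $resolver ?? function (string $h, int $t) {
        return @dns_get_record($h, $t);
    };

    $start   = microtime(true);
    $records = $resolver($host, $type);
    $elapsed = microtime(true) - $start;

    return [
        'seconds' => $elapsed,
        'records' => is_array($records) ? count($records) : 0,
    ];
}

// Hypothetical helper: compare IPv4 (A) and IPv6 (AAAA) lookups for one host.
function compare_lookups(string $host, ?callable $resolver = null): array
{
    return [
        'A'    => time_lookup($host, DNS_A, $resolver),
        'AAAA' => time_lookup($host, DNS_AAAA, $resolver),
    ];
}
```

If compare_lookups('example.com') reports an AAAA lookup that takes seconds while the A lookup returns at once, that would be consistent with the behaviour described in this answer.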
