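I've been going crazy over this for the past 5 hours or so.
One of our servers started misbehaving and suddenly returns 504 Gateway Timeout errors.
First, the versions:
- Debian 10 running on Proxmox 7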
- Nginx 1.14
- PHP 7.2.34 (39+0~20230609.84+debian10~1.gbpf63844)
- curl 7.64.0
Basically, on one virtual host, call it VH 1 ( example.com/sd.php ), I have the following file:
<?php
echo 'Response';
And on another virtual host ( VH 2, example1.com/sd.php ):
<?php
$host = 'https://example.com/sd.php';
for ($i = 0; $i < 2; $i++) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $host);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $season_data = curl_exec($ch);
    if (curl_errno($ch)) {
        // Print the curl error once and stop.
        print "Error: " . curl_error($ch);
        exit();
    }
    curl_close($ch);
    echo "Responded" . $i . "<br />";
}
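Both virtual hosts are on the same machine.
Now, if I open example1.com/sd.php in a browser and keep hitting refresh, I get a 504 Gateway Timeout roughly once every 5 to 9 refreshes.
This is what I get in the VH1 log: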
2023/07/29 21:16:04 [error] 27138#27138: *851 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example.com, request: "GET /sd.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "example.com"
PHP-FPM looks completely healthy and never reports a problem (I originally suspected it was the cause):
[29-Jul-2023 21:26:32.662531] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1
[29-Jul-2023 21:26:33.663691] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1
[29-Jul-2023 21:26:34.664941] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 0 active children, 25 spare children, 25 running children. Spawning rate 1
The relevant php-fpm configuration:
listen = /run/php/php7.2-fpm.sock
request_terminate_timeout = 30s
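It used to listen on a TCP socket; I switched it to a Unix socket this afternoon and nothing changed.
I enabled the slow log, and it basically points at the sd.php file: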
[29-Jul-2023 21:23:24.315874] [pool www] pid 15840
script_filename = /var/www/html/vh2/public/sd.php
[0x00007fb8fd21e110] curl_exec() (...)/vh1/public/sd.php:18
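Since the slow log shows the worker stuck inside curl_exec(), one way to narrow things down is to record how long each phase of the request takes. The following is only a sketch: timing_breakdown() and timed_request() are hypothetical helpers that are not part of the original scripts, and the 10 second cap is an assumed value chosen to stay under the 30s request_terminate_timeout. Whether that cap can interrupt a stalled name lookup depends on how libcurl was built.

```php
<?php
// Sketch: report where a curl request spent its time.
// timing_breakdown() and timed_request() are hypothetical helpers.

function timing_breakdown(array $info): array
{
    $keys = ['namelookup_time', 'connect_time', 'starttransfer_time', 'total_time'];
    $out = [];
    foreach ($keys as $key) {
        // curl_getinfo() reports these in seconds, measured from the start of the transfer.
        $out[$key] = isset($info[$key]) ? (float) $info[$key] : null;
    }
    return $out;
}

function timed_request(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Assumed cap on the whole transfer, so the script can still report afterwards.
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    $body = curl_exec($ch);
    $result = [
        'ok'      => $body !== false,
        'error'   => curl_error($ch),
        'timings' => timing_breakdown(curl_getinfo($ch)),
    ];
    curl_close($ch);
    return $result;
}
```

Calling print_r(timed_request('https://example.com/sd.php')) from the VH 2 script and comparing namelookup_time against total_time on a slow request would show whether the delay happens before the connection is even opened.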
On VH2 there is nothing in the nginx logs: the error log is enabled at debug level, yet nothing is recorded when I get the 504 Gateway Timeout.
This is my nginx configuration:
client_body_timeout 10;
client_header_timeout 10;
keepalive_timeout 30;
send_timeout 10;
proxy_connect_timeout 600s;
proxy_send_timeout 600s;
proxy_read_timeout 600s;
fastcgi_send_timeout 600s;
fastcgi_read_timeout 600s;
fastcgi_temp_file_write_size 256k;
I also ran some tests with ab, both from my own computer and from inside the server, and everything was perfect (calling example.com/sd.php ):
tio ~ $ ab -n 5000 -c 100 https://example.com/sd.php
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking example.com (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Completed 5000 requests
Finished 5000 requests
Server Software: pp7-2
Server Hostname: example.com
Server Port: 443
SSL/TLS Protocol: TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key: X25519 253 bits
TLS Server Name: example.com
Document Path: /sd.php
Document Length: 5 bytes
Concurrency Level: 100
Time taken for tests: 9.910 seconds
Complete requests: 5000
Failed requests: 0
Total transferred: 1940000 bytes
HTML transferred: 25000 bytes
Requests per second: 504.54 [#/sec] (mean)
Time per request: 198.200 [ms] (mean)
Time per request: 1.982 [ms] (mean, across all concurrent requests)
Transfer rate: 191.17 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 113 136 23.5 133 452
Processing: 41 57 20.3 52 425
Waiting: 41 56 19.5 52 424
Total: 161 192 34.0 186 594
Percentage of the requests served within a certain time (ms)
50% 186
66% 192
75% 196
80% 199
90% 208
95% 219
98% 261
99% 334
100% 594 (longest request)
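I also checked dmesg and /var/log/messages just in case, but found nothing.
Another example (something is off here): I came across this thread, https://groups.google.com/g/highload-php-en/c/qGu3Eaifj9s, where someone posted this PHP as a simple test case: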
<?php
function connect($host, $port, $timeout = 1) {
$conn_str = "tcp://{$host}:{$port}";
$opts = STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT | STREAM_CLIENT_PERSISTENT;
$sock = stream_socket_client($conn_str, $errno, $errstr, $timeout, $opts);
return $sock;
}
$sock = connect("google.com", 80);
$req = "GET / HTTP/1.0\r\nHost: www.google.com\r\nAccept: */*\r\n\r\n";
$len = fwrite($sock, $req);
$data = stream_get_contents($sock);
echo $data;
After refreshing the page a few times, the same thing happens again:
2023/07/29 21:52:34 [error] 27138#27138: *35680 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"
2023/07/29 21:53:12 [info] 27138#27138: *35925 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too (104: Connection reset by peer) while sending request to upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"
2023/07/29 21:54:14 [error] 27138#27138: *36093 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 95.94.77.55, server: example1.com, request: "GET /sd2.php HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.2-fpm.sock:", host: "www.example1.com"
And the php-fpm log that goes with it:
[29-Jul-2023 21:53:54.440363] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 2 active children, 27 spare children, 29 running children. Spawning rate 1
[29-Jul-2023 21:53:54.510612] WARNING: pid 15827, fpm_request_check_timed_out(), line 278: [pool www] child 16562, script '/var/www/html/vh2//public/sd2.php' (request: "GET /sd2.php") execution timed out (42.699082 sec), terminating
[29-Jul-2023 21:53:54.512034] DEBUG: pid 15827, fpm_got_signal(), line 75: received SIGCHLD
[29-Jul-2023 21:53:54.512087] WARNING: pid 15827, fpm_children_bury(), line 256: [pool www] child 16562 exited on signal 15 (SIGTERM) after 702.886545 seconds from start
[29-Jul-2023 21:53:54.512861] NOTICE: pid 15827, fpm_children_make(), line 425: [pool www] child 17005 started
[29-Jul-2023 21:53:54.512895] DEBUG: pid 15827, fpm_event_loop(), line 423: event module triggered 1 events
[29-Jul-2023 21:53:55.441977] DEBUG: pid 15827, fpm_pctl_perform_idle_server_maintenance(), line 378: [pool www] currently 1 active children, 28 spare children, 29 running children. Spawning rate 1
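I don't understand what is going on; it feels like a weird bug somewhere between curl, php-fpm and nginx. Thanks, everyone!
Answer 1
In the end I found the problem. It was DNS resolution.
I'm using Symfony, and Symfony uses curl_multi_exec. For some reason it tries to resolve both IPv6 and IPv4, and in our case the IPv6 resolution was what caused the problem.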
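To check whether IPv6 lookups really are the slow part, A and AAAA queries can be timed separately. This is a rough sketch under assumptions: time_dns_lookup() and compare_a_and_aaaa() are hypothetical helper names, and dns_get_record() queries the configured DNS servers directly, so it may not follow exactly the same resolution path as curl.

```php
<?php
// Sketch: time A (IPv4) and AAAA (IPv6) lookups separately.
// time_dns_lookup() and compare_a_and_aaaa() are hypothetical helpers.

function time_dns_lookup(string $host, int $type): array
{
    $start = microtime(true);
    // dns_get_record() returns an array of records, or false on failure.
    $records = @dns_get_record($host, $type);
    $elapsed = microtime(true) - $start;

    return [
        'seconds' => $elapsed,
        'count'   => is_array($records) ? count($records) : 0,
    ];
}

function compare_a_and_aaaa(string $host): array
{
    return [
        'A'    => time_dns_lookup($host, DNS_A),
        'AAAA' => time_dns_lookup($host, DNS_AAAA),
    ];
}
```

Running print_r(compare_a_and_aaaa('example.com')) several times on the affected server should make it visible if the AAAA query occasionally takes seconds while the A query returns immediately.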
There are two ways to fix this. The first and simplest is to disable IPv6 resolution in curl, like this:
curl_setopt($c, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
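The other is to switch to a different DNS server (one that does not resolve IPv6 and returns immediately).
I went with the second option, because the version of Symfony I'm currently using cannot yet pass options through to curl.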
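For plain curl code such as the loop in the question, the first option could be applied roughly as follows. This is a sketch only: curl_options_ipv4() and fetch_ipv4() are hypothetical helpers, and the two timeout values are assumptions rather than settings from the original setup.

```php
<?php
// Sketch: the request from the question, restricted to IPv4 name resolution.
// curl_options_ipv4() and fetch_ipv4() are hypothetical helpers.

function curl_options_ipv4(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        // Resolve names to IPv4 addresses only, skipping the IPv6 lookup.
        CURLOPT_IPRESOLVE      => CURL_IPRESOLVE_V4,
        // Assumed values: upper bounds, in seconds, on the connect phase
        // and on the whole transfer.
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 10,
    ];
}

function fetch_ipv4(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, curl_options_ipv4($url));
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl error: " . $error);
    }
    curl_close($ch);
    return $body;
}
```

In the VH 2 script, the body of the loop would then reduce to a call like fetch_ipv4($host) wrapped in a try/catch.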