nginx is sometimes killed after a reload via systemd


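After upgrading my Ubuntu server from 18.04 to 20.04, nginx started behaving strangely. It dies on random systemctl reload nginx calls, complaining that a process is already listening on the given ports. Before I could run portmap or a similar tool, I was able to start nginx again with systemctl start nginx without any problem, so I assume nginx is trying to bind to ports that are still held by the old nginx instance being reloaded.

I haven't tried much because I'm lost. I checked that /run/nginx.pid contains the correct pid. The server also runs Docker, so I thought a container might have started binding to port 80 or 443, but none has: the only port Docker maps is 8090.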
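
For reference, one way to see which process currently holds a listening port is to filter the output of ss. The following is a minimal sketch, not part of the original troubleshooting; listeners_on_ports is a hypothetical helper name, and it assumes the iproute2 ss tool with its -H (no header) option is available.

```shell
#!/bin/bash
# Sketch: show which processes hold listening TCP sockets on given ports.
# listeners_on_ports is a hypothetical helper. It filters lines in the
# format printed by `ss -H -ltnp`, where column 4 is the local address:port.

listeners_on_ports() {
  local ports
  # Join the arguments into an alternation such as "80|443|8080".
  ports="$(IFS='|'; echo "$*")"
  # Keep only lines whose local address ends in one of the ports.
  awk -v re=":(${ports})\$" '$4 ~ re'
}

# On the real system (root is needed to see other users' process names):
# ss -H -ltnp | listeners_on_ports 80 443 8080
```

If the lines that come back all name nginx itself, the ports are held by a leftover nginx process rather than by another service.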

Versions:

nginx -v
nginx version: nginx/1.18.0 (Ubuntu)

cat /etc/os-release 
VERSION="20.04.1 LTS (Focal Fossa)"
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
...

The configuration file is valid:

nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

The ports nginx binds to are exactly the ones the log complains about:

cat /etc/nginx/sites-enabled/* | grep listen | uniq
listen 80 default_server;
listen 127.0.0.1:8080;
listen 443 ssl http2;
listen [::]:443 ssl http2;

Do you have any idea why nginx dies on reload? What else can I check?

Logs

dmesg

[31326.529427] traps: nginx[746] general protection fault ip:7f5f8cadc593 sp:7ffd350738d0 error:0 in libperl.so.5.30.0[7f5f8ca74000+166000]
[31408.549262] traps: nginx[26366] general protection fault ip:7efda22e8593 sp:7ffda51c0bd0 error:0 in libperl.so.5.30.0[7efda2280000+166000]
[32103.236557] nginx[26433]: segfault at 3d1 ip 00007efe600ce5c9 sp 00007ffead1b3210 error 4 in libperl.so.5.30.0[7efe60066000+166000]
[32103.236566] Code: 00 0f b6 40 30 49 c1 ed 03 49 29 c5 0f 84 17 01 00 00 48 8b 76 10 48 8b 52 10 4c 8d 3c fe 4c 8d 0c c2 84 c9 0f 84 c7 02 00 00 <49> 83 39 00 0f 85 ad 03 00 00 49 83 c1 08 49 83 ed 01 49 8d 74 1d
[32676.779937] nginx[31927]: segfault at 10 ip 00007f7550de2593 sp 00007ffce0bf4cd0 error 4 in libperl.so.5.30.0[7f7550d7a000+166000]
[32676.779952] Code: 48 89 43 10 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 40 00 0f b6 7f 30 48 c1 e8 03 48 29 f8 48 89 c3 74 89 48 8b 02 <4c> 8b 68 10 4d 85 ed 0f 84 28 01 00 00 0f b6 40 30 49 c1 ed 03 49
[33337.193774] traps: nginx[32415] general protection fault ip:7f195aa5e593 sp:7ffca566bad0 error:0 in libperl.so.5.30.0[7f195a9f6000+166000]
[40155.333210] nginx[39879]: segfault at 41 ip 00007fe42f53c593 sp 00007ffe812e18f0 error 4 in libperl.so.5.30.0[7fe42f4d4000+166000]
[40155.333219] Code: 48 89 43 10 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 40 00 0f b6 7f 30 48 c1 e8 03 48 29 f8 48 89 c3 74 89 48 8b 02 <4c> 8b 68 10 4d 85 ed 0f 84 28 01 00 00 0f b6 40 30 49 c1 ed 03 49

journal

Oct 28 01:32:24 fooServer systemd[1]: Starting A high performance web server and a reverse proxy server...
Oct 28 01:32:25 fooServer systemd[1]: Started A high performance web server and a reverse proxy server.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Main process exited, code=killed, status=11/SEGV
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 587 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2048 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2049 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2050 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2051 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2052 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2053 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2054 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2055 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 587 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2048 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2049 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2050 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2051 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2052 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2053 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2054 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Killing process 2055 (nginx) with signal SIGKILL.
Oct 28 01:35:05 fooServer systemd[1]: nginx.service: Failed with result 'signal'.
Nov 02 05:32:05 fooServer systemd[1]: Starting A high performance web server and a reverse proxy server...
Nov 02 05:32:05 fooServer nginx[415078]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Nov 02 05:32:05 fooServer nginx[415078]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Nov 02 05:32:05 fooServer nginx[415078]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Nov 02 05:32:05 fooServer nginx[415078]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Nov 02 05:32:05 fooServer nginx[415078]: nginx: [emerg] bind() to 127.0.0.1:8080 failed (98: Address already in use)
Nov 02 05:32:06 fooServer nginx[415078]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Nov 02 05:32:06 fooServer nginx[415078]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Nov 02 05:32:06 fooServer nginx[415078]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Nov 02 05:32:06 fooServer nginx[415078]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Nov 02 05:32:06 fooServer nginx[415078]: nginx: [emerg] bind() to 127.0.0.1:8080 failed (98: Address already in use)
Nov 02 05:32:06 fooServer nginx[415078]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Nov 02 05:32:06 fooServer nginx[415078]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Nov 02 05:32:06 fooServer nginx[415078]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Nov 02 05:32:06 fooServer nginx[415078]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Nov 02 05:32:06 fooServer nginx[415078]: nginx: [emerg] bind() to 127.0.0.1:8080 failed (98: Address already in use)
Nov 02 05:32:07 fooServer nginx[415078]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Nov 02 05:32:07 fooServer nginx[415078]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Nov 02 05:32:07 fooServer nginx[415078]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Nov 02 05:32:07 fooServer nginx[415078]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Nov 02 05:32:07 fooServer nginx[415078]: nginx: [emerg] bind() to 127.0.0.1:8080 failed (98: Address already in use)
Nov 02 05:32:07 fooServer nginx[415078]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Nov 02 05:32:07 fooServer nginx[415078]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Nov 02 05:32:07 fooServer nginx[415078]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Nov 02 05:32:07 fooServer nginx[415078]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Nov 02 05:32:07 fooServer nginx[415078]: nginx: [emerg] bind() to 127.0.0.1:8080 failed (98: Address already in use)
Nov 02 05:32:08 fooServer nginx[415078]: nginx: [emerg] still could not bind()
Nov 02 05:32:08 fooServer systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Nov 02 05:32:08 fooServer systemd[1]: nginx.service: Failed with result 'exit-code'.
Nov 02 05:32:08 fooServer systemd[1]: Failed to start A high performance web server and a reverse proxy server.
Nov 02 05:32:17 fooServer systemd[1]: Starting A high performance web server and a reverse proxy server...
Nov 02 05:32:17 fooServer nginx[415100]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Nov 02 05:32:17 fooServer nginx[415100]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Nov 02 05:32:17 fooServer nginx[415100]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Nov 02 05:32:17 fooServer nginx[415100]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Nov 02 05:32:17 fooServer nginx[415100]: nginx: [emerg] bind() to 127.0.0.1:8080 failed (98: Address already in use)
Nov 02 05:32:17 fooServer nginx[415100]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Nov 02 05:32:17 fooServer nginx[415100]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Nov 02 05:32:17 fooServer nginx[415100]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Nov 02 05:32:17 fooServer nginx[415100]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Nov 02 05:32:17 fooServer nginx[415100]: nginx: [emerg] bind() to 127.0.0.1:8080 failed (98: Address already in use)
Nov 02 05:32:18 fooServer nginx[415100]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Nov 02 05:32:18 fooServer nginx[415100]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Nov 02 05:32:18 fooServer nginx[415100]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Nov 02 05:32:18 fooServer nginx[415100]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Nov 02 05:32:18 fooServer nginx[415100]: nginx: [emerg] bind() to 127.0.0.1:8080 failed (98: Address already in use)
Nov 02 05:32:18 fooServer nginx[415100]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Nov 02 05:32:18 fooServer nginx[415100]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Nov 02 05:32:18 fooServer nginx[415100]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Nov 02 05:32:18 fooServer nginx[415100]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Nov 02 05:32:18 fooServer nginx[415100]: nginx: [emerg] bind() to 127.0.0.1:8080 failed (98: Address already in use)
Nov 02 05:32:19 fooServer nginx[415100]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Nov 02 05:32:19 fooServer nginx[415100]: nginx: [emerg] bind() to 0.0.0.0:443 failed (98: Address already in use)
Nov 02 05:32:19 fooServer nginx[415100]: nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
Nov 02 05:32:19 fooServer nginx[415100]: nginx: [emerg] bind() to [::]:443 failed (98: Address already in use)
Nov 02 05:32:19 fooServer nginx[415100]: nginx: [emerg] bind() to 127.0.0.1:8080 failed (98: Address already in use)
Nov 02 05:32:19 fooServer nginx[415100]: nginx: [emerg] still could not bind()
Nov 02 05:32:19 fooServer systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Nov 02 05:32:19 fooServer systemd[1]: nginx.service: Failed with result 'exit-code'.
Nov 02 05:32:19 fooServer systemd[1]: Failed to start A high performance web server and a reverse proxy server.

/var/log/nginx/error.log

2020/11/02 21:49:38 [info] 25842#25842: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:74
2020/11/02 21:49:40 [notice] 25846#25846: signal process started
2020/11/02 21:51:09 [notice] 26353#26353: signal process started
2020/11/02 21:51:19 [info] 26357#26357: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:74
2020/11/02 21:51:28 [info] 26364#26364: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:74
2020/11/02 21:52:31 [notice] 26406#26406: signal process started
2020/11/02 21:52:31 [notice] 26418#26418: signal process started
2020/11/02 21:52:38 [info] 26431#26431: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:74
2020/11/02 21:58:54 [info] 26584#26584: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:74
2020/11/02 21:58:57 [notice] 26589#26589: signal process started
2020/11/02 22:04:04 [info] 31847#31847: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:74
2020/11/02 22:04:06 [notice] 31855#31855: signal process started
2020/11/02 22:06:05 [info] 31925#31925: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:74
2020/11/02 22:12:39 [notice] 32321#32321: signal process started
2020/11/02 22:13:40 [notice] 32392#32392: signal process started
2020/11/02 22:13:53 [info] 32413#32413: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:74
2020/11/02 22:24:13 [notice] 39657#39657: signal process started
2020/11/02 22:24:40 [notice] 39837#39837: signal process started
2020/11/02 22:24:45 [info] 39870#39870: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:74

Configuration

/etc/nginx/nginx.conf

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
    worker_connections 768;
    # multi_accept on;
}

http {
    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    server_tokens off;
    client_max_body_size 1G;

    # server_names_hash_bucket_size 64;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # SSL Settings
    ##

    ssl_session_timeout 1d;
    ssl_session_cache shared:MozSSL:10m;  # about 40000 sessions
    ssl_session_tickets off;

    # curl https://ssl-config.mozilla.org/ffdhe2048.txt > /path/to/dhparam.pem
    ssl_dhparam /etc/nginx/ssl/dhparam.pem;

    # intermediate configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;

    # HSTS (ngx_http_headers_module is required) (63072000 seconds)
    add_header Strict-Transport-Security "max-age=63072000" always;

    ##
    # Logging Settings
    ##

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

Most of my /etc/nginx/sites-enabled/*.conf files look like this:

server {
  listen 443 ssl http2;
  server_name example.com;

  root /var/www/public;

  include fpm7.3.conf; # includes fastcgi_pass to php-fpm for *.php files

  ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
  ssl_trusted_certificate /etc/letsencrypt/live/example.com/fullchain.pem;

  error_log /var/log/nginx/example.com.error.log;
  access_log /var/log/nginx/example.com.access.log;
}

Answer 1

You have run into a bug in perl 5.30, the version shipped with Ubuntu 20.04 LTS. The bug has been fixed upstream, but the fix has not yet been backported to Ubuntu.
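
The dmesg lines in the question place every crash inside libperl.so.5.30.0. To check for the same signature on another machine, a minimal sketch along these lines may help; count_libperl_crashes is a hypothetical helper name, and the pattern is only modelled on the kernel messages quoted in the question.

```shell
#!/bin/bash
# Sketch: count nginx crashes that the kernel attributes to libperl.
# count_libperl_crashes is a hypothetical helper that reads kernel log
# lines on stdin (for example from `dmesg`) and prints the number of
# matching segfault / general protection fault lines.

count_libperl_crashes() {
  grep -cE 'nginx\[[0-9]+\].*(segfault|general protection fault).*libperl\.so'
}

# On the real system:
# dmesg | count_libperl_crashes
# To see whether the Perl module is loaded at all:
# ls /etc/nginx/modules-enabled/ | grep perl
```

A non-zero count together with an enabled Perl module suggests the same bug; a count of zero means the crashes, if any, have a different cause.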

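If you do not need to run Perl code inside nginx (most people do not), you can uninstall the libnginx-mod-http-perl package and restart nginx to avoid the problem. The package is pulled in by the nginx-extras package, but since most people do not actually run perl in the web server, they do not need it.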

ubuntu@vmtest-ubuntu2004:~$ sudo apt purge libnginx-mod-http-perl
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libhiredis0.14 libluajit-5.1-2 libluajit-5.1-common
  libnginx-mod-http-auth-pam libnginx-mod-http-cache-purge
  libnginx-mod-http-dav-ext libnginx-mod-http-echo
  libnginx-mod-http-fancyindex libnginx-mod-http-geoip
  libnginx-mod-http-geoip2 libnginx-mod-http-headers-more-filter
  libnginx-mod-http-lua libnginx-mod-http-ndk libnginx-mod-http-subs-filter
  libnginx-mod-http-uploadprogress libnginx-mod-http-upstream-fair
  libnginx-mod-nchan
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  nginx-core
Suggested packages:
  nginx-doc
The following packages will be REMOVED:
  libnginx-mod-http-perl* nginx-extras*
The following NEW packages will be installed:
  nginx-core
0 upgraded, 1 newly installed, 2 to remove and 0 not upgraded.
Need to get 425 kB of archives.
After this operation, 173 kB disk space will be freed.
Do you want to continue? [Y/n]

ubuntu@vmtest-ubuntu2004:~$ sudo systemctl restart nginx

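Note that in this particular case nginx may already have become a zombie, in which case you will have to reboot the machine.

As shown above, removing the package replaces nginx-extras with nginx-core and marks all the extra module packages as eligible for autoremoval. If you actually need any of them, mark them as installed before you run autoremove.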

ubuntu@vmtest-ubuntu2004:~$ sudo apt-mark install libnginx-mod-http-geoip2
Selected libnginx-mod-http-geoip2 for installation.
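
The transcript above uses apt-mark install. The apt-mark subcommand documented for flagging a package as manually installed, so that autoremove skips it, is apt-mark manual. A minimal sketch for keeping several modules at once, where keep_modules and DRY_RUN are hypothetical names introduced only for illustration:

```shell
#!/bin/bash
# Sketch: mark the nginx modules you still need as manually installed so
# that a later `apt autoremove` leaves them in place.
# keep_modules and DRY_RUN are hypothetical names, not part of apt.

keep_modules() {
  local pkg
  for pkg in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      # Dry run: only print the command that would be executed.
      echo "sudo apt-mark manual $pkg"
    else
      sudo apt-mark manual "$pkg"
    fi
  done
}

# Example, using module names from the apt output above:
# DRY_RUN=1 keep_modules libnginx-mod-http-geoip2 libnginx-mod-nchan
# keep_modules libnginx-mod-http-geoip2 libnginx-mod-nchan
```

Running it once with DRY_RUN set shows what would be changed before anything is touched.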

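You can also follow the issue on Launchpad.

Answer 2

If removing libnginx-mod-http-perl is not an option for you, the next best workaround is to configure systemd to restart nginx automatically when it fails.

This is easy to set up:

Create the directory /etc/systemd/system/nginx.service.d/

Create the file /etc/systemd/system/nginx.service.d/override.conf with the following content: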

[Unit]
StartLimitIntervalSec=500
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=5s

Run systemctl daemon-reload

Restart nginx: systemctl restart nginx.service
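
The steps above can be sketched as a small script. This is an illustration under assumptions, not a tested deployment tool: write_nginx_override is a hypothetical helper, and the target directory is a parameter so the function can be tried against a scratch directory before touching /etc.

```shell
#!/bin/bash
# Sketch: write the systemd drop-in described above.
# write_nginx_override is a hypothetical helper. Its optional argument is
# the drop-in directory; it defaults to the path used in this answer.

write_nginx_override() {
  local dir="${1:-/etc/systemd/system/nginx.service.d}"
  mkdir -p "$dir" || return 1
  # Quoted heredoc delimiter: the content is written literally.
  cat > "$dir/override.conf" <<'EOF'
[Unit]
StartLimitIntervalSec=500
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=5s
EOF
}

# As root, on the real system:
# write_nginx_override && systemctl daemon-reload && systemctl restart nginx.service
```

systemctl edit nginx.service is an alternative that opens an editor on the same override.conf path and reloads the unit files when the editor exits.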

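In my experience, Nginx usually crashes after the second reload. If that happens now, systemd will start Nginx again five seconds after it fails.

You can adjust the settings to your needs:

StartLimitBurst defines how many times systemd will attempt to start Nginx within the StartLimitIntervalSec window before it gives up.
StartLimitIntervalSec defines the length, in seconds, of the window over which those start attempts are counted.
RestartSec defines how long systemd waits after Nginx fails before trying to start it again.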

Answer 3

If removing libnginx-mod-http-perl is not an option for you, the next best workaround is to disable the perl module.

mkdir /etc/nginx/_modules-disabled
mv /etc/nginx/modules-enabled/50-mod-http-perl.conf /etc/nginx/_modules-disabled/
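
The same move can be wrapped in a function that refuses to run when the module is not enabled. This is a minimal sketch; disable_nginx_module is a hypothetical helper, and the directory arguments exist only so it can be tried on a scratch copy first.

```shell
#!/bin/bash
# Sketch: move one module config out of modules-enabled.
# disable_nginx_module is a hypothetical helper. Arguments:
#   $1  config file name, e.g. 50-mod-http-perl.conf
#   $2  enabled directory  (default /etc/nginx/modules-enabled)
#   $3  disabled directory (default /etc/nginx/_modules-disabled)

disable_nginx_module() {
  local conf="$1"
  local enabled="${2:-/etc/nginx/modules-enabled}"
  local disabled="${3:-/etc/nginx/_modules-disabled}"
  # Entries may be symlinks, so accept either a file or a link.
  if [ ! -e "$enabled/$conf" ] && [ ! -L "$enabled/$conf" ]; then
    echo "not enabled: $conf" >&2
    return 1
  fi
  mkdir -p "$disabled" && mv "$enabled/$conf" "$disabled/"
}

# As root, on the real system, validating the config before restarting:
# disable_nginx_module 50-mod-http-perl.conf && nginx -t
```

Running nginx -t afterwards catches any remaining perl directives in the configuration that would now fail to parse.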

Then restart nginx gracefully:

mkdir /root/tools/
touch /root/tools/nginx_restart.sh
chmod +x /root/tools/nginx_restart.sh
nano /root/tools/nginx_restart.sh

Script contents:

#!/bin/bash
# Ask nginx to exit gracefully first (quit); fall back to a fast shutdown (stop).

# True while any process named exactly "nginx" is running.
nginx_running() {
  pgrep -x nginx > /dev/null
}

for command in quit stop; do
  if nginx_running; then
    echo
    echo "nginx -s $command"
    nginx -s "$command"
    # Wait up to 300 seconds for all nginx processes to exit.
    for (( i = 1; i <= 300; i++ )); do
      sleep 1
      echo -n "."
      if ! nginx_running; then
        break 2
      fi
    done
  fi
done

echo
sleep 1

# Start nginx again only if nothing is left running.
if ! nginx_running; then
  /etc/init.d/nginx start
  sleep 1
fi

/etc/init.d/nginx status

Run the script:

/root/tools/nginx_restart.sh
