Unexpected graceful shutdown of nginx

2022/09/19 23:31:29 [info] 192250#192250: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:63
2022/09/19 23:31:31 [notice] 192254#192254: signal process started

[ N 2022-09-19 23:31:35.5287 165889/T6 age/Cor/CoreMain.cpp:670 ]: Signal received. Gracefully shutting down... (send signal 2 more time(s) to force shutdown)
[ N 2022-09-19 23:31:35.5288 165889/T1 age/Cor/CoreMain.cpp:1245 ]: Received command to shutdown gracefully. Waiting until all clients have disconnected...
[ N 2022-09-19 23:31:35.5289 165889/Ta Ser/Server.h:901 ]: [ApiServer] Freed 0 spare client objects
[ N 2022-09-19 23:31:35.5289 165889/Ta Ser/Server.h:558 ]: [ApiServer] Shutdown finished
[ N 2022-09-19 23:31:35.5289 165889/T6 Ser/Server.h:901 ]: [ServerThr.1] Freed 0 spare client objects
[ N 2022-09-19 23:31:35.5289 165889/T6 Ser/Server.h:558 ]: [ServerThr.1] Shutdown finished
[ N 2022-09-19 23:31:35.6368 165889/T1 age/Cor/CoreMain.cpp:1325 ]: Passenger core shutdown finished

This leaves the nginx service unreachable until it is restarted manually. It happens at least once a day.


To make matters worse, a cron job (added with crontab -e) is supposed to check for failures every minute:
*/1 * * * * /opt/launch-crashed-services.sh > /dev/null 2>&1


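The script contains:

# Exit status of the pipeline is grep's: 0 only if nginx reports 'active (running)'
service nginx status | grep 'active (running)' > /dev/null 2>&1

if [ $? != 0 ]
then
        sudo service nginx restart > /var/log/nginx/relaunch.log  # /dev/null
fi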
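
A slightly more defensive variant of that check might look like the sketch below. It rests on several assumptions that are not in the original setup: the script runs from root's crontab (so no sudo is needed), systemctl is available, and the log should be appended to rather than overwritten on each restart. cron usually runs jobs with a minimal PATH that may not include the sbin directories, so the sketch sets PATH explicitly. CHECK_CMD and RESTART_CMD are hypothetical hooks, added only so the logic can be tried without touching the real service.

```shell
#!/bin/sh
# Sketch of a more defensive watchdog (assumes it runs as root).
# cron typically provides a minimal PATH, so set one explicitly.
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH

# Illustrative, overridable settings (names are assumptions).
RELAUNCH_LOG="${RELAUNCH_LOG:-/var/log/nginx/relaunch.log}"
CHECK_CMD="${CHECK_CMD:-systemctl is-active --quiet nginx}"
RESTART_CMD="${RESTART_CMD:-systemctl restart nginx}"

relaunch_if_down() {
    # Nothing to do while the service reports itself as active.
    if $CHECK_CMD; then
        return 0
    fi
    # Append (>>) instead of truncating (>), so earlier restarts stay visible.
    {
        printf '%s nginx not active, restarting\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        $RESTART_CMD
    } >> "$RELAUNCH_LOG" 2>&1
}
```

To use it as the cron script, a final line calling relaunch_if_down would be added at the end of the file.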

The main goal is to keep nginx up and running: how can I check whether the cron job is actually firing?
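
One possible way to see whether cron is firing is to have the script write a timestamped heartbeat line on every run, whether or not nginx needed a restart. The sketch below is only an illustration: HEARTBEAT_LOG and both function names are assumptions, and stat -c is the GNU coreutils form.

```shell
#!/bin/sh
# Hypothetical heartbeat helpers for /opt/launch-crashed-services.sh.
# HEARTBEAT_LOG is an assumed path, not one from the original setup.
HEARTBEAT_LOG="${HEARTBEAT_LOG:-/var/log/nginx/watchdog-heartbeat.log}"

# Append one timestamped line per invocation, so the file shows
# every minute in which cron actually started the script.
log_heartbeat() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" >> "$HEARTBEAT_LOG"
}

# Succeed if the heartbeat file was modified at most $1 seconds ago.
heartbeat_is_fresh() {
    max_age="$1"
    [ -f "$HEARTBEAT_LOG" ] || return 1
    now=$(date +%s)
    last=$(stat -c %Y "$HEARTBEAT_LOG")   # modification time, GNU stat
    [ $((now - last)) -le "$max_age" ]
}
```

Calling log_heartbeat "cron run" as the first step of the script and then watching the file with tail -f would show one new line per minute if the job fires. On Debian/Ubuntu systems cron typically also records each job start in syslog, so grep CRON /var/log/syslog is another place to look.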

But also, given that nginx is in fact receiving a command to shut down gracefully: which avenues of investigation should be pursued?
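
One avenue, sketched here under assumptions: nginx writes "signal process started" to its error log when something runs nginx -s <signal>, so listing those timestamps and comparing them with scheduled jobs or hooks may point at the sender. The function name and the default log path in the comment are illustrative, not taken from the setup above.

```shell
# List the date and time of every "signal process started" entry.
# $1: path to an nginx error log (commonly /var/log/nginx/error.log; assumed).
signal_events() {
    grep 'signal process started' "$1" | awk '{print $1, $2}'
}
```

Run against the log excerpt at the top of this post, it would print the 23:31:31 entry, a few seconds before Passenger reports the graceful shutdown. Any cron job, deploy step or renewal hook scheduled around those times is a candidate worth checking; this narrows the search rather than giving a diagnosis.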



     Active: failed (Result: core-dump) since Sun 2022-09-25 05:03:16 UTC; 15min ago
       Docs: man:nginx(8)
    Process: 416872 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
    Process: 416873 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
   Main PID: 416885 (code=dumped, signal=SEGV)
      Tasks: 0 (limit: 2339)
     Memory: 68.4M
     CGroup: /system.slice/nginx.service

Sep 25 04:50:33 sandbox systemd[1]: Starting A high performance web server and a reverse proxy server...
Sep 25 04:50:33 sandbox systemd[1]: Started A high performance web server and a reverse proxy server.
Sep 25 05:03:16 sandbox systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
Sep 25 05:03:16 sandbox systemd[1]: nginx.service: Killing process 417241 (nginx) with signal SIGKILL.
Sep 25 05:03:16 sandbox systemd[1]: nginx.service: Killing process 417241 (nginx) with signal SIGKILL.
Sep 25 05:03:16 sandbox systemd[1]: nginx.service: Failed with result 'core-dump'.
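
Since the unit ends up in the failed state after the SEGV, systemd itself can be asked to bring it back. A minimal sketch, assuming a drop-in override is acceptable; the file name restart.conf and the 5 second delay are arbitrary choices, not values from the original setup.

```shell
# Write a drop-in that makes systemd restart nginx after a crash.
# $1: drop-in directory (defaults to the usual override location).
write_restart_dropin() {
    dropin_dir="${1:-/etc/systemd/system/nginx.service.d}"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5s
EOF
}
```

After writing the file, systemctl daemon-reload is needed for the change to take effect. Restart=on-failure covers crashes such as the core dump above, but not a clean exit; if the graceful shutdowns should also trigger a restart, Restart=always may be the setting to try instead.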

Here is the output of nginx -T:

# configuration file /etc/nginx/sites-enabled/fidelity:
server {

  server_name sandbox.fdl.club;
  root /home/jerdvo/fidelity/current/public;

  passenger_enabled on;
  passenger_app_env development;

  location /cable {
    passenger_app_group_name fidelity_websocket;
    passenger_force_max_concurrent_requests_per_process 0;
  }

  # Allow uploads up to 100MB in size
  client_max_body_size 100m;

  location ~ ^/(assets|packs) {
    expires max;
    gzip_static on;
  }

    listen [::]:443 ssl ipv6only=on; # managed by Certbot
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/sandbox.fdl.club/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/sandbox.fdl.club/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

server {
    if ($host = sandbox.fdl.club) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

  listen 80;
  listen [::]:80;

  server_name sandbox.fdl.club;
    return 404; # managed by Certbot
}


# configuration file /etc/nginx/sites-enabled/market_sandbox:
server {

  server_name provetp.sltfla.online proveat.sltfla.online;
  root /home/jerdvo/market/current/public;

  passenger_enabled on;
  passenger_app_env development;

  location /cable {
    passenger_app_group_name market_websocket;
    passenger_force_max_concurrent_requests_per_process 0;
  }

  # Allow uploads up to 100MB in size
  client_max_body_size 100m;

  location ~ ^/(assets|packs) {
    expires max;
    gzip_static on;
  }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/provetp.sltfla.online/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/provetp.sltfla.online/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

server {

  listen 80;
  listen [::]:80;

#  server_name prove_tp.sltfla.online prove_at.sltfla.online;
}

server {
    if ($host = proveat.sltfla.online) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    if ($host = provetp.sltfla.online) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

  server_name provetp.sltfla.online proveat.sltfla.online;
    listen 80;
    return 404; # managed by Certbot
}

# configuration file /etc/nginx/sites-enabled/simon:
server {

  server_name simon.domayn.com;
  root /home/jerdvo/simon/current/public;

  passenger_enabled on;
  passenger_app_env development;

  location /cable {
    passenger_app_group_name simon_websocket;
    passenger_force_max_concurrent_requests_per_process 0;
  }

  # Allow uploads up to 100MB in size
  client_max_body_size 100m;

  location ~ ^/(assets|packs) {
    expires max;
    gzip_static on;
  }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/simon.domayn.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/simon.domayn.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

server {
    if ($host = simon.domayn.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

  server_name simon.domayn.com;
    listen 80;
    return 404; # managed by Certbot
}

Update 2: the systemd service now has a working restart directive.
The most recent restart is timestamped [Sep28], as can be seen in the following dmesg snippet:

[Sep27 23:05] audit: type=1400 audit(1664319928.524:32): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/snapd/17029/usr/lib/snapd/snap-co>
[  +0.000538] audit: type=1400 audit(1664319928.524:33): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/snapd/17029/usr/lib/snapd/snap-co>
[  +0.027718] audit: type=1400 audit(1664319928.552:34): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" n>
[  +0.005538] audit: type=1400 audit(1664319928.556:35): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" n>
[  +0.004074] audit: type=1400 audit(1664319928.560:36): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" n>
[  +0.003893] audit: type=1400 audit(1664319928.564:37): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" n>
[  +0.004513] audit: type=1400 audit(1664319928.568:38): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" n>
[  +0.006500] audit: type=1400 audit(1664319928.576:39): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" n>
[  +0.004354] audit: type=1400 audit(1664319928.580:40): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" n>
[  +0.003309] audit: type=1400 audit(1664319928.584:41): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" n>
[Sep28 00:24] show_signal_msg: 20 callbacks suppressed
[  +0.000008] nginx[56081]: segfault at 10 ip 00007f3350cda593 sp 00007fff52347cc0 error 4 in libperl.so.5.30.0[7f3350c72000+166000]
[  +0.000203] Code: 48 89 43 10 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 40 00 0f b6 7f 30 48 c1 e8 03 48 29 f8 48 89 c3 74 89 48 8b 02 <4c> 8b 68 10 4d 85 >
[Sep30 02:01] nginx[208462]: segfault at 71 ip 00007f7b23ce3593 sp 00007ffede4fee10 error 4 in libperl.so.5.30.0[7f7b23c7b000+166000]
[  +0.000017] Code: 48 89 43 10 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 40 00 0f b6 7f 30 48 c1 e8 03 48 29 f8 48 89 c3 74 89 48 8b 02 <4c> 8b 68 10 4d 85 ed 0f 84 28 01 00 00 0f b6 40 30 49 c1 ed 03 49
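
To summarize kernel messages like these, a small filter can count which library each nginx segfault was reported in. This is only a sketch; the function name is made up, and it relies on the "segfault at ... in <library>[...]" line format seen above.

```shell
# Read dmesg-style text on stdin and count nginx segfaults per library.
segfault_libraries() {
    grep 'nginx\[[0-9]*\]: segfault' \
        | sed -n 's/.* in \([^[]*\)\[.*/\1/p' \
        | sort | uniq -c
}
```

It could be used as dmesg | segfault_libraries. For the snippet above, both segfault lines name libperl.so.5.30.0. That shows where the crash occurred, which is not necessarily the root cause.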


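There is some problem with the Passenger module in your nginx, possibly triggered by some part of your website's code.

Try upgrading nginx and Passenger to their latest versions and see whether it becomes more stable.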
