Understanding why a service was terminated


Using journalctl -u docker I noticed the following:

May 30 10:01:43 xxx systemd[1]: Stopping Docker Application Container Engine...
...
docker specific error log in between
...
May 30 10:01:51 xxx systemd[1]: Stopped Docker Application Container Engine...

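I also checked /var/log/auth.log and found no Docker-related entries for the entire week.

The shell history shows no attempt to kill the service either, although we all work under a shared user.

The systemd unit: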

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target

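I don't even understand why it was not restarted. It looks as if someone stopped the service manually.

As far as I understand, systemd should at least try to restart the service if it had stopped because of a failure. That makes me think the stop was requested by someone.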
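One thing that may be worth ruling out before blaming a person: according to the systemd documentation, Restart= does not apply when systemd itself stops the service, and that covers both an explicit systemctl stop and a stop propagated from a dependency. The unit below declares BindsTo=containerd.service, so a stop of containerd would also stop docker. The following is only a sketch for lining up the state changes of both units; the function name unit_transitions is made up for illustration.

```shell
# Keep only the lines in which systemd (any PID) reports a unit state change.
# Reads the files given as arguments, or stdin when there are none.
unit_transitions() {
  grep -E 'systemd\[[0-9]+\]: (Starting|Started|Stopping|Stopped) ' "$@"
}

# Possible usage (needs a host with systemd, so it is left commented out):
#   journalctl -u docker -u containerd --no-pager | unit_transitions
#   systemctl show docker -p Result -p ExecMainStatus
```

If containerd shows a "Stopping" or "Stopped" line just before docker does, the stop was most likely propagated rather than typed by someone.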

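How can I solve this?

Docker version 19.03.8, build afacb8b7f0. Uptime is 28 days.

We recently had a memory leak that consumed almost all available memory, but I don't see anything about memory in the logs.

UPD: the OOM killer does show up in /var/log/kern.log (thanks @Abhijith):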

May 30 10:01:42 compute03 kernel: [2263822.755824] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
May 30 10:01:42 compute03 kernel: [2263822.755829] [  404]     0   404    71910        1   540672     3377             0 systemd-journal
May 30 10:01:42 compute03 kernel: [2263822.755830] [  414]     0   414    10905        0   122880      372         -1000 systemd-udevd
May 30 10:01:42 compute03 kernel: [2263822.755831] [  417]     0   417    24427        0    94208       55             0 lvmetad
May 30 10:01:42 compute03 kernel: [2263822.755833] [  606] 62583   606    35484        0   184320      187             0 systemd-timesyn
May 30 10:01:42 compute03 kernel: [2263822.755834] [  655]   100   655    18265        0   167936      385             0 systemd-network
May 30 10:01:42 compute03 kernel: [2263822.755835] [  678]   101   678    17693        0   184320      200             0 systemd-resolve
May 30 10:01:42 compute03 kernel: [2263822.755836] [  890]     0   890    27604       20   118784       64             0 irqbalance
May 30 10:01:42 compute03 kernel: [2263822.755837] [  898]     0   898    17670        0   184320      218             0 systemd-logind
May 30 10:01:42 compute03 kernel: [2263822.755838] [  899]     0   899   169538        0   147456      219             0 lxcfs
May 30 10:01:42 compute03 kernel: [2263822.755839] [  901]   103   901    12544        0   143360      199          -900 dbus-daemon
May 30 10:01:42 compute03 kernel: [2263822.755840] [  905]     0   905     7507        0   102400       72             0 cron
May 30 10:01:42 compute03 kernel: [2263822.755841] [  907]     0   907     7083        0   106496       58             0 atd
May 30 10:01:42 compute03 kernel: [2263822.755842] [  908]     0   908    71588        0   192512      260             0 accounts-daemon
May 30 10:01:42 compute03 kernel: [2263822.755843] [  909]   102   909    65758        0   172032      461             0 rsyslogd
May 30 10:01:42 compute03 kernel: [2263822.755844] [  916]     0   916    42372        0   233472     2022             0 networkd-dispat
May 30 10:01:42 compute03 kernel: [2263822.755845] [  921]     0   921   301259        0   348160     6201             0 containerd
May 30 10:01:42 compute03 kernel: [2263822.755846] [  923]   112   923    26804        0   233472      291             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755847] [  929]     0   929    46488        0   262144     2000             0 unattended-upgr
May 30 10:01:42 compute03 kernel: [2263822.755848] [  931]     0   931   300744      120   495616    12158          -500 dockerd
May 30 10:01:42 compute03 kernel: [2263822.755849] [  944]   112   944    28924        1   262144      307             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755850] [  945]   112   945    29478       11   270336      357             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755852] [  946]   112   946    29478        0   270336      369             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755853] [  947]   112   947    29478       12   270336      355             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755854] [  952]     0   952     3666        0    73728       38             0 agetty
May 30 10:01:42 compute03 kernel: [2263822.755856] [  954]   112   954    27903        0   258048      360             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755857] [  958]     0   958     3722        0    77824       36             0 agetty
May 30 10:01:42 compute03 kernel: [2263822.755858] [  960]     0   960    18075        1   188416      191         -1000 sshd
May 30 10:01:42 compute03 kernel: [2263822.755859] [  961]     0   961    72221        0   212992      274             0 polkitd
May 30 10:01:42 compute03 kernel: [2263822.755860] [ 6213]  1000  6213    19225        0   196608      346             0 systemd
May 30 10:01:42 compute03 kernel: [2263822.755861] [ 6214]  1000  6214    27956        0   245760      614             0 (sd-pam)
May 30 10:01:42 compute03 kernel: [2263822.755862] [ 6307]  1000  6307    63356      313   385024    12640             0 service
May 30 10:01:42 compute03 kernel: [2263822.755863] [ 3600]     0  3600    26925        0    65536      265          -999 containerd-shim
May 30 10:01:42 compute03 kernel: [2263822.755864] [ 3628]   999  3628   818153   332342  6262784   394513             0 python
May 30 10:01:42 compute03 kernel: [2263822.755865] [ 3703]     0  3703    26925        0    73728      271          -999 containerd-shim
May 30 10:01:42 compute03 kernel: [2263822.755875] [ 3732]   999  3732   818151   288134  6258688   438719             0 python
May 30 10:01:42 compute03 kernel: [2263822.755876] [ 4172]     0  4172    26925        0    73728      271          -999 containerd-shim
May 30 10:01:42 compute03 kernel: [2263822.755878] [ 4196]   999  4196   324489    77683  2314240   156754             0 python
May 30 10:01:42 compute03 kernel: [2263822.755879] [ 4332]     0  4332    27277        0    77824      318          -999 containerd-shim
May 30 10:01:42 compute03 kernel: [2263822.755880] [ 4362]   999  4362   286331   192099  2007040     4441             0 python
May 30 10:01:42 compute03 kernel: [2263822.755881] [ 4431]     0  4431    26925        0    73728      243          -999 containerd-shim
May 30 10:01:42 compute03 kernel: [2263822.755882] [ 4460]   999  4460   152545    57219   913408     5807             0 python
May 30 10:01:42 compute03 kernel: [2263822.755883] [ 4515]  1000  4515   354203        0   565248    13231             0 service
May 30 10:01:42 compute03 kernel: [2263822.755884] Out of memory: Kill process 3628 (python) score 353 or sacrifice child
May 30 10:01:42 compute03 kernel: [2263822.757606] Killed process 3628 (python) total-vm:3272612kB, anon-rss:1329368kB, file-rss:0kB, shmem-rss:0kB
May 30 10:01:42 compute03 kernel: [2263822.899423] oom_reaper: reaped process 3628 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

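As I can see, dockerd has an oom_score_adj of -500, but in the next OOM dump (30 minutes later) docker no longer appears in the table at all.

Every earlier log line that matches the word docker is only an info-level message. There were no errors before the shutdown started.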
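To make the OOM table above easier to read, here is a minimal sketch that ranks the processes by resident memory. The function name oom_table is hypothetical. It assumes 4 kB pages, which is consistent with the "Killed process" line (332342 pages * 4 = 1329368 kB anon-rss), and it takes only the first word of the process name.

```shell
# Read an OOM-killer dump from the given files (or stdin) and print
#   rss_kB swap_kB oom_score_adj tgid name
# for every process row, largest resident set first.
# Assumption: rss and swapents are counted in 4 kB pages.
oom_table() {
  awk '
    {
      # Process rows look like "...] [ 3628]   999  3628 ..."; the header
      # and the "Out of memory" / "Killed process" lines do not match.
      if (!match($0, /\] \[ *[0-9]+\] /)) next
      rest = substr($0, RSTART + RLENGTH)
      n = split(rest, f, " ")   # uid tgid total_vm rss pgtables swapents adj name
      if (n < 8) next
      printf "%d %d %d %d %s\n", f[4] * 4, f[6] * 4, f[7], f[2], f[8]
    }' "$@" | sort -rn
}

# Possible usage:
#   oom_table /var/log/kern.log | head
```

On the dump shown here, the python processes would sort to the top and dockerd would land near the bottom, since its resident set is only 120 pages.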

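Answer 1

Have you checked whether the service was killed because of a memory problem? If the system runs out of memory, i.e. both RAM and swap are full, the Linux OOM killer automatically terminates processes. Run the following command: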

grep docker /var/log/kern.log

If that file does not exist, look at /var/log/messages instead.
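Note that grepping for docker only finds OOM lines that mention docker by name. As a rough sketch, the kernel's "Killed process" lines can be searched for directly to list every victim; the function name oom_victims is made up for illustration.

```shell
# Print "pid name" for every process the kernel reports as killed.
# Reads the files given as arguments, or stdin when there are none.
oom_victims() {
  sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p' "$@"
}

# Possible usage:
#   oom_victims /var/log/kern.log
```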

This is only a guess.
