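On one of my servers, mysql was terminated because of high memory usage. Before it crashed, memory usage was at about 95%. What confuses me is what terminated the process, because in the past, whenever a process was killed due to OutOfMemory, it was clearly logged in syslog.
In the logs I can see that the process was restarted, but I can't figure out which process stopped/killed it.
Logs from journalctl: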
Apr 30 19:08:45 ip-10-0-1-4 systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Apr 30 19:08:57 ip-10-0-1-4 systemd[1]: snapd.service: Killing process 24259 (snapd) with signal SIGABRT.
Apr 30 19:09:17 ip-10-0-1-4 systemd[1]: Stopping MySQL Community Server...
Apr 30 19:09:18 ip-10-0-1-4 systemd[1]: snapd.service: Main process exited, code=dumped, status=11/SEGV
Apr 30 19:09:18 ip-10-0-1-4 systemd[1]: snapd.service: Failed with result 'watchdog'.
Apr 30 19:09:18 ip-10-0-1-4 systemd[1]: snapd.service: Triggering OnFailure= dependencies.
Apr 30 19:09:18 ip-10-0-1-4 systemd[1]: Starting Failure handling of the snapd snap...
Apr 30 19:09:18 ip-10-0-1-4 systemd[1]: Started Failure handling of the snapd snap.
Apr 30 19:09:18 ip-10-0-1-4 systemd[1]: snapd.service: Service hold-off time over, scheduling restart.
Apr 30 19:09:18 ip-10-0-1-4 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 13.
Apr 30 19:09:18 ip-10-0-1-4 systemd[1]: Stopped Snappy daemon.
Apr 30 19:09:18 ip-10-0-1-4 systemd[1]: Starting Snappy daemon...
Apr 30 19:09:18 ip-10-0-1-4 snapd[11960]: AppArmor status: apparmor is enabled and all features are available
Apr 30 19:09:18 ip-10-0-1-4 CRON[11772]: pam_unix(cron:session): session closed for user root
Apr 30 19:09:19 ip-10-0-1-4 snapd[11960]: AppArmor status: apparmor is enabled and all features are available
Apr 30 19:09:19 ip-10-0-1-4 snapd[11960]: daemon.go:343: started snapd/2.44.3 (series 16; classic) ubuntu/18.04 (amd64) linux/4.15.0-1041-aws.
Apr 30 19:09:19 ip-10-0-1-4 snapd[11960]: daemon.go:436: adjusting startup timeout by 40s (pessimistic estimate of 30s plus 5s per snap)
Apr 30 19:09:19 ip-10-0-1-4 systemd[1]: Started Snappy daemon.
Apr 30 19:09:20 ip-10-0-1-4 snapd[11960]: storehelpers.go:438: cannot refresh: snap has no updates available: "amazon-ssm-agent", "core"
Apr 30 19:09:20 ip-10-0-1-4 snapd[11960]: autorefresh.go:397: auto-refresh: all snaps are up-to-date
Apr 30 19:09:21 ip-10-0-1-4 systemd[1]: Stopped MySQL Community Server.
Apr 30 19:09:21 ip-10-0-1-4 systemd[1]: Starting MySQL Community Server...
Apr 30 19:09:22 ip-10-0-1-4 systemd[1]: Started MySQL Community Server.
I can see that the mysql process was restarted by monit.
[UTC Apr 30 19:07:14] error : 'mysql' failed protocol test [MYSQL] at [localhost]:3306 [TCP/IP] -- Error receiving server response -- Resource temporarily unavailable
-- stop/start log
Also, when I run ps -ef | grep watchdog, I can see that it is running, but I can't find the command.
root@ip-10-0-1-4:/var/log# ps -ef | grep watch
root 11 2 0 2019 ? 00:00:33 [watchdog/0]
root 14 2 0 2019 ? 00:00:31 [watchdog/1]
root 20 2 0 2019 ? 00:00:27 [watchdog/2]
root 26 2 0 2019 ? 00:00:28 [watchdog/3]
root 32 2 0 2019 ? 00:00:28 [watchdog/4]
root 38 2 0 2019 ? 00:00:26 [watchdog/5]
root 44 2 0 2019 ? 00:00:26 [watchdog/6]
root 50 2 0 2019 ? 00:00:33 [watchdog/7]
root 75 2 0 2019 ? 00:00:00 [watchdogd]
What is this process? I haven't seen it before.
Let me know if any other information is needed.
Answer 1
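It looks like you are running the Snap version of MySQL? It seems that snapd hit some limit/timeout and killed the process itself before the kernel OOM killer was reached. I don't think the watchdogs you see in ps are related; they are part of watchdogd. That is a system that reboots the server when it becomes unresponsive. Check whether you have a /dev/watchdog file that is updated frequently.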
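To tell a kernel OOM kill apart from an orderly stop requested through systemd, it can help to scan the journal for both kinds of message. Below is a minimal Python sketch, not a definitive tool. The function name `classify` and the sample lines are hypothetical. The OOM pattern assumes the usual kernel wording ("Out of memory: Kill process ..." on older kernels, "Killed process" on newer ones), and the stop pattern assumes the "Stopping <unit description>..." form that appears in the log above.

```python
import re

# Kernel OOM-killer message as it commonly appears in the kernel log.
# Assumption: older kernels write "Kill process", newer ones "Killed process".
OOM_KILL = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")

# systemd announcing an orderly stop of a unit (for example after a
# "systemctl stop" issued by an admin or a monitoring tool).
SYSTEMD_STOP = re.compile(r"systemd\[1\]: Stopping (.+?)\.\.\.$")


def classify(lines):
    """Return (kind, detail) tuples for lines that hint at why a service went down."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        match = OOM_KILL.search(line)
        if match:
            events.append(("oom-kill", f"pid {match.group(1)} ({match.group(2)})"))
            continue
        match = SYSTEMD_STOP.search(line)
        if match:
            events.append(("systemd-stop", match.group(1)))
    return events


# Hypothetical sample: two lines taken from the journal output in the question.
sample = [
    "Apr 30 19:09:17 ip-10-0-1-4 systemd[1]: Stopping MySQL Community Server...",
    "Apr 30 19:09:21 ip-10-0-1-4 systemd[1]: Stopped MySQL Community Server.",
]
for kind, detail in classify(sample):
    print(kind, detail)
```

If only "systemd-stop" events show up around the time of the outage and there are no "oom-kill" events, the service was most likely stopped on request rather than killed by the kernel. That would fit the monit protocol-test failure logged shortly before the stop.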