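I'm running an Ubuntu 16.04.3 LTS box on Linode with very low utilization, but my uptime monitor told me my website had been down for nearly an hour before it recovered. When I checked, I found that the server had rebooted, and that is when the site came back. I received an email from Linode saying "Host initiated restart". The high-usage threshold alerts configured in Linode did not fire either.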
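I'm trying to figure out what happened. I've seen an issue on another Ubuntu box on Linode where Linode support told me that something had caused the Linode to crash and Lassie (their watchdog) had rebooted it, which seems to be exactly what happened here.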
I checked /var/log/auth.log and /var/log/syslog, but they have no log entries for the downtime window, 18:03 to 18:57, and no message in them stands out. There is no /var/log/messages on my server.
Contents of /var/log/syslog:
Feb 23 18:03:04 localhost alertyo-engine[6279]: Un-Setting flag
Feb 23 18:03:04 localhost alertyo-engine[6279]: Alloc = 1 MiB#011TotalAlloc = 2470 MiB#011HeapAlloc = 1 MiB#011Sys = 10 MiB#011NumGC = 10856
Feb 23 18:57:14 localhost rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="3304" x-info="http://www.rsyslog.com"] start
Feb 23 18:57:14 localhost rsyslogd-2222: command 'KLogPermitNonKernelFacility' is currently not permitted - did you already set it via a RainerScript command (v6+ config)? [v8.16.0 try http://www.rsyslog.com/e/2222 ]
Feb 23 18:57:14 localhost rsyslogd: rsyslogd's groupid changed to 108
Feb 23 18:57:14 localhost rsyslogd: rsyslogd's userid changed to 104
Feb 23 18:57:14 localhost systemd[1]: Mounted FUSE Control File System.
Contents of /var/log/auth.log:
Feb 23 18:03:01 localhost CRON[29814]: pam_unix(cron:session): session closed for user root
Feb 23 18:03:01 localhost CRON[29813]: pam_unix(cron:session): session closed for user ashfame
Feb 23 18:57:14 localhost CRON[3301]: pam_unix(cron:session): session opened for user ashfame by (uid=0)
Feb 23 18:57:15 localhost systemd-logind[3312]: Watching system buttons on /dev/input/event0 (Power Button)
Feb 23 18:57:15 localhost systemd-logind[3312]: New seat seat0.
Feb 23 18:57:15 localhost sshd[3449]: Server listening on 0.0.0.0 port 22.
Feb 23 18:57:15 localhost sshd[3449]: Server listening on :: port 22.
Feb 23 18:57:16 localhost CRON[3301]: pam_unix(cron:session): session closed for user ashfame
Feb 23 18:58:01 localhost CRON[3681]: pam_unix(cron:session): session opened for user root by (uid=0)
Feb 23 18:58:01 localhost CRON[3680]: pam_unix(cron:session): session opened for user ashfame by (uid=0)
Feb 23 18:58:01 localhost CRON[3681]: pam_unix(cron:session): session closed for user root
Feb 23 18:59:01 localhost CRON[3787]: pam_unix(cron:session): session opened for user root by (uid=0)
Feb 23 18:59:01 localhost CRON[3786]: pam_unix(cron:session): session opened for user ashfame by (uid=0)
Feb 23 18:59:01 localhost CRON[3787]: pam_unix(cron:session): session closed for user root
Feb 23 18:59:01 localhost CRON[3786]: pam_unix(cron:session): session closed for user ashfame
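What else can I check? If this were a recurring issue, I could set up more logging to track down the problem, but like last time (on the other box), I'm afraid this is a once-in-a-few-months event. How can I find out what happened this time, rather than just preparing for the next occurrence?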
Answer 1
I just learned that this was caused by a power failure at Linode's Fremont data center.
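So if you can't find any trace of a problem like this in your server logs, one possible reason is that the server simply lost power, so nothing was ever written to the logs (though I remember reading that there is something the system can show).
Checking your provider's status page and searching Twitter is always a good idea :)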
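To confirm that a log really goes silent over the outage window instead of eyeballing it, a small script can flag large jumps between consecutive timestamps. This is only a rough sketch, not an established tool: the function name `find_log_gaps`, the 5-minute threshold, and the placeholder year are all assumptions made for illustration (classic syslog timestamps carry no year, so one has to be supplied).

```python
from datetime import datetime, timedelta


def find_log_gaps(lines, min_gap=timedelta(minutes=5), year=2000):
    """Return (previous, current, gap) for consecutive entries at least min_gap apart.

    `year` is an arbitrary placeholder (assumption): classic syslog lines such as
    "Feb 23 18:03:04 ..." do not record the year, so pass the real one if it matters.
    """
    gaps = []
    previous = None
    for line in lines:
        stamp = line[:15]  # fixed-width syslog prefix, e.g. "Feb 23 18:03:04"
        try:
            current = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# Two consecutive syslog lines from the question, on either side of the outage.
sample = [
    "Feb 23 18:03:04 localhost alertyo-engine[6279]: Un-Setting flag",
    "Feb 23 18:57:14 localhost rsyslogd: rsyslogd's groupid changed to 108",
]

for before, after, gap in find_log_gaps(sample):
    print(f"silent from {before:%H:%M:%S} to {after:%H:%M:%S} ({gap})")
# prints: silent from 18:03:04 to 18:57:14 (0:54:10)
```

To run it against a real file, pass an open file object instead of the sample list, for example `find_log_gaps(open("/var/log/syslog"))`. A gap that ends with an rsyslogd start line and has no shutdown messages before it is consistent with an abrupt power loss, though it does not prove one on its own.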