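I have a CentOS 6.9 machine running in a virtual machine, and this morning it went crazy: it suddenly started rebooting. At first the interval between reboots was exactly 10 minutes, then it shrank to 5 minutes, then 3 minutes, and now it varies. Below are the messages from /var/log/messages.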
May 10 18:40:01 hwmaster01 init: tty (/dev/tty1) main process (2126) killed by TERM signal
May 10 18:40:01 hwmaster01 init: tty (/dev/tty2) main process (2128) killed by TERM signal
May 10 18:40:01 hwmaster01 init: tty (/dev/tty3) main process (2130) killed by TERM signal
May 10 18:40:01 hwmaster01 init: tty (/dev/tty4) main process (2132) killed by TERM signal
May 10 18:40:01 hwmaster01 init: tty (/dev/tty5) main process (2134) killed by TERM signal
May 10 18:40:01 hwmaster01 init: tty (/dev/tty6) main process (2136) killed by TERM signal
May 10 18:40:07 hwmaster01 ntpd[1767]: ntpd exiting on signal 15
May 10 18:40:08 hwmaster01 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
*Some time later
May 10 18:45:02 hwmaster01 init: tty (/dev/tty1) main process (2137) killed by TERM signal
May 10 18:45:02 hwmaster01 init: tty (/dev/tty2) main process (2139) killed by TERM signal
May 10 18:45:02 hwmaster01 init: tty (/dev/tty3) main process (2141) killed by TERM signal
May 10 18:45:02 hwmaster01 init: tty (/dev/tty4) main process (2143) killed by TERM signal
May 10 18:45:02 hwmaster01 init: tty (/dev/tty5) main process (2146) killed by TERM signal
May 10 18:45:02 hwmaster01 init: tty (/dev/tty6) main process (2148) killed by TERM signal
May 10 18:45:08 hwmaster01 ntpd[1772]: ntpd exiting on signal 15
May 10 18:45:08 hwmaster01 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
*Some time later
May 10 18:52:01 hwmaster01 init: tty (/dev/tty1) main process (2124) killed by TERM signal
May 10 18:52:01 hwmaster01 init: tty (/dev/tty2) main process (2126) killed by TERM signal
May 10 18:52:01 hwmaster01 init: tty (/dev/tty3) main process (2128) killed by TERM signal
May 10 18:52:01 hwmaster01 init: tty (/dev/tty4) main process (2131) killed by TERM signal
May 10 18:52:01 hwmaster01 init: tty (/dev/tty5) main process (2133) killed by TERM signal
May 10 18:52:01 hwmaster01 init: tty (/dev/tty6) main process (2135) killed by TERM signal
May 10 18:52:09 hwmaster01 ntpd[1767]: ntpd exiting on signal 15
May 10 18:52:10 hwmaster01 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
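No new stress tools are running. It is the master node of a Hadoop cluster whose 4 nodes run in separate virtual machines on the same hardware. All the virtual machines appear to be working fine at the hardware level, but this master node crashes and stops all services. Is anyone familiar with this problem?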
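To check the intervals against the excerpts rather than by eye, here is a minimal Python sketch that measures the time between the tty1 "killed by TERM signal" lines quoted above. It assumes the three excerpts are consecutive shutdowns, which the log as pasted does not guarantee; the function names are made up for illustration.

```python
from datetime import datetime

# The first tty1 line of each excerpt, copied from the log above.
LOG_LINES = [
    "May 10 18:40:01 hwmaster01 init: tty (/dev/tty1) main process (2126) killed by TERM signal",
    "May 10 18:45:02 hwmaster01 init: tty (/dev/tty1) main process (2137) killed by TERM signal",
    "May 10 18:52:01 hwmaster01 init: tty (/dev/tty1) main process (2124) killed by TERM signal",
]


def shutdown_times(lines, marker="tty (/dev/tty1)"):
    """Return one timestamp per shutdown, keyed on the tty1 line."""
    times = []
    for line in lines:
        if marker in line and "killed by TERM signal" in line:
            # Syslog timestamps carry no year. Only differences matter here,
            # so a fixed placeholder year (2000) is prepended before parsing.
            stamp = " ".join(line.split()[:3])
            times.append(datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S"))
    return times


def intervals_seconds(times):
    """Seconds between each pair of consecutive timestamps."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for secs in intervals_seconds(shutdown_times(LOG_LINES)):
        print(f"{secs // 60} min {secs % 60} s")
```

To run it against the full file, read the lines from /var/log/messages instead of the hard-coded list.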
Answer 1
You can attach strace to that main process. It will tell you which process killed it.
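As a rough sketch of how that could look: attach with something like `strace -f -p <PID> -o /tmp/strace.out` (the output path is only an example) and wait for the next shutdown. Recent strace releases print a signal-delivery line that includes the sender's PID as `si_pid`; older builds may print a shorter form without it, so check what your version emits. The Python helper below pulls the signal name and sender PID out of such a line. The sample line and the function name are hypothetical, not taken from this machine.

```python
import re

# Assumed shape of a signal-delivery line from a recent strace, for example:
#   --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4321, si_uid=0} ---
# With -f and -o, each line is also prefixed with the traced PID.
SIGNAL_LINE = re.compile(r"--- (SIG[A-Z0-9]+) \{(.*)\} ---")


def parse_signal_line(line):
    """Return the signal name and sender PID, or None if the line is not a signal delivery."""
    match = SIGNAL_LINE.search(line)
    if match is None:
        return None
    # Split "key=value, key=value" into a dict of siginfo fields.
    fields = dict(part.split("=", 1)
                  for part in match.group(2).split(", ") if "=" in part)
    sender = fields.get("si_pid")
    return {
        "signal": match.group(1),
        # si_pid is absent for signals that were not sent by a process.
        "sender_pid": int(sender) if sender is not None else None,
    }


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    sample = "2126  --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4321, si_uid=0} ---"
    print(parse_signal_line(sample))
```

Once you have the sender's PID, `ps -p <PID> -o pid,ppid,cmd` shows what it is, provided the process is still alive.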