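On one of my CentOS 6.7 (2.6.32-573.el6.x86_64) virtual machines, Linux kills all processes: not only the application, but also cron, auditd, httpd, and mysql. When I check a service's status, it reports that the pid file exists but the service is stopped. The server is an SQL node of a MySQL cluster. This often happens after the server is rebooted, and it only starts working normally after two or three tries. I have enabled audit logging; the messages log is below.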
kernel: audit: *NO* daemon at audit_pid=17901
kernel: audit: audit_lost=89 audit_rate_limit=0 audit_backlog_limit=320
kernel: audit: auditd dissapeared
kernel: type=1318 audit(1488753001.130:770): opid=19004 oauid=0 ouid=0 oses=51 ocomm="callapi.sh"
kernel: type=1300 audit(1488753001.130:771): arch=c000003e syscall=62 success=yes exit=0 a0=4a52 a1=9 a2=9 a3=4a52 items=0 ppid=19009 pid=19032 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=49 comm="kill" exe="/bin/kill" key="teste_kill"
kernel: type=1318 audit(1488753001.130:771): opid=19026 oauid=0 ouid=0 oses=51 ocomm="callapi.sh"
kernel: type=1300 audit(1488753001.130:772): arch=c000003e syscall=62 success=yes exit=0 a0=46be a1=9 a2=9 a3=46be items=0 ppid=19009 pid=19032 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=49 comm="kill" exe="/bin/kill" key="teste_kill"
kernel: type=1318 audit(1488753001.130:772): opid=18110 oauid=0 ouid=0 oses=44 ocomm="crond"
kernel: type=1300 audit(1488753001.130:773): arch=c000003e syscall=62 success=yes exit=0 a0=4a34 a1=9 a2=9 a3=4a34 items=0 ppid=19009 pid=19032 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=49 comm="kill" exe="/bin/kill" key="teste_kill"
kernel: type=1318 audit(1488753001.130:773): opid=18996 oauid=0 ouid=0 oses=50 ocomm="crond"
init: tty (/dev/tty1) main process (14691) killed by KILL signal
init: tty (/dev/tty1) main process ended, respawning
init: tty (/dev/tty3) main process (14693) killed by KILL signal
init: tty (/dev/tty3) main process ended, respawning
init: tty (/dev/tty4) main process (14694) killed by KILL signal
init: tty (/dev/tty4) main process ended, respawning
init: tty (/dev/tty5) main process (14695) killed by KILL signal
init: tty (/dev/tty5) main process ended, respawning
init: tty (/dev/tty6) main process (14696) killed by KILL signal
init: tty (/dev/tty6) main process ended, respawning
kernel: imklog 5.8.10, log source = /proc/kmsg started.
kernel: type=1318 audit(1488753001.130:773): opid=18996 oauid=0 ouid=0 oses=50 ocomm="crond"
kernel: type=1300 audit(1488753001.130:774): arch=c000003e syscall=62 success=yes exit=0 a0=4a3b a1=9 a2=9 a3=4a3b items=0 ppid=19009 pid=19032 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=49 comm="kill" exe="/bin/kill" key="teste_kill"
kernel: type=1300 audit(1488754808.281:1069): arch=c000003e syscall=62 success=no exit=-3 a0=4673 a1=0 a2=0 a3=4673 items=0 ppid=1 pid=20268 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=54 comm="java" exe="/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.111.x86_64/jre/bin/java" key="teste_kill"
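For reading the records above: on x86_64, syscall=62 is kill(2), and the audit subsystem prints the syscall arguments in hexadecimal, so a0 is the target PID and a1 is the signal number. The sketch below is only one possible way to decode such a line; the helper names (`parse_audit_fields`, `decode_kill`) are made up for illustration and are not part of any audit tooling.

```python
import re
import signal

# key=value pairs as they appear in audit records; values may be quoted.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit_fields(line):
    """Return the key=value pairs of a single audit record as a dict."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def decode_kill(line):
    """Decode a type=1300 (SYSCALL) record for syscall 62 (kill on x86_64).

    Returns None for any other kind of record.
    """
    fields = parse_audit_fields(line)
    if fields.get("type") != "1300" or fields.get("syscall") != "62":
        return None

    # a0 is the target PID, printed as an unsigned hex register value.
    target = int(fields["a0"], 16)
    if target >= 2 ** 63:
        # Negative PIDs (process groups) show up sign-extended.
        target -= 2 ** 64

    # a1 is the signal number, also in hex.
    signum = int(fields["a1"], 16)
    try:
        signame = signal.Signals(signum).name
    except ValueError:
        # For example signal 0, which only checks that the target exists.
        signame = str(signum)

    return {
        "target_pid": target,
        "signal": signame,
        "sender_pid": int(fields["pid"]),
        "sender_ppid": int(fields["ppid"]),
        "sender_comm": fields.get("comm"),
    }


if __name__ == "__main__":
    sample = ('kernel: type=1300 audit(1488753001.130:771): arch=c000003e '
              'syscall=62 success=yes exit=0 a0=4a52 a1=9 a2=9 a3=4a52 '
              'items=0 ppid=19009 pid=19032 comm="kill" exe="/bin/kill"')
    print(decode_kill(sample))
```

Applied to event 771, a0=4a52 decodes to PID 19026, which matches the opid=19026 in the type=1318 record with the same event ID, and a1=9 is SIGKILL.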
Below is the sar report for memory.
01:20:01 AM  10535308   5798748     35.50    449520   2901344   3523388     17.16
01:30:01 AM  10529272   5804784     35.54    449520   2902444   3521484     17.15
01:40:01 AM  10524924   5809132     35.56    449520   2903496   3521852     17.16
Average:     10531009   5803047     35.53    449520   2897895   3518261     17.14

01:58:39 AM       LINUX RESTART

07:30:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
02:10:01 AM  16057300    276756      1.69     30220     58348    187056      0.91
02:20:01 AM  16057316    276740      1.69     30332     58364    187056      0.91
02:30:01 AM  16057192    276864      1.70     30452     58372    187288      0.91
Answer 1
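When the kernel starts killing seemingly random processes, the usual cause is that the system has run out of memory (both RAM and swap).
Check the VM's memory status with free -h, then adjust its memory as needed.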
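As a rough way to test the out-of-memory theory from a script, the sketch below reads the same counters that free reports from /proc/meminfo and scans the syslog for the messages the kernel writes when its OOM killer runs ("invoked oom-killer" and "Out of memory"). The function names are hypothetical, and the log path /var/log/messages is an assumption based on the CentOS 6 default.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value_in_kB} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def used_percent(info):
    """Memory in use as a percentage, counted the way sar's %memused is:
    (MemTotal - MemFree) / MemTotal, so buffers and cache count as used."""
    total = info["MemTotal"]
    return 100.0 * (total - info["MemFree"]) / total


# Substrings the kernel logs when the OOM killer fires.
OOM_MARKERS = ("invoked oom-killer", "Out of memory")


def find_oom_lines(lines):
    """Return the log lines that mention the kernel OOM killer."""
    return [line.rstrip("\n") for line in lines
            if any(marker in line for marker in OOM_MARKERS)]


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    print("memory used: %.2f%%" % used_percent(info))

    # Assumed syslog location (CentOS 6 default); reading it usually needs root.
    with open("/var/log/messages", errors="replace") as fh:
        for hit in find_oom_lines(fh):
            print(hit)
```

If the scan finds no OOM killer messages around the time the processes died, memory exhaustion becomes a less likely explanation and the audit records are worth a closer look.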