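I have run into a very strange problem. Some (but not all) of the processes on my mail server die periodically, roughly once every two months. The processes that die include: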
- remote control
- dovecot
- postfix
The processes that do not die include:
- apache2
My system is running Debian Wheezy:
$ uname -a
Linux hostname 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2+deb7u2 x86_64 GNU/Linux
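I have looked through the files in /var/log, but they all go quiet after the incident, which always happens at 6:25 in the morning.
At first I thought it was related to the daily ntpdate cron job, so I removed it and replaced it with ntpd, which does not need cron. Did that help? No.
Then I figured something must be wrong with syslogd. All of the processes that die seem to be ones that try to log through syslog. I searched around but found nobody with the same problem. When your logging mechanism stops working, it is hard to track a problem down!
Below are all the log files that were modified at the time of the incident (6:25). There are no log entries after this point; all logging activity stops! Please take a look and tell me if you spot an event that could kill the processes or stop the logging.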
/var/log/syslog
Feb 16 06:25:01 hostname /USR/SBIN/CRON[32606]: (root) CMD (/usr/local/ispconfig/server/server.sh 2>&1 > /dev/null | while read line; do echo `/bin/date` "$line" >> /var/log/ispconfig/cron.log; done)
Feb 16 06:25:01 hostname /USR/SBIN/CRON[32607]: (getmail) CMD (/usr/local/bin/run-getmail.sh > /dev/null 2>> /dev/null)
Feb 16 06:25:01 hostname /USR/SBIN/CRON[32608]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
Feb 16 06:25:02 hostname dovecot: imap-login: Disconnected (disconnected before greeting, waited 0 secs): user=<>, rip=127.0.0.1, lip=127.0.0.1, secured, session=<v9PKQn/y+gB/AAAB>
Feb 16 06:25:02 hostname postfix/smtpd[32647]: connect from localhost[127.0.0.1]
Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [milter][end][connect][stop][0.000481](37362): milter-greylist
Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [session][end][connect][accept][0.09962](37361)
Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [sessions][finished] 18681(+1) 0
Feb 16 06:25:02 hostname postfix/smtpd[32647]: lost connection after CONNECT from localhost[127.0.0.1]
Feb 16 06:25:02 hostname postfix/smtpd[32647]: disconnect from localhost[127.0.0.1]
/var/log/php5-fpm.log
[09-Feb-2014 06:25:07] NOTICE: error log file re-opened
[16-Feb-2014 06:25:06] NOTICE: Terminating ...
[16-Feb-2014 06:25:07] NOTICE: exiting, bye-bye!
/var/log/mail.log
Feb 16 06:25:02 hostname dovecot: imap-login: Disconnected (disconnected before greeting, waited 0 secs): user=<>, rip=127.0.0.1, lip=127.0.0.1, secured, session=<v9PKQn/y+gB/AAAB>
Feb 16 06:25:02 hostname postfix/smtpd[32647]: connect from localhost[127.0.0.1]
Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [milter][end][connect][stop][0.000481](37362): milter-greylist
Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [session][end][connect][accept][0.09962](37361)
Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [sessions][finished] 18681(+1) 0
Feb 16 06:25:02 hostname postfix/smtpd[32647]: lost connection after CONNECT from localhost[127.0.0.1]
Feb 16 06:25:02 hostname postfix/smtpd[32647]: disconnect from localhost[127.0.0.1]
/var/log/mail.info
Feb 16 06:25:02 hostname dovecot: imap-login: Disconnected (disconnected before greeting, waited 0 secs): user=<>, rip=127.0.0.1, lip=127.0.0.1, secured, session=<v9PKQn/y+gB/AAAB>
Feb 16 06:25:02 hostname postfix/smtpd[32647]: connect from localhost[127.0.0.1]
Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [milter][end][connect][stop][0.000481](37362): milter-greylist
Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [session][end][connect][accept][0.09962](37361)
Feb 16 06:25:02 hostname milter-manager[2855]: [statistics] [sessions][finished] 18681(+1) 0
Feb 16 06:25:02 hostname postfix/smtpd[32647]: lost connection after CONNECT from localhost[127.0.0.1]
Feb 16 06:25:02 hostname postfix/smtpd[32647]: disconnect from localhost[127.0.0.1]
/var/log/fail2ban.log
2014-02-16 06:25:06,899 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
2014-02-16 06:25:07,271 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/other_vhosts_access.log
2014-02-16 06:25:07,275 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
2014-02-16 06:25:07,279 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
2014-02-16 06:25:07,281 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
2014-02-16 06:25:07,283 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/other_vhosts_access.log
2014-02-16 06:25:07,269 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/other_vhosts_access.log
2014-02-16 06:25:07,287 fail2ban.server : INFO Stopping all jails
2014-02-16 06:25:07,719 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
2014-02-16 06:25:08,461 fail2ban.jail : INFO Jail 'php-url-fopen' stopped
2014-02-16 06:25:08,595 fail2ban.actions: WARNING [apache-w00tw00t] Unban 178.32.243.78
2014-02-16 06:25:08,702 fail2ban.actions: WARNING [apache-w00tw00t] Unban 83.212.122.172
2014-02-16 06:25:09,270 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
2014-02-16 06:25:09,283 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
2014-02-16 06:25:09,285 fail2ban.jail : INFO Jail 'apache-w00tw00t' stopped
2014-02-16 06:25:09,298 fail2ban.filter : INFO Log rotation detected for /var/log/apache2/error.log
2014-02-16 06:25:10,325 fail2ban.jail : INFO Jail 'apache-noscript' stopped
2014-02-16 06:25:11,361 fail2ban.jail : INFO Jail 'pam-generic' stopped
2014-02-16 06:25:12,330 fail2ban.jail : INFO Jail 'apache-badbots' stopped
2014-02-16 06:25:13,294 fail2ban.jail : INFO Jail 'apache-nohome' stopped
2014-02-16 06:25:14,326 fail2ban.jail : INFO Jail 'ssh-ddos' stopped
2014-02-16 06:25:14,827 fail2ban.jail : INFO Jail 'exim' stopped
2014-02-16 06:25:15,393 fail2ban.jail : INFO Jail 'webmin' stopped
2014-02-16 06:25:16,330 fail2ban.jail : INFO Jail 'apache' stopped
2014-02-16 06:25:17,296 fail2ban.jail : INFO Jail 'ssh' stopped
2014-02-16 06:25:18,285 fail2ban.jail : INFO Jail 'apache-overflows' stopped
2014-02-16 06:25:18,504 fail2ban.jail : INFO Jail 'dovecot' stopped
2014-02-16 06:25:19,333 fail2ban.jail : INFO Jail 'squirrelmail' stopped
2014-02-16 06:25:20,335 fail2ban.jail : INFO Jail 'apache-myadmin' stopped
2014-02-16 06:25:20,336 fail2ban.server : INFO Exiting Fail2ban
/var/log/auth.log
Feb 16 06:25:01 hostname CRON[32604]: pam_unix(cron:session): session opened for user root by (uid=0)
Feb 16 06:25:01 hostname CRON[32605]: pam_unix(cron:session): session opened for user getmail by (uid=0)
Feb 16 06:25:01 hostname CRON[32603]: pam_unix(cron:session): session opened for user root by (uid=0)
Feb 16 06:25:01 hostname CRON[32605]: pam_unix(cron:session): session closed for user getmail
Feb 16 06:25:02 hostname CRON[32604]: pam_unix(cron:session): session closed for user root
Answer 1
First of all, every few months your machine does something odd at 6:25 in the morning. I would review all of the cron jobs.
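As a starting point for that review, here is a minimal sketch. It assumes the stock Debian layout, where /etc/crontab starts the scripts in /etc/cron.daily at 06:25 (the run-parts line in the syslog excerpt in the question is consistent with that). The function name list_daily_jobs is made up for illustration. run-parts --test only prints the scripts it would run, without executing them.

```shell
#!/bin/sh
# Hypothetical helper: print the scripts cron would run from a run-parts
# directory (default: /etc/cron.daily) without executing any of them.
list_daily_jobs() {
    dir="${1:-/etc/cron.daily}"
    if command -v run-parts >/dev/null 2>&1; then
        # --test lists the scripts that would be run, but does not run them
        run-parts --test "$dir"
    else
        # Fallback when run-parts is unavailable: list executable regular files
        for f in "$dir"/*; do
            if [ -f "$f" ] && [ -x "$f" ]; then
                echo "$f"
            fi
        done
    fi
}

# Only list the real directory when it exists on this system
if [ -d /etc/cron.daily ]; then
    list_daily_jobs /etc/cron.daily
fi
```

Any script in that list that restarts or reloads services (log rotation is a common one) is worth reading closely, since it runs at exactly the time the processes disappear.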
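Then, if nothing there looks bogus, try to correlate your problem with the kernel log. Post the output of dmesg and look for out-of-memory problems; in that situation the kernel kills processes to avoid a possible panic.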
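A rough sketch of that search follows. The helper name find_oom_events is hypothetical, the patterns are common fragments of kernel OOM-killer messages rather than an exhaustive list, and /var/log/kern.log is assumed to be where this Debian system keeps its persistent kernel log.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines that look like OOM-killer activity.
# Reads from standard input; the patterns are an assumption, not a full list.
find_oom_events() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Current kernel ring buffer (may require root on some systems)
dmesg 2>/dev/null | find_oom_events || echo "no OOM messages found in dmesg"

# Persistent kernel log, if it exists at the assumed path
if [ -r /var/log/kern.log ]; then
    find_oom_events < /var/log/kern.log || echo "no OOM messages found in kern.log"
fi
```

If the kernel log itself stops at 6:25 along with everything else, the ring buffer from dmesg is the more useful of the two, because it does not depend on syslog.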
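Also, take a close look at /var/log/ispconfig/cron.log and, if you suspect unauthorized access to your box, check /usr/local/ispconfig/server/server.sh.
PS: I would also try to find out when this problem first appeared, and then look at what was changed shortly before that.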
Edit:
I noticed your last comment. A simple shell script that records memory usage around the time these jobs run would be very useful.
Example:
#!/bin/sh
# Append a timestamp and the current memory usage (in MB) to a log file
somefile="/your/file/path"
date >>"$somefile"
free -m >>"$somefile"
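Edit your cron jobs so that this script runs shortly before and shortly after the memory-hungry jobs, then compare the results. This should help you decide whether to add memory, change the software configuration, and so on.
PS: As you can see, this is a very basic script, but it is usable as a starting point. You can improve it further.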
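One possible improvement, sketched under assumptions: cron schedules with one-minute resolution, so instead of timing separate entries around a job, a small wrapper can take the measurement immediately before and after the job itself. The function name log_mem_around and the log path used in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: record memory usage right before and right after a command.
# Usage: log_mem_around LOGFILE COMMAND [ARGS...]
log_mem_around() {
    logfile="$1"
    shift
    {
        echo "=== before: $* ($(date))"
        free -m || echo "free is not available"
    } >>"$logfile" 2>&1

    # Run the wrapped command and remember its exit status
    status=0
    "$@" || status=$?

    {
        echo "=== after: $* (exit $status, $(date))"
        free -m || echo "free is not available"
    } >>"$logfile" 2>&1

    # Pass the wrapped command's exit status back to the caller
    return "$status"
}
```

For example, a cron entry could call a script containing log_mem_around /var/log/mem-usage.log /usr/local/ispconfig/server/server.sh, and the two snapshots in the log could then be compared for each run.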