简而言之
重启通常需要不到 1 分钟,但如果clamav-daemon
正在运行,则重启将需要整整 30 分钟。
概述
我有一个 Ubuntu 15.10“云”邮件服务器:postfix、dovecot、mysql,通过 SSH 访问。它已经运行了数周,并且在 1 分钟内重新启动。一切都很顺利,直到我安装了阿马维斯通过 Ubuntu 的 apt-get 安装包 - 现在每次重启只需 30 分钟。
运行时,一切正常。我检查了日志,没有发现明显问题。Clamav 正在运行,邮件发送和接收,CPU 使用率在 0% 到 0.5% 之间波动。
我的理论
我通过排除法将问题缩小到clamav-daemon
最小范围。唯一新安装的是 Amavis,它添加了新的守护进程。如果我停止所有守护进程(amavis
、clamav-daemon
和clamav-freshclam
),则重新启动是即时的。如果我只停止clamav-daemon
,则重新启动是即时的。但如果我让它继续运行并停止所有其他守护进程,则重新启动又需要 30 分钟。
这是完全可重复的,并且等待时间总是恰好 30 分钟。这让我认为 Clamav 正在等待用户输入,超时时间为 30 分钟。
我可以看到 Clamav 的进程正在运行,并且带有“ --foreground=true
”。这让我认为 Amavis 故意这样做是为了捕获输出,但并不期望在重启期间出现提示。
我的日志
注意:/etc/init.d/sendsigs
已report_unkillable
启用捕获无法终止的进程,并且/etc/default/apport
已启用。登录时/var/crash
没有任何最新信息。
(大约在 14:51 重新启动,并且服务器直到 15:23 - 30 分钟后才准备好 SSH,其间没有错误。)
/var/log/clamav/clamav.log
...
14:52:45 2016 -> --- Stopped at Fri 14:52:45 2016
15:23:04 2016 -> +++ Started at Fri 15:23:04 2016
15:23:04 2016 -> Received 1 file descriptor(s) from systemd.
15:23:04 2016 -> clamd daemon 0.98.7 (OS: linux-gnu, ARCH: x86_64, CPU: x86_64)
15:23:04 2016 -> Running as user clamav (UID 119, GID 127)
...
/var/log/mail.log
...
14:51:15 mail amavis[1578]: Using primary internal av scanner code for ClamAV-clamd
14:51:15 mail amavis[1578]: Found secondary av scanner ClamAV-clamscan at /usr/bin/clamscan
14:51:15 mail amavis[1578]: Deleting db files __db.001,nanny.db,__db.003,snmp.db,__db.002 in /var/lib/amavis/db
14:51:15 mail amavis[1578]: Creating db in /var/lib/amavis/db/; BerkeleyDB 0.55, libdb 5.3
15:23:05 mail postgrey[820]: Process Backgrounded
15:23:05 mail postgrey[820]: 2016/02/12-15:23:05 postgrey (type Net::Server::Multiplex) starting! pid(820)
15:23:05 mail postgrey[820]: Resolved [localhost]:10023 to [::1]:10023, IPv6
15:23:05 mail postgrey[820]: Resolved [localhost]:10023 to [127.0.0.1]:10023, IPv4
...
/var/log/syslog
...
14:52:18 mail systemd[1]: Stopping ClamAV virus database updater...
14:52:18 mail freshclam[760]: Update process terminated
14:52:18 mail systemd[1]: Stopped ClamAV virus database updater.
14:52:42 mail systemd[1]: Stopped Setup Virtual Console.
14:52:42 mail systemd[1]: Stopping LVM2 PV scan on device 202:2...
14:52:42 mail systemd[1]: Deactivating swap /dev/disk/by-uuid/758b48d5-df40-4ca8-af0d-a482044d21dd...
14:52:42 mail systemd[1]: Stopping Authenticate and Authorize Users to Run Privileged Tasks...
14:52:42 mail systemd[1]: Stopping User Manager for UID 1000...
14:52:42 mail systemd[1907]: Reached target Shutdown.
14:52:42 mail swapoff[2140]: swapoff: /dev/mapper/ubuntu-swap: swapoff failed: Cannot allocate memory
14:52:42 mail systemd[1907]: Starting Exit the Session...
14:52:42 mail systemd[1907]: Stopped target Default.
14:52:42 mail systemd[1]: Stopping Session 1 of user gavannon.
14:52:42 mail systemd[1907]: Stopped target Basic System.
14:52:42 mail systemd[1907]: Stopped target Sockets.
14:52:42 mail systemd[1]: Stopped target Graphical Interface.
14:52:42 mail systemd[1907]: Stopped target Paths.
14:52:42 mail systemd[1907]: Stopped target Timers.
14:52:42 mail systemd[1]: Removed slice system-ifup.slice.
14:52:42 mail systemd[1]: Removed slice system-systemd\x2dfsck.slice.
14:52:42 mail systemd[1]: Stopped target Timers.
14:52:42 mail systemd[1]: Stopping Accounts Service...
14:52:42 mail systemd[1]: Stopped target Multi-User System.
14:52:42 mail systemd[1]: Stopped Set Cloud Password.
14:52:42 mail systemd[1]: Stopping LSB: Record successful boot for GRUB...
14:52:42 mail systemd[1]: Stopping LSB: Starts amavisd-new mailfilter...
14:52:42 mail systemd[1]: Stopping OpenBSD Secure Shell server...
14:52:42 mail systemd[1]: Stopped target Login Prompts.
14:52:42 mail systemd[1]: Stopping Getty on tty1...
14:52:42 mail systemd[1]: Stopping LSB: XenServer Virtual Machine daemon providing host integration services...
14:52:42 mail rsyslogd: [origin software="rsyslogd" swVersion="8.12.0" x-pid="729" x-info="http://www.rsyslog.com"] exiting on signal 15.
(Something must be happening here, because the next line isn't for 30 minutes)
15:23:04 mail rsyslogd: [origin software="rsyslogd" swVersion="8.12.0" x-pid="770" x-info="http://www.rsyslog.com"] start
15:23:04 mail rsyslogd-2222: command 'KLogPermitNonKernelFacility' is currently not permitted - did you already set it via a RainerScript command (v6+ $
15:23:04 mail rsyslogd: rsyslogd's groupid changed to 109
14:52 发生的最后一件事是日志关闭(rsyslogd
显示exiting
)。下一行是日志在启动时再次启动,但那是 30 分钟后!
答案1
我在 CentOS 7 服务器上遇到了类似的问题(重新启动也需要大约 30 分钟),我的日志与您的问题的共同点是条目swapoff failed: Cannot allocate memory
。显然,在关机过程中,当没有足够的可用 RAM 将交换空间中的剩余数据加载回 RAM 时,交换空间被禁用。运行较少的服务(即不运行clamav-daemon
)可能会避免这种情况,这就是您当时看不到问题的原因。
https://bugzilla.redhat.com/show_bug.cgi?id=1031158解释了为什么在关机期间禁用交换空间。解决方案可能是确保在关机期间禁用交换空间之前,占用最多内存的东西已经停止。这可能可以通过在 systemd 单元中指定依赖项来解决,但我还没有弄清楚。
另一个解决方案当然是添加更多 RAM,这样交换空间就不会使用太多。:)