Why does systemd hang on reboot?


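About 1 time in 10, systemd hangs during reboot, and I don't understand why. What should I look at, and where, to fix it? I'm using systemd v196 and can't upgrade to >=198, because the newer versions need a more recent kernel (cgroups support) and the customer's requirements rule out a kernel update. Is there a reasonable way to find the cause of this behavior and to make systemd reboot the system unconditionally?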

Note that this link does not help: http://freedesktop.org/wiki/Software/systemd/Debugging/#index2h1

As it says there:

Shutdown Never Finishes

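If a normal reboot or poweroff never finishes, even after waiting a few minutes, the above method of creating a shutdown log will not help, and the log must be obtained some other way. Two options that are useful for debugging boot problems can also be used for shutdown problems: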

use a serial console
use a debug shell - not only is it available from early boot, it also stays active until late shutdown.

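I am using a serial console, and for some reason I can even log in, because the eth interface is still up, or is brought up again after being disconnected during the reboot step.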

I can't figure out what is causing this.

# cat /etc/systemd/system/
basic.target.wants/                          getty.target.wants/                          multi-user.target.wants/                     sysinit.target.wants/                        
dbus-org.freedesktop.NetworkManager.service  local-fs-pre.target.wants/                   sockets.target.wants/                        syslog.service                               
display-manager.service                      local-fs.target.wants/                       swap.target

Note swap.target: it is there, but we don't use a swap partition at all. I tried masking swap, but the hang persists. The last line on the console is:

[OK] Stopped target shutdown.

Edit: as I said, I can log back in via ssh over eth.

Here are two logs: the first was produced when the reboot/shutdown hung, the second when the reboot succeeded.

Hang case; the output always looks like this (full log):

[  OK  ] Stopped Network Time Service (one-shot ntpdate mode).
         Stopping Modem and VPN connections autoconnect...
         Stopping Login Service...
         Stopping LSB: Avahi mDNS/DNS-SD Daemon...
[  OK  ] Stopped Monitoring free system resources.
[  OK  ] Stopped Monitoring dropbear socket.
[  OK  ] Stopped Login Service.
[  OK  ] Stopped Modem and VPN c[  OK  ] Stopped Getty on tty1.
[  OK  ] Stopped Serial Getty on ttyO0.
[  OK  ] Unmounted /var/lib/opkg.
[  OK  ] Stopped Network Manager.
[  OK  ] Stopped LSB: Avahi mDNS/DNS-SD Daemon.
         Stopping D-Bus System Message Bus...
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped Suspend manager.
         Stopping X Server...
[  OK  ] Stopped X Server.
         Stopping System Logging Service...
[  OK  ] Stopped System Logging Service.
[   77.580000] g_ether gadget: using random self ethernet address
[   77.580000] g_ether gadget: using random host ethernet address
[   77.590000] usb0: MAC 6e:0d:de:b0:33:4f
[   77.590000] usb0: HOST MAC 62:7a:81:02:f3:ff
[   77.600000] g_ether gadget: Ethernet Gadget, version: Memorial Day 2008
[   77.600000] g_ether gadget: g_ether ready
[   77.610000] musb-hdrc musb-hdrc.0: MUSB HDRC host driver
[   77.610000] musb-hdrc musb-hdrc.0: new USB bus registered, assigned bus number 2
[   77.620000] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[   77.630000] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   77.640000] usb usb2: Product: MUSB HDRC host driver
[   77.640000] usb usb2: Manufacturer: Linux 2.6.37 musb-hcd
[   77.650000] usb usb2: SerialNumber: musb-hdrc.0
[   77.650000] hub 2-0:1.0: USB hub found
[   77.660000] hub 2-0:1.0: 1 port detected
[   77.690000] ADDRCONF(NETDEV_UP): usb0: link is not ready
[  OK  ] Stopped target Reboot.
[  OK  ] Stopped Reboot.
[  OK  ] Stopped target Unmount All Filesystems.
[  OK  ] Stopped target Shutdown.
[   78.330000] <46>systemd-journald[328]: Received SIGUSR1
<hang>

Normal reboot:

         Unmounting /var/lib/opkg...
[  OK  ] Stopped target Network.
         Stopping SSH Per-Connection Server...
[  OK  ] Stopped target Graphical Interface.
[  OK  ] Stopped target Multi-User.
         Stopping Monitoring free system resources...
         Stopping Monitoring dropbear socket...
         Stopping Network Time Service (one-shot ntpdate mode)...
[  OK  ] Stopped Network Time Service (one-shot ntpdate mode).
         Stopping Modem and VPN connections autoconnect...
         Stopping Login Service...
         Stopping LSB: Avahi mDNS/DNS-SD Daemon...
[  OK  ] Stopped Monitoring free system resources.
[  OK  ] Stopped Monitoring dropbear socket.
[  OK  ] Stopped Login Service.
[  OK  ] Unmounted /var/lib/opkg.
         Stopping Network Manager...
[  OK  ] Stopped Getty on tty1.
[  OK  ] Stopped Network Manager.
[  OK  ] Stopped Serial Getty on ttyO0.
[  OK  ] Stopped Suspend manager.
[  OK  ] Stopped LSB: Avahi mDNS/DNS-SD Daemon.
         Stopping D-Bus System Message Bus...
         Stopping X Server...
         Stopping Permit User Sessions...
[  OK  ] Stopped Permit User Sessions.
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped X Server.
[  OK  ] Stopped D-Bus System Message Bus.
         Stopping System Logging Service...
[  OK  ] Stopped System Logging Service.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Sockets.
[  OK  ] Closed dropbear.socket.
[  OK  ] Closed D-Bus System Message Bus Socket.
[  OK  ] Stopped target System Initialization.
         Stopping Import configuration from SD card...
[  OK  ] Stopped Import configuration from SD card.
         Stopping Load Kernel Modules...
         Stopping Apply Kernel Variables...
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped target Local File Systems.
         Unmounting /var...
         Unmounting /tmp...
[  OK  ] Closed Syslog Socket.
[  OK  ] Failed unmounting /var.
[  OK  ] Unmounted /tmp.
[  OK  ] Stopped Load Kernel Modules.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped target Local File Systems (Pre).
         Stopping Remount Root and Kernel File Systems...
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Reached target Shutdown.
[   52.340000] omap_wdt: Unexpected close, not stopping!
Sending SIGTERM to remaining processes...
[   52.490000] <46>systemd-journald[335]: Received SIGTERM
Sending SIGKILL to remaining processes...
Unmounting file systems.
Unmounting /sys/fs/fuse/connections.
Unmounting /var.
All filesystems unmounted.
Deactivating swaps.
All swaps deactivated.

Update:

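After some investigation and debugging I found what interrupts the shutdown, although I still can't fix it. For some reason, one of our custom services gets started before the shutdown has completed, and this makes the shutdown hang. That is one kind of hang. In the other kind, the shutdown is not interrupted but simply stops at some point.

For this reason, until I have tracked down all the conflicts and other possible hangs one by one, I want to activate the hardware watchdog unconditionally. To do this through systemd I enabled and tested RuntimeWatchdogSec and ShutdownWatchdogSec (separately and together). Unfortunately, they didn't help. Judging from the source code, systemd enters a loop in which it still waits for all file systems to be unmounted and does other cleanup before it lets the watchdog actually take effect (instead of keeping it alive).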
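For reference, both settings mentioned above live in the [Manager] section of /etc/systemd/system.conf; the values below are illustrative, not the ones actually used. One possible reading of the hang log, offered only as a hypothesis: the console ends with "Stopped target Shutdown", i.e. shutdown.target was stopped rather than reached, so the shutdown transaction never completes and ShutdownWatchdogSec never applies, while PID 1's event loop is idle but alive and keeps servicing the RuntimeWatchdogSec timer. That would explain why neither setting resets the board, at least for the first kind of hang.

```ini
# /etc/systemd/system.conf (excerpt; values are illustrative)
[Manager]
# PID 1 opens /dev/watchdog with this timeout and pings it at least
# every half interval for as long as its main loop keeps running.
RuntimeWatchdogSec=30
# Timeout used only after the shutdown transaction has completed and
# PID 1 hands over to the final reboot/poweroff stage.
ShutdownWatchdogSec=2min
```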

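I'm stuck. I'd like help finding a way to:

1. enable the watchdog unconditionally, at least from the point where shutdown begins;
2. easily detect and resolve all the conflicts.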

Point 1 has priority.
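A minimal sketch for point 1, under several assumptions: the unit name is hypothetical; RuntimeWatchdogSec is left unset, because a watchdog device can normally be opened by only one process at a time and PID 1 would otherwise be holding it; and the driver keeps the timer running after the device is closed, as the "omap_wdt: Unexpected close, not stopping!" line in the healthy log suggests. The unit does nothing at boot. When shutdown begins, its default Conflicts=shutdown.target stops it, and its ExecStop opens /dev/watchdog, which starts the hardware timer. If the shutdown then hangs, the board should be reset once the driver's timeout expires (for omap_wdt, presumably set by its timer_margin module parameter). That timeout must comfortably exceed a normal shutdown, or the board may be reset before file systems are unmounted.

```ini
# /etc/systemd/system/arm-watchdog-on-shutdown.service (hypothetical name)
[Unit]
Description=Arm hardware watchdog when shutdown begins (sketch)

[Service]
Type=oneshot
# Stay "active" after boot so that stopping it at shutdown runs ExecStop.
RemainAfterExit=yes
ExecStart=/bin/true
# Opening /dev/watchdog starts the timer. On this board the driver does not
# stop it again on close, so nothing disarms it before the reset.
ExecStop=/bin/sh -c 'echo 1 > /dev/watchdog'

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable arm-watchdog-on-shutdown.service. Note that stopping the unit by hand also arms the watchdog.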

Answer 1

I'll venture a solution: try adding

  Before=basic.target

to /usr/lib/systemd/system/dbus.service.

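Something odd in your logs caught my eye and reminded me of a case I read about some time ago on the Arch Linux forums, where the system hung on reboot. The fix was the one above, the reasoning being that the hang could be caused by some service trying to talk to D-Bus after D-Bus had already been stopped.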

By ordering D-Bus before basic.target, it is not only started before basic.target is reached, but is also kept running during shutdown until after basic.target has been stopped.

And indeed, in your bad log Basic System is never stopped, whereas in the healthy log it is.
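A sketch of applying this suggestion on v196, assuming the packaged unit lives at the path given above. Instead of editing the file under /usr/lib, you can place a copy in /etc/systemd/system: it takes precedence over the packaged unit and is not overwritten by package upgrades. One caveat, stated with some caution: with DefaultDependencies=yes (the default), a service is implicitly ordered After=basic.target, so an explicit Before=basic.target contradicts that. If systemd logs an ordering cycle involving dbus.service, the change may not have the intended effect.

```ini
# /etc/systemd/system/dbus.service
# Copy of /usr/lib/systemd/system/dbus.service with one added line;
# everything else stays exactly as in the packaged unit.
[Unit]
# ... original [Unit] lines ...
Before=basic.target

# ... original [Service] section, unchanged ...
```

Run systemctl daemon-reload afterwards so that systemd picks up the override.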

If this doesn't work, and given that you can't upgrade, have you considered downgrading?

Answer 2

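By default, shutdown.target conflicts with all other units, so they are stopped automatically when the shutdown starts. This also works the other way round: if another unit is started, shutdown.target is stopped. So the problem is that something starts some unit during the shutdown, which overrides the shutdown process.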

This should be fixed in systemd v198, which makes the shutdown jobs "irreversible".
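To make the mechanism concrete: unless a unit sets DefaultDependencies=no, systemd gives every service dependencies roughly equivalent to the explicit lines below (as systemd.service(5) described them for releases of that era). The Conflicts= line is why starting such a unit during shutdown cancels shutdown.target. For the custom service, it is therefore worth finding out what starts it (a socket, path or timer unit, another unit's Wants=/Requires=, or D-Bus activation) rather than editing these lines. The unit name is hypothetical.

```ini
# my-custom.service (hypothetical): the implicit default dependencies,
# written out explicitly for illustration only.
[Unit]
Requires=basic.target
After=basic.target
Conflicts=shutdown.target
Before=shutdown.target
```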

Answer 3

Perhaps swap is still active when the Shutdown target is reached. My solution was to force swap off before rebooting:

swapoff -a
swapoff /dev/md6

After that, reboots went through for me without any stalls.
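To automate this instead of running swapoff by hand before each reboot, one option is a oneshot unit whose ExecStop deactivates swap as soon as shutdown begins. This is a sketch: the unit name is hypothetical, and the path /sbin/swapoff is an assumption to check on your system.

```ini
# /etc/systemd/system/swapoff-on-shutdown.service (hypothetical name)
[Unit]
Description=Deactivate all swap early during shutdown (sketch)

[Service]
Type=oneshot
# Stay "active" after boot so that stopping it at shutdown runs ExecStop.
RemainAfterExit=yes
ExecStart=/bin/true
# Runs when the default Conflicts=shutdown.target stops this unit.
ExecStop=/sbin/swapoff -a

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable swapoff-on-shutdown.service.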
