在 Debian 10 上分析关机期间“停止作业正在运行”1 分 30 秒

在 Debian 10 上分析关机期间“停止作业正在运行”1 分 30 秒

我已经安装了 Debian 10 桌面版。我注意到关机经常延迟,延迟 1 分 30 秒。我在 Debian 9 安装上没有遇到这个问题。 (或者至少,不那么频繁)。

之后我查看系统日志时也可以看到延迟(journalctl)。我想分析这些日志。

在一种情况下,问题似乎出在用户服务上pulseaudio.service。如果是这样,则可以很容易地解决这个问题,例如通过设置TimeoutStopSec=2服务。 (注意这不是系统服务,它是用户服务。没有系统服务pulseaudio.service)。

在我看到的第二种情况下,pulseaudio.service似乎一被询问就关闭了。如果是这样,设置超时将无济于事:-)。相反,在 systemd 甚至要求pulseaudio.service停止之前似乎存在很长的延迟。当我查看第二个日志时,我无法弄清楚问题出在哪里。

有一个更广泛的解决方法可能有效(见结尾)。但我想具体知道问题出在哪里。

问题

  1. 下面的第二条日志是否显示了特定问题,即我未能识别的问题?
  2. 是否存在任何预期或已知的情况,其中软件错误可能导致以下日志,并且日志无法识别具体问题?

我当前的systemd软件包版本是241-5.

更新

我认为这个问题听起来与问题#12262。该问题的解决方法是“cgroup-util:也杀死线程”公关#12621),合并到 systemd 版本 243 中。

PR 链接中的讨论表明该错误特定于 cgroupsv2(“统一控制组层次结构”)。 Debian 10 使用 cgroupsv2,而 Debian 9 则不使用。更具体地说,Debian 10 使用过渡性的“混合”层次结构

因此我尝试添加systemd.legacy_systemd_cgroup_controller到内核启动选项。使用此选项时,我可以看到cgroup2文件系统没有安装在任何地方,仅安装了cgroup.可悲的是,我仍然看到这种无法解释的关闭延迟。

完整日志:日志关闭延迟2.txt

Sep 08 16:24:25 drystone kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-5-amd64 root=UUID=f10753fb-0a2d-4288-971d-5f33682ab5c0 ro systemd.legacy_systemd_cgroup_controller quiet

Sep 08 16:54:28 drystone systemd[1]: Stopped Firmware update daemon.
Sep 08 16:55:58 drystone systemd[1442]: Stopping D-Bus User Message Bus...
Sep 08 16:55:58 drystone systemd[1442]: Stopped target Default.
Sep 08 16:55:58 drystone systemd[1442]: Stopping Sound Service...
Sep 08 16:55:58 drystone systemd[1442]: pulseaudio.service: Succeeded.
Sep 08 16:55:58 drystone systemd[1442]: Stopped Sound Service.
Sep 08 16:55:58 drystone systemd[1442]: dbus.service: Main process exited, code=killed, status=15/TERM
Sep 08 16:55:58 drystone systemd[1442]: dbus.service: Succeeded.
Sep 08 16:55:58 drystone systemd[1442]: Stopped D-Bus User Message Bus.
Sep 08 16:55:58 drystone systemd[1442]: Stopped target Basic System.
Sep 08 16:55:58 drystone systemd[1442]: Stopped target Paths.
Sep 08 16:55:58 drystone systemd[1442]: Stopped target Timers.

另外根据 PR 讨论,替代修复方法是使用 Linux 内核版本 5.2 或更高版本。内核修复提交是“cgroup:在 PROCS 迭代中包含带有活动线程的垂死领导者”。这也已包含在稳定内核系列 4.19.x 中。这稳定提交日期为 2019 年 6 月 10 日。然而,在撰写本文时,当前的 Debian 10 内核是基于 4.19.67,已发布2019-04-27左右。所以这个 Debian 内核还没有包含修复程序。


1.pulseaudio关闭延迟

Aug 31 08:40:28 drystone systemd[1862]: Stopped target Default.
Aug 31 08:40:28 drystone systemd[1862]: Stopping Sound Service...
Aug 31 08:40:28 drystone systemd[1]: Stopping User Manager for UID 1005...

...

Aug 31 08:41:57 drystone systemd[1862]: Stopping D-Bus User Message Bus...
Aug 31 08:41:57 drystone systemd[1862]: pulseaudio.service: Succeeded.
Aug 31 08:41:57 drystone systemd[1862]: Stopped Sound Service.
Aug 31 08:41:57 drystone systemd[1862]: dbus.service: Main process
exited, code=killed, status=15/TERM
Aug 31 08:41:57 drystone systemd[1862]: dbus.service: Succeeded.
Aug 31 08:41:57 drystone systemd[1862]: Stopped D-Bus User Message Bus.
Aug 31 08:41:57 drystone systemd[1862]: Stopped target Basic System.

...

Aug 31 08:41:57 drystone systemd[1862]: Reached target Shutdown.
Aug 31 08:41:57 drystone systemd[1862]: systemd-exit.service: Succeeded.
Aug 31 08:41:57 drystone systemd[1862]: Started Exit the Session.
Aug 31 08:41:57 drystone systemd[1862]: Reached target Exit the Session.
Aug 31 08:41:57 drystone systemd[1863]:
pam_unix(systemd-user:session): session closed for user mike

...

Aug 31 08:41:57 drystone systemd[1]: [email protected]: Succeeded.
Aug 31 08:41:57 drystone systemd[1]: Stopped User Manager for UID 1005.

2. 不明原因的关机延迟

完整日志:日志关闭延迟.txt

Sep 03 11:42:47 drystone systemd[1]: Stopping User Manager for UID 1001...

...

Sep 03 11:42:47 drystone systemd[1]: Stopped Login Service.
Sep 03 11:44:16 drystone systemd[1359]: Stopping D-Bus User Message Bus...
Sep 03 11:44:16 drystone systemd[1359]: Stopping Sound Service...
Sep 03 11:44:16 drystone systemd[1359]: pulseaudio.service: Succeeded.
Sep 03 11:44:16 drystone systemd[1359]: Stopped Sound Service.
Sep 03 11:44:16 drystone systemd[1359]: dbus.service: Main process exited, code=killed, status=15/TERM
Sep 03 11:44:16 drystone systemd[1359]: dbus.service: Succeeded.
Sep 03 11:44:16 drystone systemd[1359]: Stopped D-Bus User Message Bus.
Sep 03 11:44:16 drystone systemd[1359]: Stopped target Basic System.
Sep 03 11:44:16 drystone systemd[1359]: Stopped target Sockets.
Sep 03 11:44:16 drystone systemd[1359]: gpg-agent.socket: Succeeded.
Sep 03 11:44:16 drystone systemd[1359]: Closed GnuPG cryptographic agent and passphrase cache.
Sep 03 11:44:16 drystone systemd[1359]: gpg-agent-ssh.socket: Succeeded.
Sep 03 11:44:16 drystone systemd[1359]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Sep 03 11:44:16 drystone systemd[1359]: dbus.socket: Succeeded.
Sep 03 11:44:16 drystone systemd[1359]: Closed D-Bus User Message Bus Socket.
Sep 03 11:44:16 drystone systemd[1359]: gpg-agent-browser.socket: Succeeded.
Sep 03 11:44:16 drystone systemd[1359]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Sep 03 11:44:16 drystone systemd[1359]: gpg-agent-extra.socket: Succeeded.
Sep 03 11:44:16 drystone systemd[1359]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Sep 03 11:44:16 drystone systemd[1359]: dirmngr.socket: Succeeded.
Sep 03 11:44:16 drystone systemd[1359]: Closed GnuPG network certificate management daemon.
Sep 03 11:44:16 drystone systemd[1359]: Stopped target Timers.
Sep 03 11:44:16 drystone systemd[1359]: Stopped target Paths.
Sep 03 11:44:16 drystone systemd[1359]: pulseaudio.socket: Succeeded.
Sep 03 11:44:16 drystone systemd[1359]: Closed Sound System.
Sep 03 11:44:16 drystone systemd[1359]: Reached target Shutdown.
Sep 03 11:44:16 drystone systemd[1359]: systemd-exit.service: Succeeded.
Sep 03 11:44:16 drystone systemd[1359]: Started Exit the Session.
Sep 03 11:44:16 drystone systemd[1359]: Reached target Exit the Session.
Sep 03 11:44:16 drystone systemd[1360]: pam_unix(systemd-user:session): session closed for user alan-sysop
Sep 03 11:44:16 drystone systemd[1]: [email protected]: Succeeded.
Sep 03 11:44:16 drystone systemd[1]: Stopped User Manager for UID 1001.

披露:此链接日志还包括 13 行systemd[1]: Received SIGINT. 这些是我按 Ctrl+Alt+Del 时写的。据我所知,除了切换普利茅斯靴子飞溅之外,这没有任何效果。第一条此类消息是在系统挂起 30 秒后出现的。所以我认为我们可以忽略这一点。

我找不到会延迟pulseaudio.service的排序依赖项

没有服务文件对以下项有明确的排序依赖性pulseaudio.service

$ grep -r pulseaudio.service /usr/lib/systemd/user/
$

反之亦然:

$ systemctl --user cat pulseaudio.service
# /usr/lib/systemd/user/pulseaudio.service
[Unit]
Description=Sound Service

# We require pulseaudio.socket to be active before starting the daemon, because
# while it is possible to use the service without the socket, it is not clear
# why it would be desirable.
#
# A user installing pulseaudio and doing `systemctl --user start pulseaudio`
# will not get the socket started, which might be confusing and problematic if
# the server is to be restarted later on, as the client autospawn feature
# might kick in. Also, a start of the socket unit will fail, adding to the
# confusion.
#
# After=pulseaudio.socket is not needed, as it is already implicit in the
# socket-service relationship, see systemd.socket(5).
Requires=pulseaudio.socket
ConditionUser=!root

[Service]
# Note that notify will only work if --daemonize=no
Type=notify
ExecStart=/usr/bin/pulseaudio --daemonize=no
Restart=on-failure

[Install]
Also=pulseaudio.socket
WantedBy=default.target

我目前尝试的解决方法

diff --git a/systemd/system/[email protected]/override.conf b/systemd/system/[email protected]/override.conf
new file mode 100644
index 0000000..86734ef
--- /dev/null
+++ b/systemd/system/[email protected]/override.conf
@@ -0,0 +1,2 @@
+[Service]
+TimeoutStopSec=7s
diff --git a/systemd/user.conf b/systemd/user.conf
index b427f1e..cb5b646 100644
--- a/systemd/user.conf
+++ b/systemd/user.conf
@@ -21,7 +21,7 @@
 #DefaultStandardOutput=inherit
 #DefaultStandardError=inherit
 #DefaultTimeoutStartSec=90s
-#DefaultTimeoutStopSec=90s
+DefaultTimeoutStopSec=5s
 #DefaultRestartSec=100ms
 #DefaultStartLimitIntervalSec=10s
 #DefaultStartLimitBurst=5

答案1

我想具体知道问题出在哪里。

在我应用了广泛的解决方法之后,当这个问题发生时,systemd说这个问题特定于 - systemd:-)。

具体而言,系统管理员被迫终止“用户管理器”。我看到“用户管理器”没有等待的子进程。否则,日志就会显示他们也被杀死了。

当然,bug 仍然有可能不在 systemd 中,而是在内核中。

Sep 13 20:21:58 drystone systemd[1]: Stopping User Manager for UID 1005...
Sep 13 20:21:58 drystone systemd[1531]: Stopping Sound Service...
Sep 13 20:22:05 drystone systemd[1]: [email protected]: State 'stop-sigterm' timed out. Killing.
Sep 13 20:22:05 drystone systemd[1]: [email protected]: Killing process 1531 (systemd) with signal SIGKILL.
Sep 13 20:22:05 drystone systemd[1]: [email protected]: Main process exited, code=killed, status=9/KILL
Sep 13 20:22:05 drystone systemd[1]: [email protected]: Failed with result 'timeout'.
Sep 13 20:22:05 drystone systemd[1]: Stopped User Manager for UID 1005.

这是一个pulseaudio也被杀死的例子。这条日志也很奇怪。系统管理员要求用户管理器停止,但用户管理器不要求任何用户单元停止。

Sep 20 19:47:46 drystone systemd[1]: Stopping User Manager for UID 1005...
Sep 20 19:47:53 drystone systemd[1]: [email protected]: State 'stop-sigterm' timed out. Killing.
Sep 20 19:47:53 drystone systemd[1]: [email protected]: Killing process 1459 (systemd) with signal SIGKILL.
Sep 20 19:47:53 drystone systemd[1]: [email protected]: Killing process 1551 (pulseaudio) with signal SIGKILL.
Sep 20 19:47:53 drystone systemd[1]: [email protected]: Main process exited, code=killed, status=9/KILL
Sep 20 19:47:53 drystone systemd[1]: [email protected]: Failed with result 'timeout'.
Sep 20 19:47:53 drystone systemd[1]: Stopped User Manager for UID 1005.

相关内容