Ubuntu 18.04 systems die a week after installation with dbus errors

I've been banging my head against Google for a day and still haven't found a solution, so I'm hoping someone can help.

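I install 18.04 via FAI and then apply some system configuration with Puppet. Every system runs fine for about a week, and then users can no longer log in. Last week I installed five systems within a four-hour window; today, within a similar four-hour window, all of them started showing the same problem.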

Users cannot log in to the box through gdm. ssh into the box works, but it takes about 30 seconds.

I see the following in the logs:

==> /var/log/syslog <==
Nov 16 15:44:15 pre043 systemd-logind[1921]: do_ypcall: clnt_call: RPC: Unable to send; errno = Operation not permitted

==> /var/log/auth.log <==
Nov 16 15:44:40 pre043 sshd[1971]: pam_systemd(sshd:session): Failed to create session: Connection timed out
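The `do_ypcall` message comes from the NIS (YP) client code, so it may be worth confirming which name-service databases are configured to use NIS on these machines. The following is only a sketch under that assumption; `nis_databases` is a made-up helper name, and it reads nsswitch.conf-style text on standard input.

```shell
# nis_databases: hypothetical helper, not part of any package.
# Reads nsswitch.conf-style text on stdin and prints the name of every
# database whose source list mentions "nis" or "nisplus".
nis_databases() {
    # Drop comments first, then match "nis"/"nisplus" as a whole word
    # in the part of the line after the colon.
    sed 's/#.*//' |
        awk -F: 'NF >= 2 && $2 ~ /(^|[ \t])nis(plus)?([ \t]|$)/ { print $1 }'
}
```

It could be run as `nis_databases < /etc/nsswitch.conf`. If `passwd` or `group` shows up, user lookups made by systemd-logind would go through NIS.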

Here is what is happening:

$ systemctl list-unit-files --user
Failed to connect to bus: No such file or directory

pam_systemd is not creating the /run/user/USERID directory, and XDG_RUNTIME_DIR, which is normally set, is not being set.
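A quick way to confirm this symptom from a login shell on an affected machine is to check the directory and the variable directly. This is a minimal sketch: the function name `check_runtime_dir` is made up for illustration, and it assumes the conventional `/run/user/<uid>` location.

```shell
# check_runtime_dir: hypothetical helper, not part of any package.
# Reports whether /run/user/<uid> exists for the current user and
# whether XDG_RUNTIME_DIR is set in the current environment.
check_runtime_dir() {
    uid=$(id -u)
    dir="/run/user/$uid"
    if [ -d "$dir" ]; then
        echo "runtime dir present: $dir"
    else
        echo "runtime dir missing: $dir"
    fi
    if [ -n "${XDG_RUNTIME_DIR:-}" ]; then
        echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"
    else
        echo "XDG_RUNTIME_DIR is unset"
    fi
}
```

Calling `check_runtime_dir` right after logging in over ssh shows whether the session was registered with logind.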

If I wait a while, I see entries in the logs that may be related:

==> /var/log/apport.log <==
ERROR: apport (pid 2071) Fri Nov 16 15:46:58 2018: called for pid 1921, signal 6, core limit 0, dump mode 1
ERROR: apport (pid 2071) Fri Nov 16 15:46:58 2018: executable: /lib/systemd/systemd-logind (command line "/lib/systemd/systemd-logind")
ERROR: apport (pid 2071) Fri Nov 16 15:46:58 2018: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 2071) Fri Nov 16 15:46:58 2018: apport: report /var/crash/_lib_systemd_systemd-logind.0.crash already exists and unseen, doing nothing to avoid disk usage DoS

==> /var/log/auth.log <==
Nov 16 15:46:58 pre043 systemd-logind[2072]: New seat seat0.
Nov 16 15:46:58 pre043 systemd-logind[2072]: Watching system buttons on /dev/input/event2 (Power Button)
Nov 16 15:46:58 pre043 systemd-logind[2072]: Watching system buttons on /dev/input/event1 (Power Button)
Nov 16 15:46:58 pre043 systemd-logind[2072]: Watching system buttons on /dev/input/event0 (Sleep Button)
Nov 16 15:46:58 pre043 systemd-logind[2072]: Watching system buttons on /dev/input/event7 (Dell Dell USB Entry Keyboard)
Nov 16 15:46:58 pre043 systemd-logind[2072]: Watching system buttons on /dev/input/event4 (Compx 2.4G Receiver)
Nov 16 15:46:58 pre043 systemd-logind[2072]: Watching system buttons on /dev/input/event6 (Compx 2.4G Receiver)

==> /var/log/syslog <==
Nov 16 15:46:58 pre043 systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
Nov 16 15:46:58 pre043 systemd[1]: systemd-logind.service: Killing process 1921 (systemd-logind) with signal SIGABRT.
Nov 16 15:46:58 pre043 systemd[1]: systemd-logind.service: Main process exited, code=dumped, status=6/ABRT
Nov 16 15:46:58 pre043 systemd[1]: systemd-logind.service: Failed with result 'watchdog'.
Nov 16 15:46:58 pre043 systemd[1]: systemd-logind.service: Service has no hold-off time, scheduling restart.
Nov 16 15:46:58 pre043 systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 2.
Nov 16 15:46:58 pre043 systemd[1]: Stopped Login Service.
Nov 16 15:46:58 pre043 systemd[1]: Starting Login Service...
Nov 16 15:46:58 pre043 systemd[1]: Started Login Service.

==> /var/log/auth.log <==
Nov 16 15:46:58 pre043 systemd-logind[2072]: New session c1 of user gdm.
Nov 16 15:46:58 pre043 systemd-logind[2072]: New session 3 of user root.

==> /var/log/syslog <==
Nov 16 15:46:59 pre043 gnome-shell[1429]: Could not open device /dev/input/event2: GDBus.Error:org.freedesktop.login1.DeviceIsTaken: Device already taken
Nov 16 15:46:59 pre043 gnome-shell[1429]: Could not open device /dev/input/event3: GDBus.Error:org.freedesktop.login1.DeviceIsTaken: Device already taken
Nov 16 15:46:59 pre043 gnome-shell[1429]: Could not open device /dev/input/event1: GDBus.Error:org.freedesktop.login1.DeviceIsTaken: Device already taken
Nov 16 15:46:59 pre043 gnome-shell[1429]: Could not open device /dev/input/event0: GDBus.Error:org.freedesktop.login1.DeviceIsTaken: Device already taken
Nov 16 15:46:59 pre043 gnome-shell[1429]: Could not open device /dev/input/event18: GDBus.Error:org.freedesktop.login1.DeviceIsTaken: Device already taken
Nov 16 15:46:59 pre043 gnome-shell[1429]: Could not open device /dev/input/event19: GDBus.Error:org.freedesktop.login1.DeviceIsTaken: Device already taken
Nov 16 15:46:59 pre043 gnome-shell[1429]: Could not open device /dev/input/event20: GDBus.Error:org.freedesktop.login1.DeviceIsTaken: Device already taken
Nov 16 15:46:59 pre043 gnome-shell[1429]: Could not open device /dev/input/event21: GDBus.Error:org.freedesktop.login1.DeviceIsTaken: Device already taken

Something seems to be wrong with dbus, but I can't tell what is causing it. Rebooting doesn't fix the problem, and neither does reinstalling dbus.

It's worth noting that the dbus processes on a broken box run with noticeably different arguments from those on a working box.

(broken box)

# ps -ef | grep -i dbus | grep mparker
mparker   1916     1  0 15:29 pts/0    00:00:00 dbus-launch --autolaunch 629f8bd4627543a0a62559707dac566f --binary-syntax --close-stderr
mparker   1917     1  0 15:29 ?        00:00:00 /usr/bin/dbus-daemon --syslog-only --fork --print-pid 5 --print-address 7 --session

(working box)

# ps -ef | grep -i dbus | grep mparker
mparker   1893  1860  0 17:08 ?        00:00:00 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
mparker   1994  1989  0 17:08 ?        00:00:00 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
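The two listings differ in how the session bus was started. As far as I understand it, `--address=systemd:` is what a bus started through the systemd user instance looks like, while `dbus-launch --autolaunch` is the fallback used when no session bus address can be found. That would be consistent with the missing XDG_RUNTIME_DIR. The sketch below labels `ps` lines accordingly; `classify_dbus` is a made-up name, and the two patterns are taken from the output above.

```shell
# classify_dbus: hypothetical helper, not part of any package.
# Reads `ps -ef` lines on stdin and prints one label per line that
# matches either of the two dbus start-up patterns seen above.
classify_dbus() {
    while IFS= read -r line; do
        case "$line" in
            *"--address=systemd:"*)
                echo "systemd-activated session bus" ;;
            *"dbus-launch --autolaunch"*)
                echo "autolaunch fallback" ;;
        esac
    done
}
```

For example, `ps -ef | grep -i dbus | classify_dbus` can be run on each machine to compare them quickly.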

It may also be worth noting that after rebooting a broken machine, I see a number of packages listed as half-configured or half-installed in dpkg.log.

grep half- /var/log/dpkg.log | grep '15:22'
2018-11-16 15:22:35 status half-configured dbus:amd64 1.12.2-1ubuntu1
2018-11-16 15:22:35 status half-installed dbus:amd64 1.12.2-1ubuntu1
2018-11-16 15:22:35 status half-installed dbus:amd64 1.12.2-1ubuntu1
2018-11-16 15:22:35 status half-configured ureadahead:amd64 0.100.0-20
2018-11-16 15:22:36 status half-configured systemd:amd64 237-3ubuntu10.6
2018-11-16 15:22:36 status half-configured man-db:amd64 2.8.3-2ubuntu0.1
2018-11-16 15:22:36 status half-configured dbus:amd64 1.12.2-1ubuntu1

I have reinstalled them with:

# apt-get install --reinstall $(grep half- /var/log/dpkg.log | grep '15:22' | awk '{print $5}' | sort -u | sed 's/:amd64//')
Reading package lists... Done
Building dependency tree       
Reading state information... Done
0 upgraded, 0 newly installed, 4 reinstalled, 0 to remove and 9 not upgraded.
Need to get 3932 kB/4082 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://mirror/ubuntu bionic-updates/main amd64 systemd amd64 237-3ubuntu10.6 [2894 kB]
Get:2 http://mirror/ubuntu bionic-updates/main amd64 man-db amd64 2.8.3-2ubuntu0.1 [1019 kB]
Get:3 http://mirror/ubuntu bionic/main amd64 ureadahead amd64 0.100.0-20 [19.3 kB]
Fetched 3932 kB in 0s (52.5 MB/s)    
Preconfiguring packages ...
(Reading database ... 242468 files and directories currently installed.)
Preparing to unpack .../systemd_237-3ubuntu10.6_amd64.deb ...
Unpacking systemd (237-3ubuntu10.6) over (237-3ubuntu10.6) ...
Preparing to unpack .../man-db_2.8.3-2ubuntu0.1_amd64.deb ...
Unpacking man-db (2.8.3-2ubuntu0.1) over (2.8.3-2ubuntu0.1) ...
Preparing to unpack .../dbus_1.12.2-1ubuntu1_amd64.deb ...
Unpacking dbus (1.12.2-1ubuntu1) over (1.12.2-1ubuntu1) ...
Preparing to unpack .../ureadahead_0.100.0-20_amd64.deb ...
Unpacking ureadahead (0.100.0-20) over (0.100.0-20) ...
Processing triggers for mime-support (3.60ubuntu1) ...
Setting up ureadahead (0.100.0-20) ...
Setting up systemd (237-3ubuntu10.6) ...
Setting up man-db (2.8.3-2ubuntu0.1) ...
Updating database of manual pages ...
Setting up dbus (1.12.2-1ubuntu1) ...
A reboot is required to replace the running dbus-daemon.
Please reboot the system when convenient.

The packages still appear to be in a half- state, even though dpkg -s does not report any problem.

# dpkg -s dbus
Package: dbus
Status: install ok installed
Priority: standard
Section: admin
Installed-Size: 559
Maintainer: Ubuntu Developers <[email protected]>
Architecture: amd64
Multi-Arch: foreign
Version: 1.12.2-1ubuntu1
Depends: adduser, lsb-base, libapparmor1 (>= 2.8.94-0ubuntu1), libaudit1 (>= 1:2.2.1), libc6 (>= 2.14), libcap-ng0, libdbus-1-3 (= 1.12.2-1ubuntu1), libexpat1 (>= 2.1~beta3), libselinux1 (>= 2.0.65), libsystemd0
Suggests: default-dbus-session-bus | dbus-session-bus
Conffiles:
 /etc/default/dbus 0d0f25a2f993509c857eb262f6e22015
 /etc/init.d/dbus ec9a7d183ec50837a12aca3f9c95cc27
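Since dpkg.log records every status transition, a `half-` line may simply be an intermediate step of an install that finished normally. One way to check is to look only at the last status logged for each package. This is a sketch, with `last_dpkg_status` as a made-up helper name; it assumes the log format shown above, where field 3 is `status`, field 4 the state, and field 5 the package.

```shell
# last_dpkg_status: hypothetical helper, not part of any package.
# Reads dpkg.log lines on stdin and prints "<package> <state>" using
# only the most recent "status" line seen for each package.
last_dpkg_status() {
    awk '$3 == "status" { state[$5] = $4 }
         END { for (p in state) print p, state[p] }' | sort
}
```

Running `last_dpkg_status < /var/log/dpkg.log | grep -v ' installed$'` lists only the packages whose final logged state is something other than installed. `dpkg --audit` is the built-in check for packages left partially installed or configured.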

I'm completely out of ideas. Any help would be greatly appreciated!

Answer 1

I'm seeing exactly the same problem on every kernel in the 4.15.0 series. Booting into a 4.4.0 kernel fixes it, but I don't know what difference causes the problem.
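To compare several machines against this observation, it helps to print just the kernel series each one is running. A small sketch, with `kernel_series` as a made-up helper name; it assumes Ubuntu-style release strings such as `4.15.0-39-generic`.

```shell
# kernel_series: hypothetical helper, not part of any package.
# Prints the part of a kernel release string before the first "-".
# With no argument it uses the running kernel from `uname -r`.
kernel_series() {
    printf '%s\n' "${1:-$(uname -r)}" | cut -d- -f1
}
```

`kernel_series` with no argument reports the running kernel, and `kernel_series 4.4.0-139-generic` can be used to check a specific release string.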
