Unable to start User Manager, CGroup "No medium found"

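I've run into a strange problem whose root cause, as far as I can tell, lies in or is directly related to CGroups. Unfortunately, I don't know enough about systemd or CGroups to troubleshoot it properly and pin down the root cause. What I do know is that almost no user systemd service (i.e. systemctl --user start xyz) will start, and the error messages are fairly consistent: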

user@1000.service: Failed to enable/disable controllers on cgroup /user.slice/user-1000.slice/user@1000.service, ignoring: Permission denied
user@1000.service: Main process exited, code=exited, status=219/CGROUP

This is on Ubuntu 22.04, and so far I have tried the following kernel versions:

  • v6.5.0-17-generic
  • v6.5.0-15-generic

systemctl --failed

  UNIT                        LOAD   ACTIVE SUB    DESCRIPTION                                                     
● apcupsd.service             loaded failed failed UPS power management daemon
● apply-cgroups.service       loaded failed failed Apply CGroup settings
● dnsmasq.service             loaded failed failed dnsmasq - A lightweight DHCP and caching DNS server
● grub-common.service         loaded failed failed Record successful boot for GRUB
● kerneloops.service          loaded failed failed Tool to automatically collect and submit kernel crash signatures
● nvidia-persistenced.service loaded failed failed NVIDIA Persistence Daemon
● rpc-statd.service           loaded failed failed NFS status monitor for NFSv2/3 locking.
● user@1000.service           loaded failed failed User Manager for UID 1000
● user@1004.service           loaded failed failed User Manager for UID 1004
● user@130.service            loaded failed failed User Manager for UID 130
● vmware-tools.service        loaded failed failed VMWare Tools Service

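apply-cgroups.service is something I wrote, with the contents below (I have since disabled this service and rebooted, but that did not seem to change anything):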

[Unit]
Description=Apply CGroup settings

[Service]
Type=oneshot
ExecStart=cgcreate -g cpu,cpuacct:/foo-app
ExecStart=cgset -r cpu.cfs_quota_us=750000 /foo-app
ExecStart=cgset -r memory.limit_in_bytes=8G /foo-app

[Install]
WantedBy=multi-user.target

The contents of /sys/fs/cgroup, which my research tells me indicate that CGroups v2 is not available:

-rw-r--r--  1 root root 0 Feb 14 01:17 cgroup.clone_children
-rw-r--r--  1 root root 0 Feb 14 01:17 cgroup.procs
-r--r--r--  1 root root 0 Feb 14 01:17 cgroup.sane_behavior
drwxr-xr-x  2 root root 0 Feb 14 01:20 dev-hugepages.mount
--w-------  1 root root 0 Feb 14 01:17 devices.allow
--w-------  1 root root 0 Feb 14 01:17 devices.deny
-r--r--r--  1 root root 0 Feb 14 01:17 devices.list
drwxr-xr-x  2 root root 0 Feb 14 01:20 dev-mqueue.mount
-rw-r--r--  1 root root 0 Feb 14 01:17 notify_on_release
drwxr-xr-x  2 root root 0 Feb 14 01:20 proc-fs-nfsd.mount
drwxr-xr-x  2 root root 0 Feb 14 01:20 proc-sys-fs-binfmt_misc.mount
-rw-r--r--  1 root root 0 Feb 14 01:17 release_agent
drwxr-xr-x  2 root root 0 Feb 14 01:20 sys-fs-fuse-connections.mount
drwxr-xr-x  2 root root 0 Feb 14 01:20 sys-kernel-config.mount
drwxr-xr-x  2 root root 0 Feb 14 01:20 sys-kernel-debug.mount
drwxr-xr-x  2 root root 0 Feb 14 01:20 sys-kernel-tracing.mount
drwxr-xr-x 99 root root 0 Feb 14 01:45 system.slice
-rw-r--r--  1 root root 0 Feb 14 01:17 tasks
drwxr-xr-x  5 root root 0 Feb 14 01:29 user.slice
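
The listing above can be classified mechanically. What follows is a minimal sketch, not an authoritative test: it assumes that a cgroup.controllers file only appears on a v2 (unified) hierarchy, that tasks and release_agent only appear at the root of a v1 hierarchy, and that a tmpfs-based v1 or hybrid setup shows up as systemd/ or unified/ subdirectories. The helper name cgroup_layout and its output labels are made up for illustration.

```shell
#!/bin/sh
# cgroup_layout: hypothetical helper that guesses the cgroup layout of a
# directory from the control files it contains (default: /sys/fs/cgroup).
cgroup_layout() {
    dir="${1:-/sys/fs/cgroup}"
    if [ -e "$dir/cgroup.controllers" ]; then
        # Assumption: cgroup.controllers exists only on the v2 hierarchy.
        echo "v2"
    elif [ -e "$dir/tasks" ] || [ -e "$dir/release_agent" ]; then
        # v1 control files sitting directly in the directory itself.
        echo "v1-direct"
    elif [ -d "$dir/unified" ]; then
        # tmpfs with a unified/ subdirectory next to the v1 controllers.
        echo "hybrid"
    elif [ -d "$dir/systemd" ]; then
        # tmpfs with one subdirectory per v1 hierarchy.
        echo "v1-tmpfs"
    else
        echo "unknown"
    fi
}
```

Running cgroup_layout with no argument inspects /sys/fs/cgroup. Under these assumptions the listing above would be reported as v1-direct, because tasks and release_agent sit directly in that directory. A related one-line check is stat -fc %T /sys/fs/cgroup, which reports the filesystem type of the mount point (cgroup2fs on a pure v2 system, tmpfs on the usual v1 or hybrid layout).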

cat /proc/filesystems shows that both cgroup and cgroup2 are available:

...
nodev   cgroup
nodev   cgroup2
...
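
Note that /proc/filesystems only lists what the kernel supports, not what is actually mounted. A minimal sketch for listing the cgroup mounts themselves, assuming the standard six-field /proc/mounts format (the helper name list_cgroup_mounts is made up for illustration):

```shell
#!/bin/sh
# list_cgroup_mounts: hypothetical helper that prints "fstype mountpoint options"
# for every cgroup (v1) or cgroup2 (v2) entry in a /proc/mounts style file.
list_cgroup_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Fields in /proc/mounts: device mountpoint fstype options dump pass
    awk '$3 == "cgroup" || $3 == "cgroup2" { print $3, $2, $4 }' "$mounts_file"
}
```

Calling list_cgroup_mounts with no argument reads /proc/mounts. The options column is the useful part: for a v1 mount it names the controllers (or the name=... hierarchy) attached at that mount point.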

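While reinstalling packages as a troubleshooting step, I periodically saw the error message below. I'm afraid I don't even understand what this service is or what it is trying to do: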

Failed to reload daemon: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)

Finally, following some troubleshooting advice, I found something that may offer a clue:

02:32:20 > systemd --user
Cannot determine cgroup we are running in: No medium found
Failed to allocate manager object: No medium found

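What does this mean? What kind of medium is it looking for? My main problem is that I don't know the tools for diagnosing the root cause well enough, so I've been fumbling around without getting anywhere, which is why I'm turning to SO for answers.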
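
For context, "No medium found" is the standard text for the ENOMEDIUM error code, so the message need not refer to a physical medium. One thing worth inspecting is which cgroup the kernel reports for the current process, since that is the information the "Cannot determine cgroup we are running in" message is about. Below is a minimal sketch that parses /proc/self/cgroup, assuming the usual hierarchy-ID:controller-list:path line format and assuming that the entries of interest are the unified one (0::) and the name=systemd one. The helper name own_cgroup_entries is made up for illustration.

```shell
#!/bin/sh
# own_cgroup_entries: hypothetical helper that prints the unified (0::) and
# name=systemd entries from a /proc/<pid>/cgroup style file, or "none".
own_cgroup_entries() {
    cgroup_file="${1:-/proc/self/cgroup}"
    awk '
        {
            path = $0
            sub(/^[^:]*:[^:]*:/, "", path)   # keep everything after the second colon
            split($0, f, ":")                # f[1] = hierarchy ID, f[2] = controllers
        }
        f[1] == "0" && f[2] == "" { print "unified " path; found = 1 }
        f[2] == "name=systemd"    { print "name=systemd " path; found = 1 }
        END { if (!found) print "none" }
    ' "$cgroup_file"
}
```

Running own_cgroup_entries with no argument from the same shell that produced the error shows whether either entry exists for that session. Comparing the result with the filesystem type reported by stat -fc %T /sys/fs/cgroup may help narrow down where the lookup fails.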
