Proxmox: node load of 70, 2 containers not responding, but top shows nothing

I have a big problem. For the past 2 days, some of my LXC containers on Proxmox have stopped responding until I reboot the node.

It always happens at the same time at night (my guess is that something runs in the containers at that time and causes the heavy load).

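The problem is: top / atop / htop show nothing. The Proxmox node itself responds to commands over SSH without any problem, but 2 of the 5 nodes are really not responding (I can log in with SSH, but I cannot enter a command).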
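One possible explanation for a high load with nothing visible in top: on Linux, the load average also counts tasks in uninterruptible sleep (state D), and those use no CPU. The sketch below lists such tasks on the host. The function names `filter_d_state` and `list_d_state` are made up for illustration, and this is only a diagnostic aid under that assumption, not a confirmed cause for this case.

```shell
#!/bin/sh
# Keep only lines whose first column is exactly "D" (uninterruptible sleep).
# Reads ps-style lines on stdin: STATE PID WCHAN COMMAND
filter_d_state() {
    awk '$1 == "D"'
}

# List all tasks on the host that are currently in state D.
# wchan shows the kernel function the task is sleeping in, where available.
list_d_state() {
    ps -eo state,pid,wchan:32,comm | filter_d_state
}
```

Running `list_d_state` on the host while the load is high shows which processes are blocked in the kernel. If many of them belong to the affected containers, the load is probably caused by blocked tasks rather than by CPU usage.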

I also had to do a "hard" reset, because a normal reboot did not work (the LXC containers still had not stopped after 40 minutes).

Here is my PVE version:

pveversion -v
proxmox-ve: 4.1-39 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-15 (running version: 4.1-15/8cd55b52)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-2.6.32-43-pve: 2.6.32-166
pve-kernel-4.2.8-1-pve: 4.2.8-39
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-33
qemu-server: 4.0-62
pve-firmware: 1.1-7
libpve-common-perl: 4.0-49
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-42
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-46
pve-firewall: 2.0-18
pve-ha-manager: 1.0-24
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1

Unfortunately, the logs show nothing useful.

Syslog:

Mar 15 04:32:31 server pvedaemon[4061]: worker exit
Mar 15 04:32:31 server pvedaemon[1192]: worker 4061 finished
Mar 15 04:32:31 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:32:31 server pvedaemon[1192]: worker 24675 started
Mar 15 04:33:05 server pvedaemon[6601]: worker exit
Mar 15 04:33:05 server pvedaemon[1192]: worker 6601 finished
Mar 15 04:33:05 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:33:05 server pvedaemon[1192]: worker 25112 started
Mar 15 04:34:57 server systemd-timesyncd[559]: interval/delta/delay/jitter/drift 2048s/+0.000s/0.021s/0.001s/+1ppm
Mar 15 04:36:08 server pveproxy[17238]: worker exit
Mar 15 04:36:08 server pveproxy[1212]: worker 17238 finished
Mar 15 04:36:08 server pveproxy[1212]: starting 1 worker(s)
Mar 15 04:36:08 server pveproxy[1212]: worker 28231 started
Mar 15 04:39:48 server pvedaemon[572]: worker exit
Mar 15 04:39:48 server pvedaemon[1192]: worker 572 finished
Mar 15 04:39:48 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:39:48 server pvedaemon[1192]: worker 31498 started
Mar 15 04:40:40 server pveproxy[31690]: worker exit
Mar 15 04:40:40 server pveproxy[1212]: worker 31690 finished
Mar 15 04:40:40 server pveproxy[1212]: starting 1 worker(s)
Mar 15 04:40:40 server pveproxy[1212]: worker 32442 started
Mar 15 04:45:02 server pvedaemon[25112]: <root@pam> successful auth for user 'root@pam'
Mar 15 04:46:27 server pveproxy[28231]: worker exit
Mar 15 04:46:27 server pveproxy[1212]: worker 28231 finished
Mar 15 04:46:27 server pveproxy[1212]: starting 1 worker(s)
Mar 15 04:46:27 server pveproxy[1212]: worker 5082 started
Mar 15 04:48:45 server pveproxy[17122]: worker exit
Mar 15 04:48:45 server pveproxy[1212]: worker 17122 finished
Mar 15 04:48:45 server pveproxy[1212]: starting 1 worker(s)
Mar 15 04:48:45 server pveproxy[1212]: worker 6924 started
Mar 15 04:51:28 server pvedaemon[25112]: worker exit
Mar 15 04:51:28 server pvedaemon[1192]: worker 25112 finished
Mar 15 04:51:28 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:51:28 server pvedaemon[1192]: worker 9770 started
Mar 15 04:51:38 server pveproxy[32442]: worker exit
Mar 15 04:51:38 server pveproxy[1212]: worker 32442 finished
Mar 15 04:51:38 server pveproxy[1212]: starting 1 worker(s)
Mar 15 04:51:38 server pveproxy[1212]: worker 9911 started
Mar 15 04:52:45 server pvedaemon[31498]: worker exit
Mar 15 04:52:45 server pvedaemon[1192]: worker 31498 finished
Mar 15 04:52:45 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:52:45 server pvedaemon[1192]: worker 10794 started
Mar 15 04:55:46 server pvedaemon[24675]: worker exit
Mar 15 04:55:46 server pvedaemon[1192]: worker 24675 finished
Mar 15 04:55:46 server pvedaemon[1192]: starting 1 worker(s)
Mar 15 04:55:46 server pvedaemon[1192]: worker 13187 started
Mar 15 04:57:32 server rrdcached[972]: flushing old values
Mar 15 04:57:32 server rrdcached[972]: rotating journals
Mar 15 04:57:32 server rrdcached[972]: started new journal /var/lib/rrdcached/journal/rrd.journal.1458014252.151024
Mar 15 04:57:32 server rrdcached[972]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1458007052.150971
Mar 15 04:57:40 server puppet-agent[14639]: Finished catalog run in 0.53 seconds

Answer 1

lxcfs 2.0.0-pve1 has a bug that causes containers to hang in the kernel.

I solved the problem by updating to lxcfs 2.0.0-pve2. See here:

https://forum.proxmox.com/threads/proxmox-4-0-lxc-containers-network-unstable.26353/
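To check whether a host is still on the affected package, the installed lxcfs version can be compared against 2.0.0-pve2. This is a minimal sketch, assuming a Debian-based Proxmox host with `dpkg` available. The function names are made up for illustration.

```shell
#!/bin/sh
# Succeeds if the given version is at least 2.0.0-pve2,
# the version reported above as containing the fix.
lxcfs_is_fixed() {
    dpkg --compare-versions "$1" ge "2.0.0-pve2"
}

# Print the installed lxcfs version (empty if the package is not installed).
installed_lxcfs_version() {
    dpkg-query -W -f='${Version}' lxcfs 2>/dev/null
}

v=$(installed_lxcfs_version)
if [ -z "$v" ]; then
    echo "lxcfs does not appear to be installed"
elif lxcfs_is_fixed "$v"; then
    echo "lxcfs $v is at or above 2.0.0-pve2"
else
    echo "lxcfs $v is older than 2.0.0-pve2; an upgrade may help"
fi
```

If the version is older, `apt-get update` followed by `apt-get install lxcfs` should pull the newer package, provided the configured repository carries it.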

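Answer 2

We run the same kernel as you, and our LXC containers also hang completely. KVM machines on the same host keep running. What could be the cause, and how can the LXC containers be made responsive again without rebooting the host?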

Even running the following command on the host hangs and does not get any further:

pct enter ID