LXC cluster: first node fails to start after reboot

2024-6-10

I have an LXC cluster with three nodes, all KVM virtual machines running Kubuntu 18.04 with LXD 3.0.3 installed via apt.

The host is also Kubuntu 18.04.

I installed LXD on the first node (kvmnode1), then joined the other two nodes (kvmnode2, kvmnode3) to the cluster.

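I then verified that booting the nodes from first to last (waiting for each one to finish booting) and shutting them down from last to first always gives a fully operational cluster.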
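
For reference, the boot and shutdown order described above could be scripted roughly as follows. This is only a sketch under assumptions that are not stated in the post: that the three VMs are libvirt domains named after the nodes (so `virsh` can manage them from the host), and that the host can reach each node over ssh as `sysop` with passwordless sudo. The helper names (`run`, `start_cluster`, `stop_cluster`, `wait_off`, `reverse_nodes`) and the `DRY_RUN` variable are hypothetical. The `lxd waitready --timeout=600` call is the same one that appears in the systemd status output.

```shell
#!/bin/sh
# Sketch only. Node names come from the post; managing the VMs with virsh
# and reaching them over ssh with passwordless sudo are assumptions.
NODES="kvmnode1 kvmnode2 kvmnode3"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Block until libvirt reports the domain as shut off (skipped in dry-run).
wait_off() {
    [ "$DRY_RUN" = "1" ] && return 0
    until virsh domstate "$1" | grep -q "shut off"; do
        sleep 5
    done
}

# Boot first to last; retry until ssh works and LXD reports ready.
start_cluster() {
    for node in $NODES; do
        run virsh start "$node"
        until run ssh "sysop@$node" sudo lxd waitready --timeout=600; do
            sleep 5
        done
    done
}

# Print the node names in reverse order.
reverse_nodes() {
    rev=""
    for node in $NODES; do
        rev="$node $rev"
    done
    echo "$rev"
}

# Shut down last to first, waiting for each VM to power off.
stop_cluster() {
    for node in $(reverse_nodes); do
        run virsh shutdown "$node"
        wait_off "$node"
    done
}
```

With the default `DRY_RUN=1`, calling `start_cluster` or `stop_cluster` only prints the commands in the order they would run; setting `DRY_RUN=0` executes them.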

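Today I forgot to wait for each node to finish booting. As a result, LXD did not start on the first node of the cluster, while the other two nodes work fine.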

On the failing first node I see:

sysop@kvmnode1:~/Scaricati$ sudo systemctl status lxd
[sudo] password di sysop: 
● lxd.service - LXD - main daemon
   Loaded: loaded (/lib/systemd/system/lxd.service; indirect; vendor preset: enabled)
   Active: activating (start-post) (Result: exit-code) since Mon 2019-04-22 10:36:10 CEST; 1min 47s ago
     Docs: man:lxd(1)
  Process: 1592 ExecStart=/usr/bin/lxd --group lxd --logfile=/var/log/lxd/lxd.log (code=exited, status=1/FAILURE)
  Process: 1560 ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load (code=exited, status=0/SUCCESS)
 Main PID: 1592 (code=exited, status=1/FAILURE); Control PID: 1593 (lxd)
    Tasks: 8
   CGroup: /system.slice/lxd.service
           └─1593 /usr/lib/lxd/lxd waitready --timeout=600

apr 22 10:36:10 kvmnode1 systemd[1]: Starting LXD - main daemon...
apr 22 10:36:10 kvmnode1 lxd[1592]: t=2019-04-22T10:36:10+0200 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
apr 22 10:36:19 kvmnode1 lxd[1592]: t=2019-04-22T10:36:19+0200 lvl=eror msg="Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to update node version info: upda
apr 22 10:36:19 kvmnode1 lxd[1592]: Error: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1
apr 22 10:36:19 kvmnode1 systemd[1]: lxd.service: Main process exited, code=exited, status=1/FAILURE
sysop@kvmnode1:~/Scaricati$

On the second node I see:

sysop@kvmnode2:~$ lxc cluster list
+----------+-----------------------------+----------+---------+----------------------------------------+
|   NAME   |             URL             | DATABASE |  STATE  |                MESSAGE                 |
+----------+-----------------------------+----------+---------+----------------------------------------+
| kvmnode1 | https://192.168.201.11:8443 | YES      | OFFLINE | no heartbeat since 134h36m2.926365228s |
+----------+-----------------------------+----------+---------+----------------------------------------+
| kvmnode2 | https://192.168.201.12:8443 | YES      | ONLINE  | fully operational                      |
+----------+-----------------------------+----------+---------+----------------------------------------+
| kvmnode3 | https://192.168.201.13:8443 | YES      | ONLINE  | fully operational                      |
+----------+-----------------------------+----------+---------+----------------------------------------+
sysop@kvmnode2:~$

What can I do to get kvmnode1 to start correctly and rejoin the cluster?

PS: I tried starting and stopping the three nodes in every possible order, but always end up in the same state.

Update

I tried starting lxd in debug mode, but it reports the same failure:

sysop@kvmnode1:~$ sudo lxd --debug --group lxd
DBUG[04-22|12:19:42] Connecting to a local LXD over a Unix socket 
DBUG[04-22|12:19:42] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
INFO[04-22|12:19:42] LXD 3.0.3 is starting in normal mode     path=/var/lib/lxd
INFO[04-22|12:19:42] Kernel uid/gid map: 
INFO[04-22|12:19:42]  - u 0 0 4294967295 
INFO[04-22|12:19:42]  - g 0 0 4294967295 
INFO[04-22|12:19:42] Configured LXD uid/gid map: 
INFO[04-22|12:19:42]  - u 0 165536 65536 
INFO[04-22|12:19:42]  - g 0 165536 65536 
WARN[04-22|12:19:42] CGroup memory swap accounting is disabled, swap limits will be ignored. 
INFO[04-22|12:19:42] Kernel features: 
INFO[04-22|12:19:42]  - netnsid-based network retrieval: no 
INFO[04-22|12:19:42]  - unprivileged file capabilities: yes 
INFO[04-22|12:19:42] Initializing local database 
DBUG[04-22|12:19:42] Initializing database gateway 
DBUG[04-22|12:19:42] Connecting to a local LXD over a Unix socket 
DBUG[04-22|12:19:42] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
DBUG[04-22|12:19:42] Detected stale unix socket, deleting 
INFO[04-22|12:19:42] Starting /dev/lxd handler: 
INFO[04-22|12:19:42]  - binding devlxd socket                 socket=/var/lib/lxd/devlxd/sock
INFO[04-22|12:19:42] REST API daemon: 
INFO[04-22|12:19:42]  - binding Unix socket                   socket=/var/lib/lxd/unix.socket
INFO[04-22|12:19:42]  - binding TCP socket                    socket=[::]:8443
INFO[04-22|12:19:42] Initializing global database 
DBUG[04-22|12:19:42] Found cert                               k=0
DBUG[04-22|12:19:42] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=192.168.201.11:8443 attempt=0 
DBUG[04-22|12:19:42] Dqlite: connected address=192.168.201.12:8443 attempt=0 
DBUG[04-22|12:19:42] Database error: failed to update node version info: updated 0 rows instead of 1 
EROR[04-22|12:19:42] Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1 
INFO[04-22|12:19:42] Starting shutdown sequence 
INFO[04-22|12:19:42] Stopping REST API handler: 
INFO[04-22|12:19:42]  - closing socket                        socket=[::]:8443
INFO[04-22|12:19:42]  - closing socket                        socket=/var/lib/lxd/unix.socket
INFO[04-22|12:19:42] Stopping /dev/lxd handler 
INFO[04-22|12:19:42]  - closing socket                        socket=/var/lib/lxd/devlxd/sock
DBUG[04-22|12:19:42] Stop database gateway 
INFO[04-22|12:19:42] Stopping REST API handler: 
INFO[04-22|12:19:42] Stopping /dev/lxd handler 
INFO[04-22|12:19:42] Stopping REST API handler: 
INFO[04-22|12:19:42] Stopping /dev/lxd handler 
DBUG[04-22|12:19:42] Not unmounting temporary filesystems (containers are still running) 
INFO[04-22|12:19:42] Saving simplestreams cache 
INFO[04-22|12:19:42] Saved simplestreams cache 
Error: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1
sysop@kvmnode1:~$

Any hints?
