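I have an LXD cluster with three nodes, all of them KVM virtual machines running Kubuntu 18.04, with LXD 3.0.3 installed via apt.
The host also runs Kubuntu 18.04.
I installed LXD on the first node (kvmnode1), then joined the other two nodes (kvmnode2, kvmnode3) to the cluster.
I then verified that booting the nodes from first to last (waiting for each one to finish booting) and shutting them down from last to first always gives a fully operational cluster.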
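For reference, the ordered startup described above could be scripted roughly as follows. This is only a minimal sketch under assumptions that are not part of my actual setup notes: the helper names `start_node`, `wait_booted` and `start_in_order` are hypothetical, the VMs are assumed to be libvirt domains named after the nodes, and "finished booting" is approximated by the node answering over SSH.

```shell
#!/bin/sh
# Hypothetical helpers; adjust to however the VMs are really managed.

# Assumption: each node is a libvirt domain with the same name as the node.
start_node() { virsh start "$1"; }

# Assumption: a node counts as "booted" once it accepts an SSH connection.
wait_booted() { ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true; }

# Boot the given nodes one at a time, in the order given,
# waiting for each one before starting the next.
start_in_order() {
    for node in "$@"; do
        start_node "$node" || return 1
        tries=0
        until wait_booted "$node"; do
            tries=$((tries + 1))
            # Give up after 30 failed checks (about 5 minutes of sleeping).
            if [ "$tries" -ge 30 ]; then
                return 1
            fi
            sleep 10
        done
        echo "$node is up"
    done
}
```

It would be called as `start_in_order kvmnode1 kvmnode2 kvmnode3`; shutting down would go through the same list in reverse.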
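Today I forgot to wait for each node to boot completely, and as a result LXD did not start on the first node of the cluster, while the other two nodes work fine.
On the failed first node I see: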
sysop@kvmnode1:~/Scaricati$ sudo systemctl status lxd
[sudo] password di sysop:
● lxd.service - LXD - main daemon
Loaded: loaded (/lib/systemd/system/lxd.service; indirect; vendor preset: enabled)
Active: activating (start-post) (Result: exit-code) since Mon 2019-04-22 10:36:10 CEST; 1min 47s ago
Docs: man:lxd(1)
Process: 1592 ExecStart=/usr/bin/lxd --group lxd --logfile=/var/log/lxd/lxd.log (code=exited, status=1/FAILURE)
Process: 1560 ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load (code=exited, status=0/SUCCESS)
Main PID: 1592 (code=exited, status=1/FAILURE); Control PID: 1593 (lxd)
Tasks: 8
CGroup: /system.slice/lxd.service
└─1593 /usr/lib/lxd/lxd waitready --timeout=600
apr 22 10:36:10 kvmnode1 systemd[1]: Starting LXD - main daemon...
apr 22 10:36:10 kvmnode1 lxd[1592]: t=2019-04-22T10:36:10+0200 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
apr 22 10:36:19 kvmnode1 lxd[1592]: t=2019-04-22T10:36:19+0200 lvl=eror msg="Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to update node version info: upda
apr 22 10:36:19 kvmnode1 lxd[1592]: Error: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1
apr 22 10:36:19 kvmnode1 systemd[1]: lxd.service: Main process exited, code=exited, status=1/FAILURE
sysop@kvmnode1:~/Scaricati$
On the second node I see:
sysop@kvmnode2:~$ lxc cluster list
+----------+-----------------------------+----------+---------+----------------------------------------+
| NAME | URL | DATABASE | STATE | MESSAGE |
+----------+-----------------------------+----------+---------+----------------------------------------+
| kvmnode1 | https://192.168.201.11:8443 | YES | OFFLINE | no heartbeat since 134h36m2.926365228s |
+----------+-----------------------------+----------+---------+----------------------------------------+
| kvmnode2 | https://192.168.201.12:8443 | YES | ONLINE | fully operational |
+----------+-----------------------------+----------+---------+----------------------------------------+
| kvmnode3 | https://192.168.201.13:8443 | YES | ONLINE | fully operational |
+----------+-----------------------------+----------+---------+----------------------------------------+
sysop@kvmnode2:~$
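What can I do to get kvmnode1 to start correctly and rejoin the cluster?
PS: I tried starting and stopping the three nodes in every possible order, but I always end up in the same state.
UPDATE
I tried starting lxd in debug mode, but it reports the same failure: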
sysop@kvmnode1:~$ sudo lxd --debug --group lxd
DBUG[04-22|12:19:42] Connecting to a local LXD over a Unix socket
DBUG[04-22|12:19:42] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
INFO[04-22|12:19:42] LXD 3.0.3 is starting in normal mode path=/var/lib/lxd
INFO[04-22|12:19:42] Kernel uid/gid map:
INFO[04-22|12:19:42] - u 0 0 4294967295
INFO[04-22|12:19:42] - g 0 0 4294967295
INFO[04-22|12:19:42] Configured LXD uid/gid map:
INFO[04-22|12:19:42] - u 0 165536 65536
INFO[04-22|12:19:42] - g 0 165536 65536
WARN[04-22|12:19:42] CGroup memory swap accounting is disabled, swap limits will be ignored.
INFO[04-22|12:19:42] Kernel features:
INFO[04-22|12:19:42] - netnsid-based network retrieval: no
INFO[04-22|12:19:42] - unprivileged file capabilities: yes
INFO[04-22|12:19:42] Initializing local database
DBUG[04-22|12:19:42] Initializing database gateway
DBUG[04-22|12:19:42] Connecting to a local LXD over a Unix socket
DBUG[04-22|12:19:42] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
DBUG[04-22|12:19:42] Detected stale unix socket, deleting
INFO[04-22|12:19:42] Starting /dev/lxd handler:
INFO[04-22|12:19:42] - binding devlxd socket socket=/var/lib/lxd/devlxd/sock
INFO[04-22|12:19:42] REST API daemon:
INFO[04-22|12:19:42] - binding Unix socket socket=/var/lib/lxd/unix.socket
INFO[04-22|12:19:42] - binding TCP socket socket=[::]:8443
INFO[04-22|12:19:42] Initializing global database
DBUG[04-22|12:19:42] Found cert k=0
DBUG[04-22|12:19:42] Dqlite: server connection failed err=failed to establish network connection: some nodes are behind this node's version address=192.168.201.11:8443 attempt=0
DBUG[04-22|12:19:42] Dqlite: connected address=192.168.201.12:8443 attempt=0
DBUG[04-22|12:19:42] Database error: failed to update node version info: updated 0 rows instead of 1
EROR[04-22|12:19:42] Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1
INFO[04-22|12:19:42] Starting shutdown sequence
INFO[04-22|12:19:42] Stopping REST API handler:
INFO[04-22|12:19:42] - closing socket socket=[::]:8443
INFO[04-22|12:19:42] - closing socket socket=/var/lib/lxd/unix.socket
INFO[04-22|12:19:42] Stopping /dev/lxd handler
INFO[04-22|12:19:42] - closing socket socket=/var/lib/lxd/devlxd/sock
DBUG[04-22|12:19:42] Stop database gateway
INFO[04-22|12:19:42] Stopping REST API handler:
INFO[04-22|12:19:42] Stopping /dev/lxd handler
INFO[04-22|12:19:42] Stopping REST API handler:
INFO[04-22|12:19:42] Stopping /dev/lxd handler
DBUG[04-22|12:19:42] Not unmounting temporary filesystems (containers are still running)
INFO[04-22|12:19:42] Saving simplestreams cache
INFO[04-22|12:19:42] Saved simplestreams cache
Error: failed to open cluster database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1
sysop@kvmnode1:~$
Any hints?