API server on the primary node stops running after adding a second control plane

2024-6-1

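In my current test setup, I have several VMs running Debian 11. Every node has a private IP and a second wireguard interface. In the future, the nodes will sit in different locations with different networks, and wireguard is used to "overlay" all of those network environments. I want to install Kubernetes on all nodes.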

node   public ip        wireguard ip
vm1    192.168.10.10    10.11.12.10
vm2    192.168.10.11    10.11.12.11
vm3    192.168.10.12    10.11.12.12
...

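So I installed docker and kubeadm/kubelet/kubectl, version 1.23.5, on all nodes. I also installed haproxy on all nodes. It acts as a load balancer by listening on localhost:443 and forwarding requests to one of the online control planes.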
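The haproxy setup described above might look roughly like the configuration printed by the sketch below. This is an illustration, not the configuration actually used: the section names, the timeouts, the choice of the wireguard IPs as backends, and the hypothetical helper `render_haproxy_cfg` are all assumptions. Only the listen address 127.0.0.1:443 and the API server port 6443 come from the post.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal haproxy TCP proxy config for the
# control-plane IPs passed as arguments. Nothing is written to /etc.
render_haproxy_cfg() {
    cat <<'EOF'
defaults
    mode tcp
    timeout connect 5s
    timeout client  1m
    timeout server  1m

frontend kube-apiserver
    bind 127.0.0.1:443
    default_backend control-planes

backend control-planes
    balance roundrobin
EOF
    i=1
    for ip in "$@"; do
        # 6443 is the API server port mentioned in the post;
        # "check" makes haproxy skip servers that are offline.
        printf '    server cp%s %s:6443 check\n' "$i" "$ip"
        i=$((i + 1))
    done
}

# Example: render_haproxy_cfg 10.11.12.10 10.11.12.11 10.11.12.12
```

Redirecting the output to a file and pointing haproxy at it would be the next step. Whether the backends should be the wireguard or the private addresses depends on which address each API server actually advertises.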

Then I bootstrapped the cluster with kubeadm:

vm01> kubeadm init --apiserver-advertise-address=10.11.12.10 --pod-network-cidr=10.20.0.0/16

After that, I tested integrating either flannel or calico, by adding --iface=<wireguard-interface> or by setting ...nodeAddressAutodetectionV4.interface: <wireguard-interface> in a custom manifest.

When I add a regular node, everything works fine. The node is added, pods are created, and they communicate over the defined network interface.

When I add a control plane without the wireguard interface, I can also add further control planes with:

vm2> kubeadm join 127.0.0.1:443 --token ... --discovery-token-ca-cert-hash sha256:...  --control-plane

Of course, before that I copied several files from /etc/kubernetes/pki on vm01 to vm02, such as ca.*, sa.*, front-proxy-ca.*, apiserver-kubelet-client.* and etcd/ca.*.
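The copy step could be sketched as follows. The helper `pki_copy_cmds` is hypothetical, and the use of scp and the `root@vm01` login are assumptions; only the file patterns and the /etc/kubernetes/pki directory come from the post. The function prints the commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands that fetch the shared
# PKI files from the first control plane.
# $1 = source host, e.g. root@vm01 (login user is an assumption)
pki_copy_cmds() {
    src="$1"
    pki=/etc/kubernetes/pki
    echo "mkdir -p $pki/etcd"
    for pat in 'ca.*' 'sa.*' 'front-proxy-ca.*' 'apiserver-kubelet-client.*'; do
        # The glob is quoted so that it is expanded on the remote side.
        echo "scp '$src:$pki/$pat' $pki/"
    done
    # The etcd CA lives in its own subdirectory.
    echo "scp '$src:$pki/etcd/ca.*' $pki/etcd/"
}

# Example, on vm02: pki_copy_cmds root@vm01
```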

But when I use the flannel or calico network with the wireguard interface, something strange happens after the join command.

root@vm02:~# kubeadm join 127.0.0.1:443 --token nwevkx.tzm37tb4qx3wg2jz --discovery-token-ca-cert-hash sha256:9a97a5846ad823647ccb1892971c5f0004043d88f62328d051a31ce8b697ad4a --control-plane
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost mimas] and IPs [192.168.10.11 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost mimas] and IPs [192.168.10.11 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local mimas] and IPs [10.96.0.1 192.168.10.11 127.0.0.1]
[certs] Using the existing "apiserver-kubelet-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
To see the stack trace of this error execute with --v=5 or higher

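After the timeout, the API server stops working even on vm01, and I can no longer run any kubeadm or kubectl commands. The HTTPS service on 6443 is dead. I neither understand why the API server on vm01 stops working when a second API server is added, nor can I find out why the output talks about the 192.168.... IPs, since the cluster should communicate only over the 10.11.12.0/24 wireguard network.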

Answer 1

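After finding a similar question at https://stackoverflow.com/questions/64227042/setting-up-a-kubernetes-master-on-a-different-ip, I think that is the solution here too. When I add --apiserver-advertise-address=<this-wireguard-ip>, the output changes (no 192.168.. IPs) and the node joins. What I still don't understand is why the API server on VM01 stopped working.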
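As a sketch, the adjusted join command can be assembled like this. The helpers `iface_ipv4` and `join_cmd`, the interface name `wg0`, and the `TOKEN` / `CA_HASH` variables are hypothetical; the flags themselves are the ones used in the question plus --apiserver-advertise-address. The function only prints the command, so nothing is joined by accident.

```shell
#!/bin/sh
# Hypothetical helper: print the first IPv4 address of an interface.
# The interface name (e.g. wg0) is an assumption; the post only says
# <wireguard-interface>.
iface_ipv4() {
    ip -4 -o addr show dev "$1" | awk '{print $4}' | cut -d/ -f1 | head -n 1
}

# Print (not run) the join command for an additional control plane.
# $1 = advertise address, $2 = bootstrap token, $3 = CA cert hash
join_cmd() {
    printf 'kubeadm join 127.0.0.1:443 --token %s --discovery-token-ca-cert-hash %s --control-plane --apiserver-advertise-address=%s\n' "$2" "$3" "$1"
}

# Example on vm2, whose wireguard IP is 10.11.12.11 in the table above:
# join_cmd "$(iface_ipv4 wg0)" "$TOKEN" "$CA_HASH"
```

With the advertise address set, the generated certificates and the etcd peer address should list the wireguard IP rather than 192.168.10.11, which matches the changed output described above.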

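Whatever the join command does in the background, it needs to create an etcd service on the second control plane, and that service must also run on the same IP as the flannel/calico network interface. If the primary network interface is used, this parameter is not needed on the second/third control plane.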
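A possible explanation for the API server on vm01 going down, offered as an assumption and not confirmed anywhere in this thread: the join output shows the new etcd member being announced to the existing cluster before its static pod is up. etcd needs a majority of its members to be reachable, so once a second member is registered, both must be able to talk to each other. If the new member advertises an address the first one cannot use, the existing member loses its majority, and the API server that depends on it stops answering. The arithmetic, with a hypothetical helper `quorum`:

```shell
#!/bin/sh
# Smallest number of members that must be reachable in an etcd cluster
# with $1 members (majority: floor(n/2) + 1).
quorum() {
    echo $(( $1 / 2 + 1 ))
}

for n in 1 2 3; do
    echo "members=$n quorum=$(quorum "$n")"
done
# prints:
# members=1 quorum=1
# members=2 quorum=2
# members=3 quorum=2
```

A two-member cluster tolerates no failures at all, which would fit the observation that the failed join of the second control plane took the first one down with it.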
