Removing a control-plane node from the cluster kills the apiserver

When I have a Kubernetes cluster with multiple control-plane nodes and delete one of them, the whole API server seems to become unavailable.

In this setup I wanted to scale down from two control-plane nodes to one, but I ended up with an unusable cluster:

$ kubectl get nodes
NAME      STATUS   ROLES    AGE     VERSION
master1   Ready    master   5d20h   v1.18.6
worker1   Ready    <none>   5d19h   v1.18.6
master2   Ready    master   19h     v1.18.6
$ kubectl drain master2 --ignore-daemonsets
node/master2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-hns7p, kube-system/kube-proxy-vk6t7
node/master2 drained
$ kubectl get nodes
NAME      STATUS                     ROLES    AGE     VERSION
master1   Ready                      master   5d20h   v1.18.6
worker1   Ready                      <none>   5d20h   v1.18.6
master2   Ready,SchedulingDisabled   master   19h     v1.18.6
$ kubectl delete node master2
node "master2" deleted
$ kubectl get nodes
NAME      STATUS   ROLES    AGE     VERSION
master1   Ready    master   5d20h   v1.18.6
worker1   Ready    <none>   5d20h   v1.18.6
$ ssh master2
$ sudo kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0811 10:24:49.750898    7159 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get corresponding node: nodes "master2" not found
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0811 10:24:51.487912    7159 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
$ exit
$ kubectl get nodes
Error from server: etcdserver: request timed out
$ kubectl cluster-info

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The connection to the server master1:6443 was refused - did you specify the right host or port?

What am I missing here? Or how does removing a control-plane node differ from removing a worker node? Any pointers are welcome.

Answer 1

You have two master nodes, which also means you have two etcd replicas.

In the etcd documentation you can read:

It is recommended to have an odd number of members in a cluster. An odd-size cluster tolerates the same number of failures as an even-size cluster but with fewer nodes. The difference can be seen by comparing even and odd sized clusters:

Cluster Size    Majority    Failure Tolerance
1               1           0
2               2           0
3               2           1
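
The "Majority" column is simply the quorum, floor(n/2) + 1, and the failure tolerance is n minus that quorum. For your case of n = 2 this gives floor(2/2) + 1 = 2, so both members have to be healthy and the tolerance is 0.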

So, as you can see, an etcd cluster of size 2 needs all of its members to be up and cannot tolerate any failure. That is why an odd number of etcd replicas is strongly recommended.

So I believe you now understand why your cluster broke: when you wiped master2, the remaining 2-member etcd cluster had only one member left and lost quorum.
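
If you still want to scale down to a single control-plane node, the etcd member for master2 has to leave the cluster cleanly before the node is wiped, so that the remaining cluster shrinks to size 1 and keeps quorum. A rough sketch, assuming a kubeadm-style setup where the etcd pod on the surviving node is named etcd-master1 and the certificates live in the default /etc/kubernetes/pki/etcd locations (adjust names, paths and the member ID placeholder to your cluster):

$ kubectl -n kube-system exec etcd-master1 -- etcdctl \
    --endpoints https://127.0.0.1:2379 \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/server.crt \
    --key /etc/kubernetes/pki/etcd/server.key \
    member list
$ kubectl -n kube-system exec etcd-master1 -- etcdctl \
    --endpoints https://127.0.0.1:2379 \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/server.crt \
    --key /etc/kubernetes/pki/etcd/server.key \
    member remove <member-id-of-master2>
$ kubectl drain master2 --ignore-daemonsets
$ kubectl delete node master2
$ ssh master2 sudo kubeadm reset

kubeadm reset does have a remove-etcd-member phase (that is the removeetcdmember.go line in your output), but since you had already deleted the Node object it could not look up the node's registration and appears to have just wiped the local data instead of leaving the etcd cluster cleanly, which is why removing the member explicitly first is the safer order.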

Also have a look at the Kubernetes documentation on kubeadm high-availability topologies.
