Master failure after adding a second master

Running under VirtualBox. 5 machines: 2 worker nodes (though I never even got that far), 1x load balancer (Ubuntu running HAProxy) at 192.168.20.10, configured as follows:

frontend kubernetes-frontend
    bind 0.0.0.0:6443
    mode tcp
    option tcplog
    default_backend kubernetes-backend

backend kubernetes-backend
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 weight 100
    server kubernetes-master-1 192.168.20.21:6443 check
    server kubernetes-master-2 192.168.20.22:6443 check
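Before running kubeadm init, it can help to confirm the load-balancer path actually works. A quick sketch (assuming netcat is installed on the machines):

```shell
# From any machine on the 192.168.20.0/24 network.
# The frontend should accept connections even before any backend is up,
# because HAProxy itself binds 0.0.0.0:6443:
nc -zv 192.168.20.10 6443

# Each backend only answers once kube-apiserver is running on that master:
nc -zv 192.168.20.21 6443
nc -zv 192.168.20.22 6443
```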

2x master nodes, identical copies: kubeadm v1.19.4, Docker 19.03, CRI-O 1.17, Kubernetes v1.19.4.

kubernetes-master-1 192.168.20.21

kubernetes-master-2 192.168.20.22

Running the init command

sudo kubeadm init --control-plane-endpoint="192.168.20.10:6443" --upload-certs \
    --apiserver-advertise-address=192.168.20.21 --pod-network-cidr=10.100.0.0/16
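For reference, the same flags can also be expressed as a kubeadm configuration file (a sketch using the v1beta2 config API that kubeadm v1.19 accepts, passed with kubeadm init --config kubeadm.yaml --upload-certs):

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.20.21   # this master's own address
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.19.4
controlPlaneEndpoint: "192.168.20.10:6443"   # the load balancer
networking:
  podSubnet: 10.100.0.0/16
```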

It succeeds:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 192.168.20.10:6443 --token c2p4af.9s3aapujrfjkjlho \
    --discovery-token-ca-cert-hash sha256:ff3fc8d5e1a7ee16e2d48362cef4e3fa53df4c8fd672e69c8fe2c9e5826ab0c9 \
    --control-plane --certificate-key 57d92a387afbd601fba5da9e310523fa5ac8dfcdf0fd70dd8624a9950ce06457

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.20.10:6443 --token c2p4af.9s3aapujrfjkjlho \
    --discovery-token-ca-cert-hash sha256:ff3fc8d5e1a7ee16e2d48362cef4e3fa53df4c8fd672e69c8fe2c9e5826ab0c9 
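As the output warns, the uploaded certificates are deleted after two hours. If the control-plane join happens later than that, the certificates can be re-uploaded and a fresh join command printed (a sketch; run as root on master-1):

```shell
# Re-upload the control-plane certificates and print a new certificate key:
kubeadm init phase upload-certs --upload-certs

# Print a current worker join command; for a control-plane join, append
# --control-plane --certificate-key <key printed above>:
kubeadm token create --print-join-command
```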

(full output here)

So far, so good. But when I run the join command on master-2, it gets as far as

[etcd] Creating static Pod manifest for "etcd"

(full output here). It prints one more line, [etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s, and then:

[kubelet-check] Initial timeout of 40s passed.

And that's it. master-1 (which responded before) now responds to

kubectl cluster-info

like this:

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Error from server: etcdserver: request timed out

The suggested command returns the following output:

kubectl cluster-info dump
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)

And that's where it ends. I get the same result whether or not I install a pod network beforehand (I'm using Calico). The same VM images work fine with a single master: I can add worker nodes and run commands. But this always fails, no matter which guide I follow. I've checked etcd on master-1: it is (or was) running before the join is executed on master-2, and it is listening on the correct address (192.168.20.21), not localhost.
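One way to verify the etcd state described above (a sketch; it assumes the etcd static pod is named etcd-kubernetes-master-1 and that the default kubeadm certificate paths are in use):

```shell
# List etcd members from inside the etcd pod on master-1:
kubectl -n kube-system exec etcd-kubernetes-master-1 -- etcdctl \
  --endpoints=https://192.168.20.21:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list

# Confirm etcd listens on the node IP rather than localhost:
sudo ss -tlnp | grep 2379
```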

Any help is greatly appreciated! Thanks!

Answer 1

OK! So all of this was solved by adding --apiserver-advertise-address=192.168.20.22 to the join command on the second master. Good grief. So when you run the join command on a secondary master, make sure you add

--apiserver-advertise-address=

with that server's own address, not the first master's.
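Putting that together with the init output above, the full control-plane join command on kubernetes-master-2 becomes:

```shell
sudo kubeadm join 192.168.20.10:6443 --token c2p4af.9s3aapujrfjkjlho \
    --discovery-token-ca-cert-hash sha256:ff3fc8d5e1a7ee16e2d48362cef4e3fa53df4c8fd672e69c8fe2c9e5826ab0c9 \
    --control-plane --certificate-key 57d92a387afbd601fba5da9e310523fa5ac8dfcdf0fd70dd8624a9950ce06457 \
    --apiserver-advertise-address=192.168.20.22
```

Without the flag, kubeadm advertises the address of the default-route interface (in a typical VirtualBox setup, the NAT adapter), so the new etcd member announces a peer address the other master cannot reach and the join stalls.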
