Unable to start the etcd cluster when setting up a K8s HA cluster

2024-6-2

I am following this article to set up an HA k8s cluster: Guide: Kubernetes Multi-Master HA Cluster with Kubeadm

I have three master nodes (3, 4, 5) and four worker nodes (2, 6, 7, 8). One of the worker nodes is the HAProxy load balancer.

In step 6 of the section "Install and configure Etcd on all 3 master nodes", I get the following error on master node 3:

{"level":"warn","ts":"2023-03-28T17:21:07.929-0700","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-beafcd7b-fbf5-4c3e-b9ce-5c1032e26041/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded

However, on the other master nodes (4 and 5), I get the following:

162974ed2b5b12b2, started, 192.168.60.4, https://192.168.60.4:2380, https://192.168.60.4:2379, false
5642d9d9da8c08a3, started, 192.168.60.3, https://192.168.60.3:2380, https://192.168.60.3:2379, false
6ffc3bfbd773170f, started, 192.168.60.5, https://192.168.60.5:2380, https://192.168.60.5:2379, false
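
The member list above already contains each node's client URL in its fifth column. As a minimal sketch (the helper name `client_endpoints` is hypothetical and not part of the guide), those URLs can be joined into a comma-separated string suitable for passing to etcdctl's `--endpoints` flag when checking every member from one machine:

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl member list` output on stdin and print
# the client URLs (5th ", "-separated field) joined with commas.
client_endpoints() {
  awk -F', ' '{ printf "%s%s", sep, $5; sep = "," } END { print "" }'
}

# Output copied from nodes 4 and 5 in the question.
MEMBERS='162974ed2b5b12b2, started, 192.168.60.4, https://192.168.60.4:2380, https://192.168.60.4:2379, false
5642d9d9da8c08a3, started, 192.168.60.3, https://192.168.60.3:2380, https://192.168.60.3:2379, false
6ffc3bfbd773170f, started, 192.168.60.5, https://192.168.60.5:2380, https://192.168.60.5:2379, false'

printf '%s\n' "$MEMBERS" | client_endpoints
```

The resulting string could then be used with a command such as `etcdctl endpoint health --endpoints=...` together with the certificate flags, assuming the v3 etcdctl is installed.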

Here is etcd.service on node 3:

[Unit]
Description=etcd
Documentation=https://github.com/coreos


[Service]
ExecStart=/usr/local/bin/etcd \
  --name 192.168.60.3 \
  --cert-file=/etc/etcd/kubernetes.pem \
  --key-file=/etc/etcd/kubernetes-key.pem \
  --peer-cert-file=/etc/etcd/kubernetes.pem \
  --peer-key-file=/etc/etcd/kubernetes-key.pem \
  --trusted-ca-file=/etc/etcd/ca.pem \
  --peer-trusted-ca-file=/etc/etcd/ca.pem \
  --peer-client-cert-auth \
  --client-cert-auth \
  --initial-advertise-peer-urls https://192.168.60.3:2380 \
  --listen-peer-urls https://192.168.60.3:2380 \
  --listen-client-urls https://192.168.60.3:2379,http://127.0.0.1:2379 \
  --advertise-client-urls https://192.168.60.3:2379 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-cluster 192.168.60.3=https://192.168.60.3:2380,192.168.60.4=https://192.168.60.4:2380,192.168.60.5=https://192.168.60.5:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5



[Install]
WantedBy=multi-user.target

etcd.service on node 4:

[Unit]
Description=etcd
Documentation=https://github.com/coreos


[Service]
ExecStart=/usr/local/bin/etcd \
  --name 192.168.60.4 \
  --cert-file=/etc/etcd/kubernetes.pem \
  --key-file=/etc/etcd/kubernetes-key.pem \
  --peer-cert-file=/etc/etcd/kubernetes.pem \
  --peer-key-file=/etc/etcd/kubernetes-key.pem \
  --trusted-ca-file=/etc/etcd/ca.pem \
  --peer-trusted-ca-file=/etc/etcd/ca.pem \
  --peer-client-cert-auth \
  --client-cert-auth \
  --initial-advertise-peer-urls https://192.168.60.4:2380 \
  --listen-peer-urls https://192.168.60.4:2380 \
  --listen-client-urls https://192.168.60.4:2379,http://127.0.0.1:2379 \
  --advertise-client-urls https://192.168.60.4:2379 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-cluster 192.168.60.3=https://192.168.60.3:2380,192.168.60.4=https://192.168.60.4:2380,192.168.60.5=https://192.168.60.5:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5



[Install]
WantedBy=multi-user.target

etcd.service on node 5:

[Unit]
Description=etcd
Documentation=https://github.com/coreos


[Service]
ExecStart=/usr/local/bin/etcd \
  --name 192.168.60.5 \
  --cert-file=/etc/etcd/kubernetes.pem \
  --key-file=/etc/etcd/kubernetes-key.pem \
  --peer-cert-file=/etc/etcd/kubernetes.pem \
  --peer-key-file=/etc/etcd/kubernetes-key.pem \
  --trusted-ca-file=/etc/etcd/ca.pem \
  --peer-trusted-ca-file=/etc/etcd/ca.pem \
  --peer-client-cert-auth \
  --client-cert-auth \
  --initial-advertise-peer-urls https://192.168.60.5:2380 \
  --listen-peer-urls https://192.168.60.5:2380 \
  --listen-client-urls https://192.168.60.5:2379,http://127.0.0.1:2379 \
  --advertise-client-urls https://192.168.60.5:2379 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-cluster 192.168.60.3=https://192.168.60.3:2380,192.168.60.4=https://192.168.60.4:2380,192.168.60.5=https://192.168.60.5:2380 \
  --initial-cluster-state new \
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5



[Install]
WantedBy=multi-user.target
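
The three unit files differ only in the node IP. Since etcd expects each member's `--name` to match one of the keys in `--initial-cluster`, a quick sanity check can rule out a typo there. The following is only a sketch under that assumption; the function name `in_initial_cluster` is hypothetical, and the check also assumes the peer URL follows the `https://<name>:2380` pattern used in these files:

```shell
#!/bin/sh
# Hypothetical helper: return 0 if NAME has an entry of the form
# NAME=https://NAME:2380 in the comma-separated --initial-cluster string.
in_initial_cluster() {
  name="$1"
  cluster="$2"
  case ",$cluster," in
    *",$name=https://$name:2380,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Value copied from the --initial-cluster flag in the unit files.
INITIAL_CLUSTER='192.168.60.3=https://192.168.60.3:2380,192.168.60.4=https://192.168.60.4:2380,192.168.60.5=https://192.168.60.5:2380'

for ip in 192.168.60.3 192.168.60.4 192.168.60.5; do
  if in_initial_cluster "$ip" "$INITIAL_CLUSTER"; then
    echo "$ip: ok"
  else
    echo "$ip: missing from --initial-cluster"
  fi
done
```

With the values shown in the question, all three names are present, so this particular mismatch does not appear to be the problem here.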

If I run:

 ETCDCTL_API=2 etcdctl member list

I get:

client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://127.0.0.1:2379 exceeded header timeout
; error #1: dial tcp 127.0.0.1:4001: connect: connection refused
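
Both failing commands target the loopback endpoint 127.0.0.1:2379. One way to narrow this down is to query the node's TLS client URL directly with the v3 API, passing the certificates named in the unit files. This is a diagnostic sketch, not a confirmed fix; the wrapper `member_list` and the `ETCDCTL` override variable are hypothetical, while `--endpoints`, `--cacert`, `--cert` and `--key` are standard etcdctl v3 flags:

```shell
#!/bin/sh
# Hypothetical wrapper: list members through the HTTPS client URL of the node
# whose IP is given as $1, using the certificate paths from etcd.service.
# ETCDCTL may be set to override the binary (assumption, useful for dry runs).
member_list() {
  ETCDCTL_API=3 "${ETCDCTL:-etcdctl}" \
    --endpoints="https://$1:2379" \
    --cacert=/etc/etcd/ca.pem \
    --cert=/etc/etcd/kubernetes.pem \
    --key=/etc/etcd/kubernetes-key.pem \
    member list
}
```

Running `member_list 192.168.60.3` on node 3 would show whether the member answers on its advertised client URL even though the loopback endpoint times out.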
