I created my Kubernetes cluster with the command
sudo kubeadm init --pod-network-cidr 192.168.0.0/16
I installed the Calico network plugin on the control plane node
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml
I have two worker node servers, and I am trying to join them to my cluster with the command
sudo kubeadm join IP_OF_MY_SERVER:6443 --token ... --discovery-token-ca-cert-hash sha256:...
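But it hangs forever and nothing happens. This happens on both worker node servers. My worker nodes have full connectivity: they can reach the internet, and they can reach the control plane node by both IP and hostname. My cluster is active.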
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-control Ready control-plane 5h12m v1.26.1
kubectl cluster-info
Kubernetes control plane is running at https://172.31.97.251:6443
CoreDNS is running at https://172.31.97.251:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
I am running the cluster and the worker nodes on Ubuntu 22.04.2 LTS Jammy with containerd 1.6
containerd --version
containerd containerd.io 1.6.18 2456e983eb9e37e47538f59ea18f2043c9a73640
kubelet version 1.26.1
kubelet --version
Kubernetes v1.26.1
kubectl version 1.26
Client Version: version.Info Major:"1", Minor:"26"
My containerd is up and running under systemd
sudo systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2023-02-27 21:40:01 UTC; 52min ago
What am I missing?
Update: I checked the system log on the worker node, and this is what I see
sudo cat /var/log/syslog
Feb 27 22:19:17 k8s-worker-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 22:19:17 k8s-worker-1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Feb 27 22:19:27 k8s-worker-1 systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 187.
Feb 27 22:19:27 k8s-worker-1 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Feb 27 22:19:27 k8s-worker-1 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Feb 27 22:19:27 k8s-worker-1 kubelet[9361]: E0227 22:19:27.903339 9361 run.go:74] "command failed" err="failed to validate kubelet flags: the container runtime endpoint address was not specified or empty, use --container-runtime-endpoint to set"
Feb 27 22:19:27 k8s-worker-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 22:19:27 k8s-worker-1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Answer 1
It looks like you need to tell kubelet explicitly where to find the containerd socket. In /etc/systemd/system/kubelet.service, add this line as an argument to the kubelet executable (usually ExecStart=/usr/local/bin/kubelet):
--container-runtime-endpoint unix:///run/containerd/containerd.sock
Verify the location of containerd.sock. If it is not at /run/containerd/containerd.sock, you can find it by looking in /etc/containerd/config.toml, where it is filed under
[grpc]
address = "/run/containerd/containerd.sock"
If the kubelet.service file is not in that location, you can find it by running systemctl status kubelet and looking at the Loaded: line.
Finally, run systemctl daemon-reload so that systemd picks up the edited unit file, then systemctl restart kubelet.
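As a rough sketch of the socket check described above, the shell function below reads the address entry from the [grpc] section of a containerd config file. The function name containerd_socket_from_config is made up for illustration, and the parsing assumes the simple layout quoted above (a [grpc] header followed by a double-quoted address value); it is not a full TOML parser.

```shell
# Hypothetical helper (name invented for this example): print the gRPC socket
# address found in the [grpc] section of a containerd config file.
# Usage: containerd_socket_from_config [path]
# The path defaults to /etc/containerd/config.toml.
containerd_socket_from_config() {
  config="${1:-/etc/containerd/config.toml}"
  awk '
    # Any section header switches the flag; it is set only for [grpc].
    /^[ \t]*\[/ { in_grpc = ($0 ~ /^[ \t]*\[grpc\][ \t]*$/); next }
    # Inside [grpc], take the text between the first pair of double quotes.
    in_grpc && /^[ \t]*address[ \t]*=/ {
      split($0, parts, "\"")
      print parts[2]
      exit
    }
  ' "$config"
}
```

For example, sock="$(containerd_socket_from_config)" followed by test -S "$sock" would confirm that the path really is a socket before you pass unix://$sock to --container-runtime-endpoint.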