EDIT: I performed the exact same steps on Ubuntu Server 20.04, and everything runs fine there...
I created a new Kubernetes cluster on Ubuntu Server 22.04 but ran into several problems. The pods in kube-system keep starting and then stopping again. I checked the logs but couldn't find the problem.
kubectl describe po calico-kube-controllers-7bdbfc669-kdts2 -n kube-system
Sometimes I can't use kubectl at all; I think that's because the kube-apiserver pod is down.
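When the API server is unreachable, the containers can still be inspected directly through containerd with crictl (assuming crictl is installed; the socket path below is containerd's default, and <container-id> is a placeholder):
$ sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a
$ sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs <container-id>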
rbo@ubuntuserver:~$ kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-7bdbfc669-kdts2 1/1 Running 7 (6m13s ago) 16m
kube-system calico-node-jz5xb 1/1 Running 7 (7m9s ago) 16m
kube-system coredns-787d4945fb-l4bf5 1/1 Running 6 (5m59s ago) 5h26m
kube-system coredns-787d4945fb-nt8lh 1/1 Running 4 (93s ago) 5h26m
kube-system etcd-ubuntuserver 1/1 Running 16 (6m40s ago) 5h26m
kube-system kube-apiserver-ubuntuserver 1/1 Running 15 (4m29s ago) 5h26m
kube-system kube-controller-manager-ubuntuserver 0/1 CrashLoopBackOff 17 (2m21s ago) 5h25m
kube-system kube-proxy-lc5nm 0/1 CrashLoopBackOff 15 (44s ago) 5h26m
kube-system kube-scheduler-ubuntuserver 1/1 Running 17 (5m40s ago) 5h25m
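The previous container's logs for the two crash-looping pods should show the actual exit reason; for example, while the API server happens to be up:
$ kubectl -n kube-system logs kube-proxy-lc5nm --previous
$ kubectl -n kube-system logs kube-controller-manager-ubuntuserver --previous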
rbo@ubuntuserver:~$ journalctl -n 30
Dec 13 21:17:12 ubuntuserver kubelet[662]: E1213 21:17:12.362890 662 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-proxy\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-lc5nm_kube-system(0f9da167-6b5b-4530-8ae4-067bcfd88098)\"" pod="kube-system/kube-proxy-lc5nm" podUID=0f9da167-6b5b-4530-8ae4-067bcfd88098
Dec 13 21:17:13 ubuntuserver kubelet[662]: I1213 21:17:13.366892 662 scope.go:115] "RemoveContainer" containerID="4621fe31f41ed1c053e77f495ed215271c4dd12b080405c533dc84e1185680d4"
Dec 13 21:17:13 ubuntuserver containerd[672]: time="2022-12-13T21:17:13.368653147Z" level=info msg="CreateContainer within sandbox \"41195dbd058b802b9812da2ce092a6298580768d9c42401a808f0f5d02342ba5\" for container &ContainerMetadata{Name:kube-scheduler,Attempt:17,}"
Dec 13 21:17:13 ubuntuserver containerd[672]: time="2022-12-13T21:17:13.379867865Z" level=info msg="CreateContainer within sandbox \"41195dbd058b802b9812da2ce092a6298580768d9c42401a808f0f5d02342ba5\" for &ContainerMetadata{Name:kube-scheduler,Attempt:17,} returns container id \"4f86b5e615ba3f0b987024bf64c3063275c3090a4c3b3689172de592908aedc6\""
Dec 13 21:17:13 ubuntuserver containerd[672]: time="2022-12-13T21:17:13.380737267Z" level=info msg="StartContainer for \"4f86b5e615ba3f0b987024bf64c3063275c3090a4c3b3689172de592908aedc6\""
Dec 13 21:17:13 ubuntuserver systemd[1]: run-containerd-runc-k8s.io-4f86b5e615ba3f0b987024bf64c3063275c3090a4c3b3689172de592908aedc6-runc.Jtkdkn.mount: Deactivated successfully.
Dec 13 21:17:13 ubuntuserver containerd[672]: time="2022-12-13T21:17:13.450275779Z" level=info msg="StartContainer for \"4f86b5e615ba3f0b987024bf64c3063275c3090a4c3b3689172de592908aedc6\" returns successfully"
Dec 13 21:17:14 ubuntuserver kubelet[662]: I1213 21:17:14.329726 662 scope.go:115] "RemoveContainer" containerID="d90c8c392ef73770f2161ab12a98cdbdc3e3f7937239f8b10763f760d6091201"
Dec 13 21:17:14 ubuntuserver kubelet[662]: E1213 21:17:14.330141 662 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-ubuntuserver_kube-system(f8f5737540cef58793e773c366765eac)\"" pod="kube-system/kube-controller-manager-ubuntuse>
Dec 13 21:17:18 ubuntuserver systemd[1]: run-containerd-runc-k8s.io-da16c36612417ba0f5ab81357a0a2452cf4ae38b47cb9eab860e2d7bbe0de637-runc.VJwxXb.mount: Deactivated successfully.
Dec 13 21:17:18 ubuntuserver systemd[1]: run-containerd-runc-k8s.io-da16c36612417ba0f5ab81357a0a2452cf4ae38b47cb9eab860e2d7bbe0de637-runc.SXK3sR.mount: Deactivated successfully.
Dec 13 21:17:19 ubuntuserver systemd[1]: run-containerd-runc-k8s.io-56191d51b9dfe7b56c320ba85c897d7deafdf070364777a698782d6685ffe256-runc.KeqqBx.mount: Deactivated successfully.
Dec 13 21:17:24 ubuntuserver containerd[672]: time="2022-12-13T21:17:24.083816298Z" level=info msg="StopPodSandbox for \"9a253fdf76b2614b7a8280d8cdcae43ee9e94736fe309982771e7d82b86118cd\""
Dec 13 21:17:24 ubuntuserver containerd[672]: time="2022-12-13T21:17:24.083907197Z" level=info msg="TearDown network for sandbox \"9a253fdf76b2614b7a8280d8cdcae43ee9e94736fe309982771e7d82b86118cd\" successfully"
Dec 13 21:17:24 ubuntuserver containerd[672]: time="2022-12-13T21:17:24.083949736Z" level=info msg="StopPodSandbox for \"9a253fdf76b2614b7a8280d8cdcae43ee9e94736fe309982771e7d82b86118cd\" returns successfully"
Dec 13 21:17:24 ubuntuserver containerd[672]: time="2022-12-13T21:17:24.084731070Z" level=info msg="RemovePodSandbox for \"9a253fdf76b2614b7a8280d8cdcae43ee9e94736fe309982771e7d82b86118cd\""
Dec 13 21:17:24 ubuntuserver containerd[672]: time="2022-12-13T21:17:24.084757189Z" level=info msg="Forcibly stopping sandbox \"9a253fdf76b2614b7a8280d8cdcae43ee9e94736fe309982771e7d82b86118cd\""
Dec 13 21:17:24 ubuntuserver containerd[672]: time="2022-12-13T21:17:24.084814886Z" level=info msg="TearDown network for sandbox \"9a253fdf76b2614b7a8280d8cdcae43ee9e94736fe309982771e7d82b86118cd\" successfully"
Dec 13 21:17:24 ubuntuserver containerd[672]: time="2022-12-13T21:17:24.088007063Z" level=info msg="RemovePodSandbox \"9a253fdf76b2614b7a8280d8cdcae43ee9e94736fe309982771e7d82b86118cd\" returns successfully"
Dec 13 21:17:25 ubuntuserver kubelet[662]: I1213 21:17:25.349409 662 scope.go:115] "RemoveContainer" containerID="23c70b66c62384bca25329d3a7ab5c24e209cb791339edff70f4016235ea5dea"
Dec 13 21:17:25 ubuntuserver kubelet[662]: E1213 21:17:25.349754 662 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-proxy\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-lc5nm_kube-system(0f9da167-6b5b-4530-8ae4-067bcfd88098)\"" pod="kube-system/kube-proxy-lc5nm" podUID=0f9da167-6b5b-4530-8ae4-067bcfd88098
Dec 13 21:17:25 ubuntuserver kubelet[662]: I1213 21:17:25.366261 662 scope.go:115] "RemoveContainer" containerID="d90c8c392ef73770f2161ab12a98cdbdc3e3f7937239f8b10763f760d6091201"
Dec 13 21:17:25 ubuntuserver kubelet[662]: E1213 21:17:25.366602 662 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-ubuntuserver_kube-system(f8f5737540cef58793e773c366765eac)\"" pod="kube-system/kube-controller-manager-ubuntuse>
Dec 13 21:17:28 ubuntuserver systemd[1]: run-containerd-runc-k8s.io-da16c36612417ba0f5ab81357a0a2452cf4ae38b47cb9eab860e2d7bbe0de637-runc.UWyAsT.mount: Deactivated successfully.
Dec 13 21:17:29 ubuntuserver systemd[1]: run-containerd-runc-k8s.io-da16c36612417ba0f5ab81357a0a2452cf4ae38b47cb9eab860e2d7bbe0de637-runc.X1afEe.mount: Deactivated successfully.
Dec 13 21:17:36 ubuntuserver kubelet[662]: I1213 21:17:36.366771 662 scope.go:115] "RemoveContainer" containerID="23c70b66c62384bca25329d3a7ab5c24e209cb791339edff70f4016235ea5dea"
Dec 13 21:17:36 ubuntuserver kubelet[662]: E1213 21:17:36.367009 662 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-proxy\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-lc5nm_kube-system(0f9da167-6b5b-4530-8ae4-067bcfd88098)\"" pod="kube-system/kube-proxy-lc5nm" podUID=0f9da167-6b5b-4530-8ae4-067bcfd88098
Dec 13 21:17:39 ubuntuserver systemd[1]: run-containerd-runc-k8s.io-56191d51b9dfe7b56c320ba85c897d7deafdf070364777a698782d6685ffe256-runc.yA3w8Y.mount: Deactivated successfully.
Dec 13 21:17:40 ubuntuserver kubelet[662]: I1213 21:17:40.362760 662 scope.go:115] "RemoveContainer" containerID="d90c8c392ef73770f2161ab12a98cdbdc3e3f7937239f8b10763f760d6091201"
Dec 13 21:17:40 ubuntuserver kubelet[662]: E1213 21:17:40.363111 662 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-controller-manager\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-controller-manager pod=kube-controller-manager-ubuntuserver_kube-system(f8f5737540cef58793e773c366765eac)\"" pod="kube-system/kube-controller-manager-ubuntuse>
Answer 1
I'm using kubeadm and containerd. A kubeadm reset (including manually removing $HOME/.kube/config and /etc/cni/net.d) seems to have fixed this for me.
Reset kubeadm:
$ sudo kubeadm reset
Remove this config file when prompted to, or overwrite the file later:
$ sudo rm $HOME/.kube/config
Wipe the CNI configuration (I think this was the crucial part for me). Use at your own risk (I don't know much about what's inside these files, and I'm still quite new to this); backing them up won't hurt!
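For example, a simple copy before deleting keeps a backup (the .bak path is just a suggestion):
$ sudo cp -a /etc/cni/net.d /etc/cni/net.d.bak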
$ sudo rm -rf /etc/cni/net.d
'kubeadm init' should now initialize a working cluster without the kube-system pods 'spontaneously' falling into CrashLoopBackOff:
$ sudo kubeadm init
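Since $HOME/.kube/config was deleted above, recreate it with the standard post-init steps that kubeadm prints, then watch kube-system until the pods stay Running:
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ kubectl get po -n kube-system -w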
Answer 2
Following this tutorial worked fine on 22.04. I skipped the kernel configuration and the container configuration, which aren't required on 20.04.
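On 22.04 the containerd configuration step that usually matters is enabling the systemd cgroup driver, since 22.04 defaults to cgroup v2 (a guess based on the symptoms above, not necessarily what the tutorial does); a sketch, assuming a default containerd install:
$ containerd config default | sudo tee /etc/containerd/config.toml
$ sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
$ sudo systemctl restart containerd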