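I'm new to Kubernetes, so this may be a silly question. Every pod I try to deploy on my worker node crashes with a CrashLoopBackOff error. This only happens to pods running on the worker node, not to pods running on the leader node. Even a dummy pod fails when it is created on the worker node.
Some of the running pods: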
$ kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS             RESTARTS         AGE
default       test-pod                                   0/1     CrashLoopBackOff   6 (3m44s ago)    178m
kube-system   calico-kube-controllers-866dcccff9-6s8vb   1/1     Running            0                40m
kube-system   calico-node-bq4xm                          0/1     CrashLoopBackOff   9 (4m34s ago)    37m
kube-system   calico-node-w7qb5                          0/1     Running            0                51m
kube-system   kube-proxy-hwwrs                           0/1     CrashLoopBackOff   9 (98s ago)      35m
kube-system   kube-proxy-vwrwx                           1/1     Running            0                3d6h
test-pod, calico-node-bq4xm, and kube-proxy-hwwrs are the pods running on the worker node. I deleted some of the pods to force them to be recreated, hoping that would fix the issue.
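To pull just the not-ready pods out of a listing like the one above, a small filter along these lines could be used. This is only a sketch: the function name not_ready is made up for illustration, and it assumes the default column order of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS).

```shell
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and
# prints "namespace/name status" for every pod whose READY column is not n/n.
not_ready() {
  awk 'NR > 1 {                      # skip the header row
         split($3, r, "/")           # READY column, e.g. "0/1"
         if (r[1] != r[2]) print $1 "/" $2, $4
       }'
}

# Usage (requires access to the cluster):
#   kubectl get pods -A | not_ready
```

Run against the listing above, this also reports calico-node-w7qb5, which is Running but still 0/1 ready.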
Events from describing the worker node:
Events:
  Type    Reason    Age    From        Message
  ----    ------    ----   ----        -------
  Normal  Starting  56m    kube-proxy
  Normal  Starting  54m    kube-proxy
  Normal  Starting  54m    kube-proxy
  Normal  Starting  52m    kube-proxy
  Normal  Starting  50m    kube-proxy
  Normal  Starting  48m    kube-proxy
  Normal  Starting  44m    kube-proxy
  Normal  Starting  38m    kube-proxy
  Normal  Starting  30m    kube-proxy
  Normal  Starting  24m    kube-proxy
  Normal  Starting  18m    kube-proxy
  Normal  Starting  11m    kube-proxy
  Normal  Starting  4m55s  kube-proxy
Events from describing the kube-proxy-hwwrs pod:
Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       59m                    default-scheduler  Successfully assigned kube-system/kube-proxy-hwwrs to james-k8s
  Normal   Created         54m (x4 over 59m)      kubelet            Created container kube-proxy
  Normal   Started         54m (x4 over 59m)      kubelet            Started container kube-proxy
  Normal   SandboxChanged  53m (x4 over 56m)      kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          14m (x12 over 59m)     kubelet            Container image "registry.k8s.io/kube-proxy:v1.28.4" already present on machine
  Warning  BackOff         9m18s (x175 over 56m)  kubelet            Back-off restarting failed container kube-proxy in pod kube-proxy-hwwrs_kube-system(3cd07dd8-8b2d-4535-a656-8910346c638d)
  Normal   Killing         3m31s (x13 over 57m)   kubelet            Stopping container kube-proxy
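I checked the available disk space and memory, and both look fine. I tried deleting the pod so that it would be recreated, but the result was the same. Interestingly, if I change the labels to force the pod to run on the leader, it works, which makes me think something is wrong with the worker.
What else should I check to find the cause of the error? What could be different between my worker node and my leader node?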