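My Kubernetes cluster runs on AWS and I use Helm for templating. The problem is that the tiller pod
dies every few hours, even though there is not much load on the cluster. I can't tell anything from the logs.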
ubuntu@kops:~$ kubectl get pods -owide -n kube-system | grep tiller
tiller-deploy-6b985bb7b-88ssk 0/1 CrashLoopBackOff 71 19h 100.96.4.3 ip-172-20-46-194.us-west-2.compute.internal
ubuntu@kops:~$ kubectl describe pod tiller-deploy-6b985bb7b-88ssk
Error from server (NotFound): pods "tiller-deploy-6b985bb7b-88ssk" not found
ubuntu@kops:~$ kubectl logs tiller-deploy-6b985bb7b-88ssk
Error from server (NotFound): pods "tiller-deploy-6b985bb7b-88ssk" not found
ubuntu@kops:~$
The EC2 instance it runs on has plenty of memory available. It has 8 CPU cores, and the load is...
admin@ip-172-20-46-194:~$ free -h
             total       used       free     shared    buffers     cached
Mem:           31G       5.1G        26G       1.1M       1.5G       2.4G
-/+ buffers/cache:       1.2G        30G
Swap:           0B         0B         0B
top - 03:08:29 up 19:20, 1 user, load average: 79.51, 78.59, 77.98
Tasks: 176 total, 4 running, 172 sleeping, 0 stopped, 0 zombie
%Cpu(s): 28.2 us, 0.5 sy, 0.1 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 71.3 st
KiB Mem: 32950672 total, 5371456 used, 27579216 free, 1524240 buffers
KiB Swap: 0 total, 0 used, 0 free. 2561276 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12203 root 20 0 906520 16756 1492 S 235.0 0.1 1427:46 docker
23974 root 20 0 906520 16796 1492 S 221.1 0.1 1054:34 docker
24072 root 20 0 906520 16796 1492 S 132.0 0.1 669:17.68 docker
12318 root 20 0 906520 16792 1492 S 130.7 0.1 900:54.95 docker
17543 nobody 20 0 906520 16796 1492 S 29.1 0.1 84:01.18 docker
23865 nobody 20 0 906520 16796 1492 R 15.9 0.1 69:02.31 docker
12112 nobody 20 0 906520 16792 1492 S 14.6 0.1 91:27.51 docker
3013 root 20 0 6753392 124600 50028 S 7.6 0.4 73:59.01 kubelet
6378 nobody 20 0 683644 432120 29772 R 6.0 1.3 21:09.54 prometheus
Answer 1
To get further, you need to add --namespace kube-system
to the describe command:
kubectl --namespace kube-system describe pod tiller-deploy-6b985bb7b-88ssk
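With the namespace set, you can root-cause the crash more effectively. The same applies if you want to delete the pod as a temporary workaround.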
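
The same namespace flag applies to the other commands from the question, so a minimal sketch of the whole sequence might look like the following. The namespace and pod name are taken from the question. The KUBECTL override and the three function names are hypothetical, added only so the commands can be previewed without touching a cluster. The delete step assumes the pod is managed by a Deployment (as the tiller-deploy name suggests), so it should be recreated automatically.

```shell
#!/bin/sh
# Sketch only. NS and POD come from the question; KUBECTL is a hypothetical
# override (e.g. KUBECTL=echo) to preview the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"
NS="${NS:-kube-system}"
POD="${POD:-tiller-deploy-6b985bb7b-88ssk}"

# Show events and container state (restart count, last termination reason).
inspect_pod() {
  "$KUBECTL" --namespace "$NS" describe pod "$POD"
}

# Show logs from the previous (crashed) container instance.
previous_logs() {
  "$KUBECTL" --namespace "$NS" logs "$POD" --previous
}

# Temporary workaround: delete the pod and let its controller recreate it.
recreate_pod() {
  "$KUBECTL" --namespace "$NS" delete pod "$POD"
}
```

After sourcing the snippet, calling inspect_pod and previous_logs first and recreate_pod last keeps the evidence from the crashing container available, since a replacement pod gets a new name and starts with fresh logs.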