Pod is in the Running state, but the Pod's events show: Warning Unhealthy kubelet Readiness probe failed: command "/bin/bash -c /ready-probe.sh" timed out

What does it mean when a Pod is in the Running state but its events show the following warning?

Warning Unhealthy kubelet Readiness probe failed: command "/bin/bash -c /ready-probe.sh" timed out

root@k8s-eu-1-master:~# kubectl describe pod cassandra-0
Name:             cassandra-0
Namespace:        default
Priority:         0
Service Account:  default
Node:             k8s-eu-1-worker-1/xx.xxx.xxx.xxx
Start Time:       Tue, 07 Nov 2023 19:18:49 +0100
Labels:           app=cassandra
                  apps.kubernetes.io/pod-index=0
                  controller-revision-hash=cassandra-58c99f489d
                  statefulset.kubernetes.io/pod-name=cassandra-0
Annotations:      cni.projectcalico.org/containerID: ee11d6b9b5dfade09500ccf53d2d1e4e04aaf479c4502d76f6ce0044c6683ac4
                  cni.projectcalico.org/podIP: 192.168.200.12/32
                  cni.projectcalico.org/podIPs: 192.168.200.12/32
Status:           Running
IP:               192.168.200.12
IPs:
  IP:           192.168.200.12
Controlled By:  StatefulSet/cassandra
Containers:
  cassandra:
    Container ID:   containerd://1386bc65f0f9c11eb9351435578c37efb7081fbbf0acd7a9b2ab6d3507576e0f
    Image:          gcr.io/google-samples/cassandra:v13
    Image ID:       gcr.io/google-samples/cassandra@sha256:7a3d20afa0a46ed073a5c587b4f37e21fa860e83c60b9c42fec1e1e739d64007
    Ports:          7000/TCP, 7001/TCP, 7199/TCP, 9042/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 07 Nov 2023 19:18:51 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  1Gi
    Requests:
      cpu:      500m
      memory:   1Gi
    Readiness:  exec [/bin/bash -c /ready-probe.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Environment:
      MAX_HEAP_SIZE:           512M
      HEAP_NEWSIZE:            100M
      CASSANDRA_SEEDS:         cassandra-0.cassandra.default.svc.cluster.local
      CASSANDRA_CLUSTER_NAME:  K8Demo
      CASSANDRA_DC:            DC1-K8Demo
      CASSANDRA_RACK:          Rack1-K8Demo
      POD_IP:                   (v1:status.podIP)
    Mounts:
      /srv/shared-k8s-eu-1-worker-1 from k8s-eu-1-worker-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nzb6p (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  k8s-eu-1-worker-1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  k8s-eu-1-worker-1-cassandra-0
    ReadOnly:   false
  kube-api-access-nzb6p:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age    From               Message
  ----     ------     ----   ----               -------
  Normal   Scheduled  7m28s  default-scheduler  Successfully assigned default/cassandra-0 to k8s-eu-1-worker-1
  Normal   Pulling    7m28s  kubelet            Pulling image "gcr.io/google-samples/cassandra:v13"
  Normal   Pulled     7m28s  kubelet            Successfully pulled image "gcr.io/google-samples/cassandra:v13" in 383ms (383ms including waiting)
  Normal   Created    7m28s  kubelet            Created container cassandra
  Normal   Started    7m27s  kubelet            Started container cassandra
  Warning  Unhealthy  7m     kubelet            Readiness probe failed: command "/bin/bash -c /ready-probe.sh" timed out // <-------------------

Answer 1

A readiness failure occurs when a container in a Pod does not pass its configured readiness probe. It means the container is not yet ready to receive and handle incoming traffic or requests. When Kubernetes detects this failure, it stops routing traffic to the affected container and waits until it passes the readiness probe again.
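
Note that in the output above the Pod's Ready condition is still True: the probe is configured with #failure=3, so a single timeout does not immediately mark the container unready. To check whether readiness is actually failing right now, rather than being a one-off timeout, something like the following helps (a sketch using standard kubectl queries; adjust the pod name as needed):

    # Current value of the Pod's Ready condition (True/False):
    kubectl get pod cassandra-0 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

    # Recent probe-failure events in the namespace, newest last:
    kubectl get events --field-selector reason=Unhealthy --sort-by=.lastTimestamp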

Some possible causes of this failure include:

  • The probe script was not executed at all.

  • The probe script was executed, but it failed.

  • The probe script was executed, but it took too long to complete (this is what the timed out message suggests; see the check below).
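
To distinguish these cases, run the probe script by hand inside the container and time it (a minimal sketch; /ready-probe.sh is the script from the probe configuration shown above):

    # Execute the readiness script manually and measure how long it takes.
    # If it regularly exceeds the probe's timeout=5s, the timeout setting,
    # not the script itself, is what needs adjusting.
    kubectl exec -it cassandra-0 -- /bin/bash -c 'time /ready-probe.sh; echo "exit code: $?"'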

The attached reference documentation includes steps for resolving readiness probe failure errors.
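
If the script turns out to be correct but slow (this sample Cassandra image's probe script typically shells out to nodetool, which can be sluggish while the node is bootstrapping), a common remedy is to raise the probe's timeoutSeconds. A minimal sketch against the StatefulSet shown above; the value 10 is an arbitrary example, not a value taken from the referenced docs:

    # Raise the readiness probe timeout from 5s to 10s; the StatefulSet
    # controller rolls the change out to cassandra-0 automatically.
    kubectl patch statefulset cassandra --type='json' \
      -p='[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/timeoutSeconds","value":10}]'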
