To install a Kubernetes master node on CentOS 7 with containerd and Calico,
I followed these steps: https://computingforgeeks.com/install-kubernetes-cluster-on-centos-with-kubeadm/
After kubeadm init --pod-network-cidr=192.168.0.0/16 --upload-certs,
I installed Calico with:
- kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
- kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
(the proxy would not let me run them that way, so I first downloaded the two files and then ran the create against the local copies, as sketched below)
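For reference, the download-then-apply workaround looks roughly like this (a sketch; the proxy URL is the masked placeholder used later in this post):

# Fetch the manifests through the corporate proxy first
curl -x http://myproxy-XXXXXXXX.com:8080 -LO https://docs.projectcalico.org/manifests/tigera-operator.yaml
curl -x http://myproxy-XXXXXXXX.com:8080 -LO https://docs.projectcalico.org/manifests/custom-resources.yaml
# Then apply the local copies, so kubectl itself never needs the proxy
kubectl create -f ./tigera-operator.yaml
kubectl create -f ./custom-resources.yaml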
The installation succeeded, but coredns and calico-kube-controllers are stuck in ContainerCreating.
This setup sits behind the corporate DNS and proxy. I have been stuck here for days and cannot figure out why coredns stays in ContainerCreating.
[root@master-node system]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-system calico-kube-controllers-68884f975d-6qm5l 0/1 Terminating 0 16d
calico-system calico-kube-controllers-68884f975d-ckr2g 0/1 ContainerCreating 0 154m
calico-system calico-node-5n4nj 0/1 Running 7 (165m ago) 16d
calico-system calico-node-gp6d5 0/1 Running 1 (15d ago) 16d
calico-system calico-typha-77b6fb6f86-zc8jn 1/1 Running 7 (165m ago) 16d
kube-system coredns-6d4b75cb6d-2tqk9 0/1 ContainerCreating 0 4h46m
kube-system coredns-6d4b75cb6d-9dn5d 0/1 ContainerCreating 0 6h58m
kube-system coredns-6d4b75cb6d-vfchn 0/1 Terminating 32 15d
kube-system etcd-master-node 1/1 Running 14 (165m ago) 16d
kube-system kube-apiserver-master-node 1/1 Running 8 (165m ago) 16d
kube-system kube-controller-manager-master-node 1/1 Running 7 (165m ago) 16d
kube-system kube-proxy-c6l9s 1/1 Running 7 (165m ago) 16d
kube-system kube-proxy-pqrf8 1/1 Running 1 (15d ago) 16d
kube-system kube-scheduler-master-node 1/1 Running 8 (165m ago) 16d
tigera-operator tigera-operator-5fb55776df-955dj 1/1 Running 13 (164m ago) 16d
kubectl describe pod coredns:
[root@master-node system]# kubectl describe pod coredns-6d4b75cb6d-2tqk9 -n kube-system
Name: coredns-6d4b75cb6d-2tqk9
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master-node/10.32.67.20
Start Time: Wed, 08 Jun 2022 11:59:59 +0200
Labels: k8s-app=kube-dns
pod-template-hash=6d4b75cb6d
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-6d4b75cb6d
Containers:
coredns:
Container ID:
Image: k8s.gcr.io/coredns/coredns:v1.8.6
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ch9xq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-ch9xq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 114s (x65 over 143m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "de60ae0a286ad648a9691065e68fe03589b18a26adfafff0c089d5774b46c163": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp':
[root@master-node system]# kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp'
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
calico-system 5m52s Warning Unhealthy pod/calico-node-gp6d5 (combined from similar events): Readiness probe failed: 2022-06-08 14:50:45.231 [INFO][30872] confd/health.go 180: Number of node(s) with BGP peering established = 0...
calico-system 4m16s Warning FailedKillPod pod/calico-kube-controllers-68884f975d-6qm5l error killing pod: failed to "KillPodSandbox" for "c842d857-88f1-4dfa-b3e8-aad68f626c8c" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"5002f084e667a7e70654136b237ae2924c268337c1faf882972982e888784bb9\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": Service Unavailable"
kube-system 87s Warning FailedCreatePodSandBox pod/coredns-6d4b75cb6d-9dn5d (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "acd785aa916d2c97aa16ceeaa2f04e7967a1224cb437e50770f32a02b5a9ed3f": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
calico-system 13m Warning FailedKillPod pod/calico-kube-controllers-68884f975d-6qm5l error killing pod: failed to "KillPodSandbox" for "c842d857-88f1-4dfa-b3e8-aad68f626c8c" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"5002f084e667a7e70654136b237ae2924c268337c1faf882972982e888784bb9\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": context deadline exceeded"
kube-system 4m6s Warning FailedKillPod pod/coredns-6d4b75cb6d-vfchn error killing pod: failed to "KillPodSandbox" for "23c399a5-daa6-4f01-b7ee-7822b828d966" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"7621e8d64c84554d75030375b0355a67c60b62c8d240741aa78189ffabedc913\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": Service Unavailable"
calico-system 6s Warning Unhealthy pod/calico-node-5n4nj (combined from similar events): Readiness probe failed: 2022-06-08 14:56:31.871 [INFO][17966] confd/health.go 180: Number of node(s) with BGP peering established = 0...
calico-system 45m Warning DNSConfigForming pod/calico-kube-controllers-68884f975d-ckr2g Search Line limits were exceeded, some search paths have been omitted, the applied search line is: calico-system.svc.cluster.local svc.cluster.local cluster.local XXXXXX.com cs.XXXXX.com fr.XXXXXX.com
kube-system 2m49s Warning FailedCreatePodSandBox pod/coredns-6d4b75cb6d-2tqk9 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "529139e14dbb8c5917c72428600c5a8333aa21bf249face90048d1b344da5d9a": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
calico-system 3m42s Warning FailedCreatePodSandBox pod/calico-kube-controllers-68884f975d-ckr2g (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "45dd6ebfb53fd745b1ca41853bb7744e407b3439111a946b007752eb8f8f7abd": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
kube-system 9m6s Warning FailedKillPod pod/coredns-6d4b75cb6d-vfchn error killing pod: failed to "KillPodSandbox" for "23c399a5-daa6-4f01-b7ee-7822b828d966" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"7621e8d64c84554d75030375b0355a67c60b62c8d240741aa78189ffabedc913\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": context deadline exceeded"
The calico-node logs show:
(resync-filter-v4,resync-raw-v4)
2022-06-08 18:26:42.665 [INFO][69] felix/summary.go 100: Summarising 11 dataplane reconciliation loops over 1m2.6s: avg=3ms longest=6ms (resync-nat-v4)
2022-06-08 18:27:46.076 [INFO][69] felix/summary.go 100: Summarising 7 dataplane reconciliation loops over 1m3.4s: avg=2ms longest=4ms (resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-rules-v4,resync-wg)
And calico-typha:
2022-06-08 17:34:49.625 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/assignment/" error=too old resource version: 190422 (3180569)
2022-06-08 17:34:50.121 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
2022-06-08 18:10:27.377 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/host/" error=too old resource version: 190388 (3180569)
2022-06-08 18:10:27.874 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/host/"
Answer 1
I solved the problem this way:
The "Service Unavailable" errors above were the proxy answering for the in-cluster service IPs, so the fix was to add these IP ranges to no_proxy in both of the files below:
- 10.96.0.0/24 (Kubernetes API)
- 192.168.0.0/16 (Calico CIDR)
- 10.x.x.0 (cluster nodes)
In /etc/environment:
HTTP_PROXY=http://myproxy-XXXXXXXX.com:8080
HTTPS_PROXY=http://myproxy-XXXXXXXX.com:8080
NO_PROXY=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27
http_proxy=http://myproxy-XXXXXXXX.com:8080
https_proxy=http://myproxy-XXXXXXXX.com:8080
no_proxy=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27
Then:
source /etc/environment
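With NO_PROXY in place, a quick check that the bypass works is to hit the apiserver's ClusterIP directly (a sketch; through the proxy, this is the call that returned "Service Unavailable"):

# Expect a JSON reply or a 401/403 from the real apiserver, not a proxy error page
curl -vk https://10.96.0.1:443/version
# Force the bypass regardless of environment variables, for comparison
curl -vk --noproxy '*' https://10.96.0.1:443/version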
In /etc/systemd/system/containerd.service.d/http_proxy.conf:
[Service]
Environment="HTTP_PROXY=http://myproxy-XXXXXXXX:8080/"
Environment="HTTPS_PROXY=http://myproxy-XXXXXXXX:8080/"
Environment="NO_PROXY=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27"
Environment="http_proxy=http://myproxy-XXXXXXXX:8080/"
Environment="https_proxy=http://myproxy-XXXXXXXX:8080/"
Environment="no_proxy=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27"
Then:
systemctl daemon-reload
systemctl restart containerd
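To verify that containerd actually picked up the drop-in, something like the following helps (a sketch; assumes crictl is installed and pointed at containerd):

# Show the environment systemd passes to containerd
systemctl show containerd --property=Environment
# Try an image pull through the proxy
crictl pull k8s.gcr.io/coredns/coredns:v1.8.6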
I also edited the coredns ConfigMap (a sketch of the resulting Corefile follows this list):
kubectl -n kube-system edit cm coredns
- removed the max_concurrent 1000 line
- replaced proxy with forward
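After those two edits the relevant part of the Corefile looks roughly like this (a sketch based on the default kubeadm Corefile; upstream resolution still follows the node's /etc/resolv.conf):

.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}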
Then I ran kubectl delete on the pods that were stuck in error, and after that all the pods were Running.
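Deleting by label lets the Deployments recreate the stuck pods; the commands were along these lines (a sketch; the kube-dns label appears in the describe output above, the calico-kube-controllers label is an assumption):

# Recreate the coredns pods (label from the pod description above)
kubectl -n kube-system delete pod -l k8s-app=kube-dns
# Recreate the calico-kube-controllers pod (label assumed from a standard Calico install)
kubectl -n calico-system delete pod -l k8s-app=calico-kube-controllers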
Hope this helps.