Pods stuck in ContainerCreating state in a self-managed Kubernetes cluster in Google Compute Engine (GCE) with an external kube node

I have a Kubernetes cluster at home with 5 nodes: 4 Google Compute Engine VMs (one controller node and 3 worker nodes) plus one bare-metal local machine (a kube worker node). The cluster is up and running, and all nodes are in the Ready state.

  1. Self-managed cluster built by following https://docs.projectcalico.org/getting-started/kubernetes/self-managed-public-cloud/gce
  2. Firewall rules added allowing ingress and egress for all IPs (0.0.0.0/0) on any port (see the sketch after this list).
  3. I used the **--control-plane-endpoint IP:PORT** flag to advertise the control plane on the master's public IP, and the worker nodes join through that address.
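
A minimal sketch of the firewall rules and init command this describes (the rule and network names are hypothetical; the endpoint IP and pod CIDR are the ones that appear later in this post):

# Hypothetical rule/network names; opens everything in and out (0.0.0.0/0, any port)
gcloud compute firewall-rules create kube-allow-all-ingress \
    --network kubernetes --direction INGRESS \
    --allow tcp,udp,icmp --source-ranges 0.0.0.0/0
gcloud compute firewall-rules create kube-allow-all-egress \
    --network kubernetes --direction EGRESS \
    --allow tcp,udp,icmp --destination-ranges 0.0.0.0/0

# Advertise the control plane on its public IP so the external node can join
sudo kubeadm init --control-plane-endpoint "34.89.7.120:6443" \
    --pod-network-cidr=20.96.0.0/12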

Problem: When I deploy an application, all pods on the local worker node stay stuck in ContainerCreating, while the containers on the GCE VM workers deploy fine. Does anyone know what is wrong with this setup, and how can I fix it?

  • Here is the `kubectl describe pod` events output for one of my pods:

Events: Successfully assigned social-network/home-timeline-redis-6f4c5d55fc-tql2l to volatile

Warning  FailedCreatePodSandBox  3m14s  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "32b64e6efcaff6401b7b0a6936f005a00a53c19a2061b0a14906b8bc3a81bf20" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
  
Warning  FailedCreatePodSandBox  102s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1e95fa10d49abf5edc8693345256b91e88c31d1b6414761de80e6038cd7696a4" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
  
Normal   SandboxChanged          11s (x3 over 3m14s)  kubelet  Pod sandbox changed, it will be killed and re-created.
 
Warning  FailedCreatePodSandBox  11s                  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8f5959966e4c25f94bd49b82e1fa6da33a114b1680eae8898ba6685f22e7d37f" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
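
The error says kubelet on volatile is invoking the cilium CNI plugin, but the agent socket it needs is missing. A quick way to confirm that on the node (a sketch, assuming default paths):

# On volatile: which CNI config will kubelet pick up?
ls -l /etc/cni/net.d/
# Does the Cilium agent socket the error refers to exist?
ls -l /var/run/cilium/cilium.sock 2>/dev/null || echo "cilium socket missing"
# From the controller: is a cilium pod even scheduled on this node?
kubectl -n kube-system get pods -l k8s-app=cilium -o wide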

UPDATE

I reset kubeadm on all nodes, removed cilium, and recreated the cluster with the Calico CNI. I also changed the pod CIDR with sudo kubeadm init --pod-network-cidr=20.96.0.0/12 --control-plane-endpoint "34.89.7.120:6443", which seems to have resolved the conflict with the host CIDR. But the pods on volatile (the local machine) are still stuck in ContainerCreating:

root@controller:~# kubectl get pods -n kube-system -o wide
NAME                                       READY   STATUS             RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
calico-kube-controllers-744cfdf676-bh2nc   1/1     Running            0          12m   20.109.133.129   worker-2     <none>           <none>
calico-node-frv5r                          1/1     Running            0          12m   10.240.0.11      controller   <none>           <none>
calico-node-lplx6                          1/1     Running            0          12m   10.240.0.20      worker-0     <none>           <none>
calico-node-lwrdr                          1/1     Running            0          12m   10.240.0.21      worker-1     <none>           <none>
calico-node-ppczn                          0/1     CrashLoopBackOff   7          12m   130.239.41.206   volatile     <none>           <none>
calico-node-zplwx                          1/1     Running            0          12m   10.240.0.22      worker-2     <none>           <none>
coredns-74ff55c5b-69mn2                    1/1     Running            0          14m   20.105.55.194    controller   <none>           <none>
coredns-74ff55c5b-djczf                    1/1     Running            0          14m   20.105.55.193    controller   <none>           <none>
etcd-controller                            1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-apiserver-controller                  1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-controller-manager-controller         1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-proxy-5vzdf                           1/1     Running            0          13m   10.240.0.20      worker-0     <none>           <none>
kube-proxy-d22q4                           1/1     Running            0          13m   10.240.0.22      worker-2     <none>           <none>
kube-proxy-hml5c                           1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>
kube-proxy-hw8kl                           1/1     Running            0          13m   10.240.0.21      worker-1     <none>           <none>
kube-proxy-zb6t7                           1/1     Running            0          13m   130.239.41.206   volatile     <none>           <none>
kube-scheduler-controller                  1/1     Running            0          14m   10.240.0.11      controller   <none>           <none>

root@controller:~# kubectl describe pod calico-node-ppczn -n kube-system
Name:                 calico-node-ppczn
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 volatile/130.239.41.206
Start Time:           Mon, 04 Jan 2021 13:01:36 +0000
Labels:               controller-revision-hash=89c447898
                      k8s-app=calico-node
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   130.239.41.206
IPs:
  IP:           130.239.41.206
Controlled By:  DaemonSet/calico-node
Init Containers:
  upgrade-ipam:
    Container ID:  docker://27f988847a484c5f74e000c4b8f473895b71ed49f27e0bf4fab4b425940951dc
    Image:         docker.io/calico/cni:v3.17.1
    Image ID:      docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 04 Jan 2021 13:01:37 +0000
      Finished:     Mon, 04 Jan 2021 13:01:38 +0000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:        (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
  install-cni:
    Container ID:  docker://5629f6984cfe545864d187112a0c1f65e7bdb7dbfae9b4971579f420ab55b77b
    Image:         docker.io/calico/cni:v3.17.1
    Image ID:      docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 04 Jan 2021 13:01:39 +0000
      Finished:     Mon, 04 Jan 2021 13:01:41 +0000
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
      CNI_MTU:               <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      SLEEP:                 false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
  flexvol-driver:
    Container ID:   docker://3a4bf307a347926893aeb956717d84049af601fd4cc4aa7add6e182c85dc4e7c
    Image:          docker.io/calico/pod2daemon-flexvol:v3.17.1
    Image ID:       docker-pullable://calico/pod2daemon-flexvol@sha256:48f277d41c35dae051d7dd6f0ec8f64ac7ee6650e27102a41b0203a0c2ce6c6b
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 04 Jan 2021 13:01:43 +0000
      Finished:     Mon, 04 Jan 2021 13:01:43 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host/driver from flexvol-driver-host (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Containers:
  calico-node:
    Container ID:   docker://2576b2426c2a3fc4b6a972839a94872160c7ac5efa5b1159817be8d4ad4ddf60
    Image:          docker.io/calico/node:v3.17.1
    Image ID:       docker-pullable://calico/node@sha256:25e0b0495c0df3a7a06b6f9e92203c53e5b56c143ac1c885885ee84bf86285ff
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 04 Jan 2021 13:18:48 +0000
      Finished:     Mon, 04 Jan 2021 13:19:57 +0000
    Ready:          False
    Restart Count:  9
    Requests:
      cpu:      250m
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      DATASTORE_TYPE:                     kubernetes
      WAIT_FOR_DATASTORE:                 true
      NODENAME:                            (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      CALICO_IPV4POOL_IPIP:               Always
      CALICO_IPV4POOL_VXLAN:              Never
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_LOGSEVERITYSCREEN:            info
      FELIX_HEALTHENABLED:                true
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/ from sysfs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/log/calico/cni from cni-log-dir (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:  
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:  
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  sysfs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
  cni-log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/calico/cni
    HostPathType:  
  host-local-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks
    HostPathType:  
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  flexvol-driver-host:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    HostPathType:  DirectoryOrCreate
  calico-node-token-8r94c:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-node-token-8r94c
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 CriticalAddonsOnly op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Normal   Scheduled         22m                  default-scheduler  Successfully assigned kube-system/calico-node-ppczn to volatile
  Normal   Pulled            22m                  kubelet            Container image "docker.io/calico/cni:v3.17.1" already present on machine
  Normal   Created           22m                  kubelet            Created container upgrade-ipam
  Normal   Started           22m                  kubelet            Started container upgrade-ipam
  Normal   Pulled            21m                  kubelet            Container image "docker.io/calico/cni:v3.17.1" already present on machine
  Normal   Started           21m                  kubelet            Started container install-cni
  Normal   Created           21m                  kubelet            Created container install-cni
  Normal   Pulled            21m                  kubelet            Container image "docker.io/calico/pod2daemon-flexvol:v3.17.1" already present on machine
  Normal   Created           21m                  kubelet            Created container flexvol-driver
  Normal   Started           21m                  kubelet            Started container flexvol-driver
  Normal   Pulled            21m                  kubelet            Container image "docker.io/calico/node:v3.17.1" already present on machine
  Normal   Created           21m                  kubelet            Created container calico-node
  Normal   Started           21m                  kubelet            Started container calico-node
  Warning  Unhealthy         21m (x2 over 21m)    kubelet            Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused
  Warning  Unhealthy         11m (x51 over 21m)   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
  Warning  DNSConfigForming  115s (x78 over 22m)  kubelet            Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 130.239.40.2 130.239.40.3 2001:6b0:e:4040::2

calico-node-ppczn logs:

root@controller:~# kubectl logs calico-node-ppczn -n kube-system
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 379: Early log level set to info
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 395: Using NODENAME environment for node name
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 407: Determined node name: volatile
2021-01-04 13:17:38.011 [INFO][8] startup/startup.go 439: Checking datastore connection
2021-01-04 13:18:08.011 [INFO][8] startup/startup.go 454: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
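
That timeout means calico-node on volatile cannot reach the in-cluster API service IP. A sketch of how to probe this directly from the node (10.96.0.1:443 is the default kubernetes service ClusterIP; 34.89.7.120:6443 is the control-plane endpoint used above):

# On volatile: can we reach the API through the service ClusterIP calico uses?
curl -k --connect-timeout 5 https://10.96.0.1:443/version
# Can we reach the API server's advertised public endpoint directly?
curl -k --connect-timeout 5 https://34.89.7.120:6443/version
# kube-proxy (iptables mode) should have installed the 10.96.0.1 -> apiserver DNAT:
iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1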

On the local machine:

root@volatile:~# docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS               NAMES
39efaf54f558        k8s.gcr.io/pause:3.2   "/pause"                 19 minutes ago      Up 19 minutes                           k8s_POD_calico-node-ppczn_kube-system_7e98eb90-f581-4dbc-b877-da25bc2868f9_0
05bd9fa182e5        e3f6fcd87756           "/usr/local/bin/kube…"   20 minutes ago      Up 20 minutes                           k8s_kube-proxy_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
ae11c77897b0        k8s.gcr.io/pause:3.2   "/pause"                 20 minutes ago      Up 20 minutes                           k8s_POD_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
root@volatile:~# docker logs 39efaf54f558
root@volatile:~# docker logs 05bd9fa182e5
I0104 13:00:51.131737       1 node.go:172] Successfully retrieved node IP: 130.239.41.206
I0104 13:00:51.132027       1 server_others.go:142] kube-proxy node IP is an IPv4 address (130.239.41.206), assume IPv4 operation
W0104 13:00:51.162536       1 server_others.go:578] Unknown proxy mode "", assuming iptables proxy
I0104 13:00:51.162615       1 server_others.go:185] Using iptables Proxier.
I0104 13:00:51.162797       1 server.go:650] Version: v1.20.1
I0104 13:00:51.163080       1 conntrack.go:52] Setting nf_conntrack_max to 262144
I0104 13:00:51.163289       1 config.go:315] Starting service config controller
I0104 13:00:51.163300       1 config.go:224] Starting endpoint slice config controller
I0104 13:00:51.163304       1 shared_informer.go:240] Waiting for caches to sync for service config
I0104 13:00:51.163311       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0104 13:00:51.263469       1 shared_informer.go:247] Caches are synced for endpoint slice config 
I0104 13:00:51.263487       1 shared_informer.go:247] Caches are synced for service config 
root@volatile:~# docker logs ae11c77897b0
root@volatile:~# ls /etc/cni/net.d/
10-calico.conflist  calico-kubeconfig
root@volatile:~# ls /var/lib/calico/
root@volatile:~# 

Answer 1

On the host volatile, you appear to have cilium configured in /etc/cni/net.d/*.conf. Cilium is a network plugin, one of many available for Kubernetes. One of those files probably contains something like:

{
    "name": "cilium",
    "type": "cilium-cni"
}

If that is unintentional, delete the file. You already seem to be running a competing network plugin, Project Calico, which should be sufficient on its own. Then recreate the calico-kube-controllers pod in the kube-system namespace, let it come up successfully, and recreate the other pods.
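
A sketch of that cleanup (the cilium conf filename is hypothetical; check what is actually present in /etc/cni/net.d/):

# On volatile: remove the leftover cilium CNI config and restart kubelet
sudo rm /etc/cni/net.d/05-cilium.conf   # hypothetical filename
sudo systemctl restart kubelet

# From the controller: delete the pods so their controllers recreate them
kubectl -n kube-system delete pod -l k8s-app=calico-kube-controllers
kubectl -n kube-system delete pod calico-node-ppczn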

If you do intend to use Cilium on that host, go back through the Cilium installation guide. If you redo the installation, you should see /var/run/cilium/cilium.sock created for you.
