Calico on a fresh Kubernetes cluster: "Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory"

I cannot get a CNI working on my Kubernetes master node: the CNI plugin cannot access certain files or directories. The logs of both Calico and Flannel indicate that some file or directory is inaccessible (in this post I only cover Calico).

I see the same problem with kubectl, kubeadm, and kubelet versions v1.19.4 and v1.19.3. Docker is 19.03.13-ce with the overlay2 storage driver on an ext4 filesystem, and systemd is the cgroup driver. Swap is disabled.

The only thing I found on Stack Overflow that points in this direction is: Kubernetes cluster with Calico - pods fail to start with FailedCreatePodSandBox

As a first step, I set up the cluster with kubeadm (using Calico's CIDR):

# kubeadm init --apiserver-advertise-address=192.168.178.33 --pod-network-cidr=192.168.0.0/16

This works, and the kubelet log shows the message that a CNI is needed. After that, I apply the Calico CNI:

kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

After waiting a while, the master node remains in the following state:

❯ kubectl get pods --all-namespaces                 
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE
kube-system   calico-kube-controllers-5c6f6b67db-zdksz   0/1     ContainerCreating   0          7m47s
kube-system   calico-node-sc42z                          0/1     CrashLoopBackOff    5          7m47s
kube-system   coredns-f9fd979d6-4zrcj                    0/1     ContainerCreating   0          8m11s
kube-system   coredns-f9fd979d6-wf9r2                    0/1     ContainerCreating   0          8m11s
kube-system   etcd-hs-0                                  1/1     Running             0          8m20s
kube-system   kube-apiserver-hs-0                        1/1     Running             0          8m20s
kube-system   kube-controller-manager-hs-0               1/1     Running             0          8m20s
kube-system   kube-proxy-t6ngd                           1/1     Running             0          8m11s
kube-system   kube-scheduler-hs-0                        1/1     Running             0          8m20s

To me, the output of the following command seems contradictory:

kubectl describe pods calico-node-sc42z --namespace kube-system

The calico-node pod has the volume mounted, yet the pod still cannot access it (see the Volumes and Events sections):

❯ kubectl describe pods calico-node-sc42z --namespace kube-system
Name:                 calico-node-sc42z
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 hs-0/192.168.178.48
Start Time:           Sat, 14 Nov 2020 00:58:36 +0100
Labels:               controller-revision-hash=5f678767
                      k8s-app=calico-node
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   192.168.178.48
IPs:
  IP:           192.168.178.48
Controlled By:  DaemonSet/calico-node
Init Containers:
  upgrade-ipam:
    Container ID:  docker://29c6cf8b73ecb98ee18169db0f6ffe8b141a8a6e10b2c839fc5bf05177f066ac
    Image:         calico/cni:v3.16.5
    Image ID:      docker-pullable://calico/cni@sha256:e05d0ee834c2004e8e7c4ee165a620166cd16e3cb8204a06eb52e5300b46650b
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 14 Nov 2020 00:58:48 +0100
      Finished:     Sat, 14 Nov 2020 00:58:48 +0100
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:        (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
  install-cni:
    Container ID:  docker://4435863e0d2f3ab4535aa6ca49ff95d889e71614861f3c7c0e4213d8c333f4db
    Image:         calico/cni:v3.16.5
    Image ID:      docker-pullable://calico/cni@sha256:e05d0ee834c2004e8e7c4ee165a620166cd16e3cb8204a06eb52e5300b46650b
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 14 Nov 2020 00:58:49 +0100
      Finished:     Sat, 14 Nov 2020 00:58:49 +0100
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
      CNI_MTU:               <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      SLEEP:                 false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
  flexvol-driver:
    Container ID:   docker://ca03f59013c1576a4a605a6d737af78ec3e859376aa11a301e56f0ffdacbc8db
    Image:          calico/pod2daemon-flexvol:v3.16.5
    Image ID:       docker-pullable://calico/pod2daemon-flexvol@sha256:7b20fd9cc36c7196dd24d56cc1e89ac573c634856ee020334b0b30cf5b8a3d3b
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 14 Nov 2020 00:58:56 +0100
      Finished:     Sat, 14 Nov 2020 00:58:56 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host/driver from flexvol-driver-host (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
Containers:
  calico-node:
    Container ID:   docker://96bbc7f4adf1d5cb9a927aedc18e16da7b5ed4b0ff1290179a8dd4a51c115ab8
    Image:          calico/node:v3.16.5
    Image ID:       docker-pullable://calico/node@sha256:43c145b2bd837611d8d41e70631a8f2cc2b97b5ca9d895d66ffddd414dab83c5
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Sat, 14 Nov 2020 01:04:51 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sat, 14 Nov 2020 01:03:41 +0100
      Finished:     Sat, 14 Nov 2020 01:04:51 +0100
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:      250m
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      DATASTORE_TYPE:                     kubernetes
      WAIT_FOR_DATASTORE:                 true
      NODENAME:                            (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      CALICO_IPV4POOL_IPIP:               Always
      CALICO_IPV4POOL_VXLAN:              Never
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_LOGSEVERITYSCREEN:            info
      FELIX_HEALTHENABLED:                true
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/ from sysfs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:  
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:  
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  sysfs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
  host-local-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks
    HostPathType:  
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  flexvol-driver-host:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    HostPathType:  DirectoryOrCreate
  calico-node-token-tzhr4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-node-token-tzhr4
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 CriticalAddonsOnly op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m52s                  default-scheduler  Successfully assigned kube-system/calico-node-sc42z to hs-0
  Normal   Pulling    6m51s                  kubelet            Pulling image "calico/cni:v3.16.5"
  Normal   Pulled     6m40s                  kubelet            Successfully pulled image "calico/cni:v3.16.5" in 10.618669742s
  Normal   Started    6m40s                  kubelet            Started container upgrade-ipam
  Normal   Created    6m40s                  kubelet            Created container upgrade-ipam
  Normal   Created    6m39s                  kubelet            Created container install-cni
  Normal   Pulled     6m39s                  kubelet            Container image "calico/cni:v3.16.5" already present on machine
  Normal   Started    6m39s                  kubelet            Started container install-cni
  Normal   Pulling    6m38s                  kubelet            Pulling image "calico/pod2daemon-flexvol:v3.16.5"
  Normal   Started    6m32s                  kubelet            Started container flexvol-driver
  Normal   Created    6m32s                  kubelet            Created container flexvol-driver
  Normal   Pulled     6m32s                  kubelet            Successfully pulled image "calico/pod2daemon-flexvol:v3.16.5" in 6.076268177s
  Normal   Pulling    6m31s                  kubelet            Pulling image "calico/node:v3.16.5"
  Normal   Pulled     6m19s                  kubelet            Successfully pulled image "calico/node:v3.16.5" in 12.051211859s
  Normal   Created    6m19s                  kubelet            Created container calico-node
  Normal   Started    6m19s                  kubelet            Started container calico-node
  Warning  Unhealthy  5m32s (x5 over 6m12s)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
  Warning  Unhealthy  109s (x23 over 6m9s)   kubelet            Liveness probe failed: calico/node is not ready: bird/confd is not live: exit status 1

I also have the logs of calico-node, but I don't see how to benefit from this additional information: unfortunately, I don't know whether the "datastore" refers to the filesystem, i.e. whether this is the error I already know about or a separate one.

❯ kubectl logs calico-node-sc42z -n kube-system -f
2020-11-14 01:42:55.536 [INFO][8] startup/startup.go 376: Early log level set to info
2020-11-14 01:42:55.536 [INFO][8] startup/startup.go 392: Using NODENAME environment for node name
2020-11-14 01:42:55.536 [INFO][8] startup/startup.go 404: Determined node name: hs-0
2020-11-14 01:42:55.539 [INFO][8] startup/startup.go 436: Checking datastore connection
2020-11-14 01:43:25.539 [INFO][8] startup/startup.go 451: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
2020-11-14 01:43:56.540 [INFO][8] startup/startup.go 451: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout

Maybe someone can give me a hint on how to solve this problem, or point me to where I can read more on this topic. Regards, Kokos Bot.

Answer 1

This is probably because Calico's default pod CIDR conflicts with the host CIDR. I got that impression from your --apiserver-advertise-address=192.168.178.33. If that is the case, it is worth trying kubeadm init with a different pod CIDR, e.g. --pod-network-cidr=20.96.0.0/12.
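The suspected clash can be checked mechanically. A minimal sketch using Python's ipaddress module, assuming the host LAN is 192.168.178.0/24 (inferred from the 192.168.178.33/.48 addresses in the question):

```python
import ipaddress

host_lan = ipaddress.ip_network("192.168.178.0/24")  # assumption: LAN of the .33/.48 hosts
pod_cidr = ipaddress.ip_network("192.168.0.0/16")    # CIDR passed to kubeadm init

# The pod network swallows the host LAN, including the node's own IP:
print(pod_cidr.overlaps(host_lan))                         # True
print(ipaddress.ip_address("192.168.178.48") in pod_cidr)  # True

# The alternative CIDR suggested above does not collide:
print(ipaddress.ip_network("20.96.0.0/12").overlaps(host_lan))  # False
```

With the pod CIDR overlapping the node's own network, Calico programs routes for 192.168.0.0/16 that shadow real LAN traffic, which matches the symptoms in the question.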

To do a fresh install again, it is best to run kubeadm reset before making the change above. Before running it, be aware of the effects of the kubeadm reset command (read about it here).

Reference - https://stackoverflow.com/questions/60742165/kubernetes-calico-replicaset
