Worker nodes of a kubeadm-based Kubernetes (v1.24.2) cluster stay "NotReady" even after installing the Calico CNI ("dial unix /var/run/bird/bird.ctl: connect: no such file or directory")
I have deployed the Calico CNI on a kubeadm-based Kubernetes cluster, but the worker nodes still report the status "NotReady".
TCP port 179 is open on all nodes, and SELinux reports no denials.
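(For reference, checks along these lines can confirm both points on each node; this is only a sketch and assumes firewalld manages the firewall on these CentOS 7 hosts.)
# Confirm the firewall is not blocking BGP (TCP 179); the exact zone/ports depend on the host setup
sudo firewall-cmd --list-ports
sudo firewall-cmd --list-all
# Confirm the SELinux mode and look for recent AVC denials
getenforce
sudo ausearch -m avc -ts recent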
On one of the worker nodes, the kubelet service log produces the output below.
$ journalctl -x -u kubelet.service;
Aug 26 10:32:39 centos7-03-08 kubelet[2063]: I0826 10:32:39.855197 2063 kubelet.go:2182] "SyncLoop (probe)" probe="readiness" status="" pod="calico-system/calico-node-brpjc"
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.007016 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.997572 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Aug 26 10:32:44 centos7-03-08 kubelet[2063]: E0826 10:32:44.011224 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:47 centos7-03-08 kubelet[2063]: I0826 10:32:47.172929 2063 kubelet.go:2182] "SyncLoop (probe)" probe="readiness" status="ready" pod="calico-system/calico-node-brpjc"
Aug 26 10:32:49 centos7-03-08 kubelet[2063]: E0826 10:32:49.013157 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:54 centos7-03-08 kubelet[2063]: E0826 10:32:54.014957 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:59 centos7-03-08 kubelet[2063]: E0826 10:32:59.016829 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
The kubelet appears to be complaining that "BIRD" is not ready, as shown in the lines below.
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.997572 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Where does this "BIRD" come from, and how can this be fixed?
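In case it helps with diagnosis, the BIRD/BGP-related messages from the failing calico-node pod on centos7-03-08 can be pulled like this (a sketch; the pod name comes from the pod listing further below, and the --kubeconfig flag used elsewhere in this post is omitted for brevity):
$ kubectl -n calico-system logs calico-node-brpjc -c calico-node | grep -iE 'bird|bgp' | tail -n 40
$ kubectl -n calico-system describe pod calico-node-brpjc | tail -n 30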
I have one control-plane VM and three worker VMs. Each VM has three network interfaces; two of them are active and have static IP addresses assigned, one address per VM.
All four nodes (1 control-plane + 3 workers) have the same content in their /etc/cni/net.d/ and /var/lib/calico/ directories; a way to compare the files is sketched after the listing below.
$ ssh [email protected] "ls -tlr /etc/cni/net.d/ /var/lib/calico/";date;
/etc/cni/net.d/:
total 8
-rw-r--r--. 1 root root 805 Aug 25 20:36 10-calico.conflist
-rw-------. 1 root root 2718 Aug 25 20:37 calico-kubeconfig
/var/lib/calico/:
total 8
-rw-r--r--. 1 root root 13 Aug 25 20:37 nodename
-rw-r--r--. 1 root root 4 Aug 25 20:37 mtu
$
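A loop like the one below can confirm the files really match across all four nodes (a sketch; SSH_USER is a placeholder for whatever account has SSH access, the IPs are the INTERNAL-IPs from the node listing further below, and only 10-calico.conflist is expected to hash identically, since /var/lib/calico/nodename naturally differs per node):
SSH_USER=someuser   # placeholder account
for node in 192.168.12.17 192.168.12.20 192.168.12.21 192.168.12.22; do
  echo "== ${node} =="
  ssh "${SSH_USER}@${node}" "md5sum /etc/cni/net.d/10-calico.conflist; ls -l /var/lib/calico/"
done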
The kubelet service on the control-plane node shows the following log excerpt.
$ journalctl -x -u kubelet.service -f;
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546857 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546952 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546973 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.547921 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.548010 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.548030 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549112 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549179 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549198 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:08:40 centos7-03-05 kubelet[2625]: I0826 15:08:40.549414 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:08:40 centos7-03-05 kubelet[2625]: I0826 15:08:40.549501 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
I installed the Calico CNI with the commands below, following the official documentation at https://projectcalico.docs.tigera.io/getting-started/kubernetes/self-managed-onprem/onprem; a quick rollout check is sketched after them.
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.0/manifests/tigera-operator.yaml
kubectl create -f /tmp/custom-resources.yaml
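The rollout can be verified through the cluster-scoped tigerastatus resource that the tigera operator maintains (a sketch; kubeconfig flag omitted):
$ kubectl get tigerastatus
$ kubectl -n calico-system get pods -o wide
$ kubectl -n tigera-operator logs deployment/tigera-operator --tail=50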
The content of "/tmp/custom-resources.yaml" is shown below.
---
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 172.22.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
---
# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
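The values the operator actually applied can be read back like this (a sketch; it assumes the Calico API server installed above is serving projectcalico.org/v3):
$ kubectl get installation default -o yaml
$ kubectl get ippools.projectcalico.org -o yaml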
The configuration file I passed to kubeadm init via the --config flag contains the following section (this is an abbreviated version of the file).
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  dnsDomain: cluster.local
  serviceSubnet: 172.21.0.0/16
  podSubnet: 172.22.0.0/16
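The same subnets can be read back from the running cluster (a sketch; kubeconfig flag omitted):
$ kubectl -n kube-system get configmap kubeadm-config -o yaml | grep -E 'Subnet|dnsDomain'
$ kubectl -n kube-system get pod kube-controller-manager-centos7-03-05 -o yaml | grep -E 'cluster-cidr|service-cluster-ip-range'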
The content of "/etc/cni/net.d/10-calico.conflist" is identical on the control-plane node and the worker nodes.
$ cat /etc/cni/net.d/10-calico.conflist
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "datastore_type": "kubernetes",
      "mtu": 0,
      "nodename_file_optional": false,
      "log_level": "Info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "ipam": { "type": "calico-ipam", "assign_ipv4": "true", "assign_ipv6": "false" },
      "container_settings": {
        "allow_ip_forwarding": false
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "k8s_api_root": "https://172.21.0.1:443",
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "bandwidth",
      "capabilities": { "bandwidth": true }
    },
    { "type": "portmap", "snat": true, "capabilities": { "portMappings": true } }
  ]
}
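Because the conflist points the CNI plugin at the in-cluster API address https://172.21.0.1:443, it is worth confirming that this VIP is reachable through kube-proxy from a worker; a sketch to run on centos7-03-08 (with the default RBAC, /version is readable anonymously, and even a 401/403 would still prove the TCP path works):
$ curl -k --connect-timeout 5 https://172.21.0.1:443/version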
I have deployed a pod on this cluster, but because the worker nodes are not in the "Ready" state, the pod remains Pending. The output of the following command illustrates this.
$ kubectl describe pod/my-nginx -n ns-test-02;
The output is as follows.
{ kube_apiserver_node_01="192.168.12.17"; { kubectl --kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf describe pod/my-nginx -n ns-test-02 ; }; };
Name: my-nginx
Namespace: ns-test-02
Priority: 0
Node: <none>
Labels: app=nginx
purpose=learning
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
my-nginx:
Image: nginx
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-blxv4 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-blxv4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 16m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Note the Events section:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 16m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
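The node.kubernetes.io/not-ready taints in those events are the ones the node controller puts on NotReady nodes, which is why the pod cannot be scheduled; they can be listed directly like this (a sketch; kubeconfig flag omitted):
$ kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
$ kubectl describe node centos7-03-08 | grep -A 3 -i taints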
The pod/my-nginx object was created with the following commands.
{
kube_apiserver_node_01="192.168.12.17";
kubectl \
--kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf \
create \
namespace ns-test-02 \
;
}
{
kube_apiserver_node_01="192.168.12.17";
kubectl \
--kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf \
--namespace=ns-test-02 \
run my-nginx \
--image=nginx \
--restart=Never \
--port=80 \
--expose=true \
--labels='purpose=learning,app=nginx' \
;
}
Below is a listing of the node, pod, and service objects in the kubeadm-based Kubernetes cluster.
{ kube_apiserver_node_01="192.168.12.17"; { kubectl --kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf get nodes,pods,services -A -o wide ; }; };
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/centos7-03-05 Ready control-plane 5h33m v1.24.2 192.168.12.17 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-08 NotReady <none> 5h33m v1.24.2 192.168.12.20 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-09 NotReady <none> 5h32m v1.24.2 192.168.12.21 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-10 NotReady <none> 5h32m v1.24.2 192.168.12.22 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-apiserver pod/calico-apiserver-658d588b56-bs5j6 1/1 Running 0 4h42m 172.22.147.134 centos7-03-05 <none> <none>
calico-apiserver pod/calico-apiserver-658d588b56-zxhpg 1/1 Running 0 4h42m 172.22.147.133 centos7-03-05 <none> <none>
calico-system pod/calico-kube-controllers-5f44c7d7d7-n7lfd 1/1 Running 2 (4h43m ago) 4h45m 172.22.147.129 centos7-03-05 <none> <none>
calico-system pod/calico-node-bj9f9 1/1 Running 2 (4h42m ago) 4h45m 192.168.12.22 centos7-03-10 <none> <none>
calico-system pod/calico-node-brpjc 1/1 Running 0 4h45m 192.168.12.20 centos7-03-08 <none> <none>
calico-system pod/calico-node-ksqqn 1/1 Running 0 4h45m 192.168.12.17 centos7-03-05 <none> <none>
calico-system pod/calico-node-vpjx7 1/1 Running 3 (4h42m ago) 4h45m 192.168.12.21 centos7-03-09 <none> <none>
calico-system pod/calico-typha-77c99dcb74-76rt4 1/1 Running 0 4h45m 192.168.12.22 centos7-03-10 <none> <none>
calico-system pod/calico-typha-77c99dcb74-qs5x8 1/1 Running 0 4h45m 192.168.12.21 centos7-03-09 <none> <none>
calico-system pod/csi-node-driver-gdr4r 2/2 Running 0 4h44m 172.22.147.131 centos7-03-05 <none> <none>
kube-system pod/coredns-6d4b75cb6d-h4kxp 1/1 Running 0 5h33m 172.22.147.130 centos7-03-05 <none> <none>
kube-system pod/coredns-6d4b75cb6d-n9f9h 1/1 Running 0 5h33m 172.22.147.132 centos7-03-05 <none> <none>
kube-system pod/kube-apiserver-centos7-03-05 1/1 Running 0 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-controller-manager-centos7-03-05 1/1 Running 1 (4h43m ago) 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-proxy-5qfsl 1/1 Running 0 5h32m 192.168.12.22 centos7-03-10 <none> <none>
kube-system pod/kube-proxy-r62r4 1/1 Running 0 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-proxy-t7lnr 1/1 Running 0 5h32m 192.168.12.21 centos7-03-09 <none> <none>
kube-system pod/kube-proxy-v4wjs 1/1 Running 0 5h33m 192.168.12.20 centos7-03-08 <none> <none>
kube-system pod/kube-scheduler-centos7-03-05 1/1 Running 1 (4h43m ago) 5h33m 192.168.12.17 centos7-03-05 <none> <none>
ns-test-02 pod/my-nginx 0/1 Pending 0 36s <none> <none> <none> <none>
tigera-operator pod/tigera-operator-7ff575f7f7-6qhft 1/1 Running 1 (4h43m ago) 4h45m 192.168.12.20 centos7-03-08 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
calico-apiserver service/calico-api ClusterIP 172.21.182.168 <none> 443/TCP 4h42m apiserver=true
calico-system service/calico-kube-controllers-metrics ClusterIP 172.21.46.154 <none> 9094/TCP 4h42m k8s-app=calico-kube-controllers
calico-system service/calico-typha ClusterIP 172.21.208.66 <none> 5473/TCP 4h45m k8s-app=calico-typha
default service/kubernetes ClusterIP 172.21.0.1 <none> 443/TCP 5h33m <none>
kube-system service/kube-dns ClusterIP 172.21.0.10 <none> 53/UDP,53/TCP,9153/TCP 5h33m k8s-app=kube-dns
ns-test-02 service/my-nginx ClusterIP 172.21.208.139 <none> 80/TCP 36s app=nginx,purpose=learning