Worker nodes of a kubeadm-based Kubernetes (v1.24.2) cluster stay "NotReady" even after installing the Calico CNI ("dial unix /var/run/bird/bird.ctl: connect: no such file or directory")
I have deployed the Calico CNI on a kubeadm-based Kubernetes cluster, but the worker nodes still report the status "NotReady".
TCP port 179 is open on all nodes, and SELinux reports no denials.
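(For reference, checks along these lines can confirm both points on each node; this is only a sketch and assumes firewalld manages the firewall on these CentOS 7 hosts.)
# Confirm the firewall is not blocking BGP (TCP 179); the exact zone/ports depend on the host setup
sudo firewall-cmd --list-ports
sudo firewall-cmd --list-all
# Confirm the SELinux mode and look for recent AVC denials
getenforce
sudo ausearch -m avc -ts recent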
On one of the worker nodes, the kubelet service log produces the output below.
$ journalctl -x -u kubelet.service;
Aug 26 10:32:39 centos7-03-08 kubelet[2063]: I0826 10:32:39.855197 2063 kubelet.go:2182] "SyncLoop (probe)" probe="readiness" status="" pod="calico-system/calico-node-brpjc"
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.007016 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.997572 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Aug 26 10:32:44 centos7-03-08 kubelet[2063]: E0826 10:32:44.011224 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:47 centos7-03-08 kubelet[2063]: I0826 10:32:47.172929 2063 kubelet.go:2182] "SyncLoop (probe)" probe="readiness" status="ready" pod="calico-system/calico-node-brpjc"
Aug 26 10:32:49 centos7-03-08 kubelet[2063]: E0826 10:32:49.013157 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:54 centos7-03-08 kubelet[2063]: E0826 10:32:54.014957 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:59 centos7-03-08 kubelet[2063]: E0826 10:32:59.016829 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
The kubelet appears to be complaining that "BIRD" is not ready, as shown in the lines below.
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.997572 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Where does this "BIRD" come from, and how can this be fixed?
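In case it helps with diagnosis, the BIRD/BGP-related messages from the failing calico-node pod on centos7-03-08 can be pulled like this (a sketch; the pod name comes from the pod listing further below, and the --kubeconfig flag used elsewhere in this post is omitted for brevity):
$ kubectl -n calico-system logs calico-node-brpjc -c calico-node | grep -iE 'bird|bgp' | tail -n 40
$ kubectl -n calico-system describe pod calico-node-brpjc | tail -n 30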
I have one control-plane VM and three worker VMs. Each VM has three network interfaces; two of them are active and have static IP addresses assigned, one address per VM.
All four nodes (1 control-plane + 3 workers) have the same content in their /etc/cni/net.d/ and /var/lib/calico/ directories; a way to compare the files is sketched after the listing below.
$ ssh [email protected] "ls -tlr /etc/cni/net.d/ /var/lib/calico/";date;
/etc/cni/net.d/:
total 8
-rw-r--r--. 1 root root 805 Aug 25 20:36 10-calico.conflist
-rw-------. 1 root root 2718 Aug 25 20:37 calico-kubeconfig
/var/lib/calico/:
total 8
-rw-r--r--. 1 root root 13 Aug 25 20:37 nodename
-rw-r--r--. 1 root root 4 Aug 25 20:37 mtu
$
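A loop like the one below can confirm the files really match across all four nodes (a sketch; SSH_USER is a placeholder for whatever account has SSH access, the IPs are the INTERNAL-IPs from the node listing further below, and only 10-calico.conflist is expected to hash identically, since /var/lib/calico/nodename naturally differs per node):
SSH_USER=someuser   # placeholder account
for node in 192.168.12.17 192.168.12.20 192.168.12.21 192.168.12.22; do
  echo "== ${node} =="
  ssh "${SSH_USER}@${node}" "md5sum /etc/cni/net.d/10-calico.conflist; ls -l /var/lib/calico/"
done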
The kubelet service on the control-plane node shows the following log excerpt.
$ journalctl -x -u kubelet.service -f;
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546857 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546952 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546973 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.547921 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.548010 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.548030 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549112 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549179 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549198 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:08:40 centos7-03-05 kubelet[2625]: I0826 15:08:40.549414 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:08:40 centos7-03-05 kubelet[2625]: I0826 15:08:40.549501 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
I installed the Calico CNI with the commands below, following the official documentation at https://projectcalico.docs.tigera.io/getting-started/kubernetes/self-managed-onprem/onprem; a quick rollout check is sketched after them.
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.0/manifests/tigera-operator.yaml
kubectl create -f /tmp/custom-resources.yaml
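The rollout can be verified through the cluster-scoped tigerastatus resource that the tigera operator maintains (a sketch; kubeconfig flag omitted):
$ kubectl get tigerastatus
$ kubectl -n calico-system get pods -o wide
$ kubectl -n tigera-operator logs deployment/tigera-operator --tail=50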
The content of "/tmp/custom-resources.yaml" is shown below.
---
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 172.22.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
---
# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
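The values the operator actually applied can be read back like this (a sketch; it assumes the Calico API server installed above is serving projectcalico.org/v3):
$ kubectl get installation default -o yaml
$ kubectl get ippools.projectcalico.org -o yaml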
The configuration file I passed to kubeadm init via the --config flag contains the following section (this is an abbreviated version of the file).
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  dnsDomain: cluster.local
  serviceSubnet: 172.21.0.0/16
  podSubnet: 172.22.0.0/16
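The same subnets can be read back from the running cluster (a sketch; kubeconfig flag omitted):
$ kubectl -n kube-system get configmap kubeadm-config -o yaml | grep -E 'Subnet|dnsDomain'
$ kubectl -n kube-system get pod kube-controller-manager-centos7-03-05 -o yaml | grep -E 'cluster-cidr|service-cluster-ip-range'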
The content of "/etc/cni/net.d/10-calico.conflist" is identical on the control-plane node and the worker nodes.
$ cat /etc/cni/net.d/10-calico.conflist
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "datastore_type": "kubernetes",
      "mtu": 0,
      "nodename_file_optional": false,
      "log_level": "Info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "ipam": { "type": "calico-ipam", "assign_ipv4": "true", "assign_ipv6": "false" },
      "container_settings": {
        "allow_ip_forwarding": false
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "k8s_api_root": "https://172.21.0.1:443",
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "bandwidth",
      "capabilities": { "bandwidth": true }
    },
    { "type": "portmap", "snat": true, "capabilities": { "portMappings": true } }
  ]
}
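Because the conflist points the CNI plugin at the in-cluster API address https://172.21.0.1:443, it is worth confirming that this VIP is reachable through kube-proxy from a worker; a sketch to run on centos7-03-08 (with the default RBAC, /version is readable anonymously, and even a 401/403 would still prove the TCP path works):
$ curl -k --connect-timeout 5 https://172.21.0.1:443/version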
I have deployed a pod on this cluster, but because the worker nodes are not in the "Ready" state, the pod remains Pending. The output of the following command illustrates this.
$ kubectl describe pod/my-nginx -n ns-test-02;
The output is as follows.
{ kube_apiserver_node_01="192.168.12.17"; { kubectl --kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf describe pod/my-nginx -n ns-test-02 ; }; };
Name: my-nginx
Namespace: ns-test-02
Priority: 0
Node: <none>
Labels: app=nginx
purpose=learning
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
my-nginx:
Image: nginx
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-blxv4 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-blxv4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 16m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Note the Events section:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 16m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
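The node.kubernetes.io/not-ready taints in those events are the ones the node controller puts on NotReady nodes, which is why the pod cannot be scheduled; they can be listed directly like this (a sketch; kubeconfig flag omitted):
$ kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
$ kubectl describe node centos7-03-08 | grep -A 3 -i taints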
The pod/my-nginx object was created with the following commands.
{
kube_apiserver_node_01="192.168.12.17";
kubectl \
--kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf \
create \
namespace ns-test-02 \
;
}
{
kube_apiserver_node_01="192.168.12.17";
kubectl \
--kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf \
--namespace=ns-test-02 \
run my-nginx \
--image=nginx \
--restart=Never \
--port=80 \
--expose=true \
--labels='purpose=learning,app=nginx' \
;
}
Below is a listing of the node, pod, and service objects in the kubeadm-based Kubernetes cluster.
{ kube_apiserver_node_01="192.168.12.17"; { kubectl --kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf get nodes,pods,services -A -o wide ; }; };
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/centos7-03-05 Ready control-plane 5h33m v1.24.2 192.168.12.17 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-08 NotReady <none> 5h33m v1.24.2 192.168.12.20 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-09 NotReady <none> 5h32m v1.24.2 192.168.12.21 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-10 NotReady <none> 5h32m v1.24.2 192.168.12.22 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-apiserver pod/calico-apiserver-658d588b56-bs5j6 1/1 Running 0 4h42m 172.22.147.134 centos7-03-05 <none> <none>
calico-apiserver pod/calico-apiserver-658d588b56-zxhpg 1/1 Running 0 4h42m 172.22.147.133 centos7-03-05 <none> <none>
calico-system pod/calico-kube-controllers-5f44c7d7d7-n7lfd 1/1 Running 2 (4h43m ago) 4h45m 172.22.147.129 centos7-03-05 <none> <none>
calico-system pod/calico-node-bj9f9 1/1 Running 2 (4h42m ago) 4h45m 192.168.12.22 centos7-03-10 <none> <none>
calico-system pod/calico-node-brpjc 1/1 Running 0 4h45m 192.168.12.20 centos7-03-08 <none> <none>
calico-system pod/calico-node-ksqqn 1/1 Running 0 4h45m 192.168.12.17 centos7-03-05 <none> <none>
calico-system pod/calico-node-vpjx7 1/1 Running 3 (4h42m ago) 4h45m 192.168.12.21 centos7-03-09 <none> <none>
calico-system pod/calico-typha-77c99dcb74-76rt4 1/1 Running 0 4h45m 192.168.12.22 centos7-03-10 <none> <none>
calico-system pod/calico-typha-77c99dcb74-qs5x8 1/1 Running 0 4h45m 192.168.12.21 centos7-03-09 <none> <none>
calico-system pod/csi-node-driver-gdr4r 2/2 Running 0 4h44m 172.22.147.131 centos7-03-05 <none> <none>
kube-system pod/coredns-6d4b75cb6d-h4kxp 1/1 Running 0 5h33m 172.22.147.130 centos7-03-05 <none> <none>
kube-system pod/coredns-6d4b75cb6d-n9f9h 1/1 Running 0 5h33m 172.22.147.132 centos7-03-05 <none> <none>
kube-system pod/kube-apiserver-centos7-03-05 1/1 Running 0 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-controller-manager-centos7-03-05 1/1 Running 1 (4h43m ago) 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-proxy-5qfsl 1/1 Running 0 5h32m 192.168.12.22 centos7-03-10 <none> <none>
kube-system pod/kube-proxy-r62r4 1/1 Running 0 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-proxy-t7lnr 1/1 Running 0 5h32m 192.168.12.21 centos7-03-09 <none> <none>
kube-system pod/kube-proxy-v4wjs 1/1 Running 0 5h33m 192.168.12.20 centos7-03-08 <none> <none>
kube-system pod/kube-scheduler-centos7-03-05 1/1 Running 1 (4h43m ago) 5h33m 192.168.12.17 centos7-03-05 <none> <none>
ns-test-02 pod/my-nginx 0/1 Pending 0 36s <none> <none> <none> <none>
tigera-operator pod/tigera-operator-7ff575f7f7-6qhft 1/1 Running 1 (4h43m ago) 4h45m 192.168.12.20 centos7-03-08 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
calico-apiserver service/calico-api ClusterIP 172.21.182.168 <none> 443/TCP 4h42m apiserver=true
calico-system service/calico-kube-controllers-metrics ClusterIP 172.21.46.154 <none> 9094/TCP 4h42m k8s-app=calico-kube-controllers
calico-system service/calico-typha ClusterIP 172.21.208.66 <none> 5473/TCP 4h45m k8s-app=calico-typha
default service/kubernetes ClusterIP 172.21.0.1 <none> 443/TCP 5h33m <none>
kube-system service/kube-dns ClusterIP 172.21.0.10 <none> 53/UDP,53/TCP,9153/TCP 5h33m k8s-app=kube-dns
ns-test-02 service/my-nginx ClusterIP 172.21.208.139 <none> 80/TCP 36s app=nginx,purpose=learning