I cannot get a CNI added to the Kubernetes master node; the CNI plugin cannot access certain files or folders. The logs of both Calico and Flannel indicate that some files or folders are inaccessible (in this post I only cover Calico).
I see the same problem with kubectl, kubeadm and kubelet in versions v1.19.4 and v1.19.3. The Docker version is 19.03.13-ce with overlay2 on an ext4 file system, and systemd is used as the cgroup driver. Swap is disabled.
The only thing pointing in this direction that I found on Stack Overflow is: Kubernetes cluster with Calico - containers fail to start with a FailedCreatePodSandBox failure
As a first step, I set up the cluster with kubeadm (using Calico's CIDR):
# kubeadm init --apiserver-advertise-address=192.168.178.33 --pod-network-cidr=192.168.0.0/16
This works, and the kubelet log shows the message that a CNI is required. After that, I apply the Calico CNI:
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
After waiting a while, the master node stays in the following state:
❯ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-5c6f6b67db-zdksz 0/1 ContainerCreating 0 7m47s
kube-system calico-node-sc42z 0/1 CrashLoopBackOff 5 7m47s
kube-system coredns-f9fd979d6-4zrcj 0/1 ContainerCreating 0 8m11s
kube-system coredns-f9fd979d6-wf9r2 0/1 ContainerCreating 0 8m11s
kube-system etcd-hs-0 1/1 Running 0 8m20s
kube-system kube-apiserver-hs-0 1/1 Running 0 8m20s
kube-system kube-controller-manager-hs-0 1/1 Running 0 8m20s
kube-system kube-proxy-t6ngd 1/1 Running 0 8m11s
kube-system kube-scheduler-hs-0 1/1 Running 0 8m20s
To me, the information I get from
kubectl describe pods calico-node-sc42z --namespace kube-system
is inconsistent with the output below: the calico-node pod has a volume mounted, yet the pod still cannot access it (see the Volumes and Events sections).
❯ kubectl describe pods calico-node-sc42z --namespace kube-system
Name: calico-node-sc42z
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: hs-0/192.168.178.48
Start Time: Sat, 14 Nov 2020 00:58:36 +0100
Labels: controller-revision-hash=5f678767
k8s-app=calico-node
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 192.168.178.48
IPs:
IP: 192.168.178.48
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://29c6cf8b73ecb98ee18169db0f6ffe8b141a8a6e10b2c839fc5bf05177f066ac
Image: calico/cni:v3.16.5
Image ID: docker-pullable://calico/cni@sha256:e05d0ee834c2004e8e7c4ee165a620166cd16e3cb8204a06eb52e5300b46650b
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 14 Nov 2020 00:58:48 +0100
Finished: Sat, 14 Nov 2020 00:58:48 +0100
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
install-cni:
Container ID: docker://4435863e0d2f3ab4535aa6ca49ff95d889e71614861f3c7c0e4213d8c333f4db
Image: calico/cni:v3.16.5
Image ID: docker-pullable://calico/cni@sha256:e05d0ee834c2004e8e7c4ee165a620166cd16e3cb8204a06eb52e5300b46650b
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/install
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 14 Nov 2020 00:58:49 +0100
Finished: Sat, 14 Nov 2020 00:58:49 +0100
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
flexvol-driver:
Container ID: docker://ca03f59013c1576a4a605a6d737af78ec3e859376aa11a301e56f0ffdacbc8db
Image: calico/pod2daemon-flexvol:v3.16.5
Image ID: docker-pullable://calico/pod2daemon-flexvol@sha256:7b20fd9cc36c7196dd24d56cc1e89ac573c634856ee020334b0b30cf5b8a3d3b
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 14 Nov 2020 00:58:56 +0100
Finished: Sat, 14 Nov 2020 00:58:56 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
Containers:
calico-node:
Container ID: docker://96bbc7f4adf1d5cb9a927aedc18e16da7b5ed4b0ff1290179a8dd4a51c115ab8
Image: calico/node:v3.16.5
Image ID: docker-pullable://calico/node@sha256:43c145b2bd837611d8d41e70631a8f2cc2b97b5ca9d895d66ffddd414dab83c5
Port: <none>
Host Port: <none>
State: Running
Started: Sat, 14 Nov 2020 01:04:51 +0100
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Sat, 14 Nov 2020 01:03:41 +0100
Finished: Sat, 14 Nov 2020 01:04:51 +0100
Ready: False
Restart Count: 5
Requests:
cpu: 250m
Liveness: exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
CALICO_IPV4POOL_VXLAN: Never
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_VXLANMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_WIREGUARDMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/sys/fs/ from sysfs (rw)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
sysfs:
Type: HostPath (bare host directory volume)
Path: /sys/fs/
HostPathType: DirectoryOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
HostPathType: DirectoryOrCreate
calico-node-token-tzhr4:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-tzhr4
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoSchedule op=Exists
:NoExecute op=Exists
CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m52s default-scheduler Successfully assigned kube-system/calico-node-sc42z to hs-0
Normal Pulling 6m51s kubelet Pulling image "calico/cni:v3.16.5"
Normal Pulled 6m40s kubelet Successfully pulled image "calico/cni:v3.16.5" in 10.618669742s
Normal Started 6m40s kubelet Started container upgrade-ipam
Normal Created 6m40s kubelet Created container upgrade-ipam
Normal Created 6m39s kubelet Created container install-cni
Normal Pulled 6m39s kubelet Container image "calico/cni:v3.16.5" already present on machine
Normal Started 6m39s kubelet Started container install-cni
Normal Pulling 6m38s kubelet Pulling image "calico/pod2daemon-flexvol:v3.16.5"
Normal Started 6m32s kubelet Started container flexvol-driver
Normal Created 6m32s kubelet Created container flexvol-driver
Normal Pulled 6m32s kubelet Successfully pulled image "calico/pod2daemon-flexvol:v3.16.5" in 6.076268177s
Normal Pulling 6m31s kubelet Pulling image "calico/node:v3.16.5"
Normal Pulled 6m19s kubelet Successfully pulled image "calico/node:v3.16.5" in 12.051211859s
Normal Created 6m19s kubelet Created container calico-node
Normal Started 6m19s kubelet Started container calico-node
Warning Unhealthy 5m32s (x5 over 6m12s) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
Warning Unhealthy 109s (x23 over 6m9s) kubelet Liveness probe failed: calico/node is not ready: bird/confd is not live: exit status 1
I also have the calico-node logs, but I don't see how to benefit from this additional information: unfortunately, I don't know whether "datastore" refers to the file system, i.e. whether this is the error I already know about or an additional one.
❯ kubectl logs calico-node-sc42z -n kube-system -f
2020-11-14 01:42:55.536 [INFO][8] startup/startup.go 376: Early log level set to info
2020-11-14 01:42:55.536 [INFO][8] startup/startup.go 392: Using NODENAME environment for node name
2020-11-14 01:42:55.536 [INFO][8] startup/startup.go 404: Determined node name: hs-0
2020-11-14 01:42:55.539 [INFO][8] startup/startup.go 436: Checking datastore connection
2020-11-14 01:43:25.539 [INFO][8] startup/startup.go 451: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
2020-11-14 01:43:56.540 [INFO][8] startup/startup.go 451: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
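For context when reading these log lines: the Environment section in the describe output above shows DATASTORE_TYPE: kubernetes, so the "datastore" here is the Kubernetes API server reached via the cluster service IP 10.96.0.1, not the file system. A few diagnostics one could run on the node, shown only as a sketch (they need the live cluster, and port 6443 is assumed to be kubeadm's default API server port):

```shell
# DATASTORE_TYPE is "kubernetes", so the datastore is the API server
# behind the default "kubernetes" service; check that the service and
# its endpoints exist:
kubectl get svc kubernetes -n default
kubectl get endpoints kubernetes -n default

# The readiness probe failed on /var/lib/calico/nodename; check
# whether calico-node ever managed to write it on the host:
ls -l /var/lib/calico/

# Rule out kube-proxy/service routing by hitting the API server on
# its advertised address directly instead of 10.96.0.1 (port 6443 is
# kubeadm's default - adjust if yours differs):
curl -k https://192.168.178.33:6443/healthz
```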
Maybe someone can give me a hint on how to solve this problem, or tell me where I can read up on this topic. Greetings, Kokos Bot.
Answer 1
This could be because Calico's default pod CIDR conflicts with the host CIDR. I got that impression from your --apiserver-advertise-address=192.168.178.33. If that is the case, it is worth trying kubeadm init with a different pod CIDR, e.g. --pod-network-cidr=20.96.0.0/12.
To do a fresh installation again, it is best to run kubeadm reset before making the above change. Before executing it, be aware of the effects of the kubeadm reset command (read here).
Reference - https://stackoverflow.com/questions/60742165/kubernetes-calico-replicaset
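The overlap suspected here can be checked mechanically: the node address 192.168.178.48 (and the advertise address 192.168.178.33) falls inside the pod CIDR 192.168.0.0/16 that was passed to kubeadm init. A minimal pure-bash sketch (IPv4 only; not part of the original answer) for testing whether an address lies inside a CIDR:

```shell
#!/usr/bin/env bash
# Check whether an IPv4 address falls inside a CIDR block, to confirm
# that the node address lies within the pod CIDR used at kubeadm init.

ip_to_int() {
  # Convert dotted-quad IPv4 to a 32-bit integer.
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

in_cidr() {  # usage: in_cidr IP CIDR  -> prints "yes" or "no"
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  if (( (ip & mask) == (net & mask) )); then echo yes; else echo no; fi
}

in_cidr 192.168.178.48 192.168.0.0/16   # -> yes: node IP is inside the pod CIDR
in_cidr 192.168.178.48 20.96.0.0/12     # -> no: the suggested CIDR avoids the clash
```

With the suggested 20.96.0.0/12 pod CIDR the node address no longer lies inside the pod range, which removes the clash between host and pod addressing.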