I have a Kubernetes cluster at home with 5 nodes: 4 Google Compute Engine VMs (one controller node and three worker nodes) and one bare-metal local machine (a kube worker node). The cluster is up and running, and all nodes are in the Ready state.
- A self-managed cluster set up following: https://docs.projectcalico.org/getting-started/kubernetes/self-managed-public-cloud/gce
- Firewall rules added for both Ingress and Egress, allowing all IPs (0.0.0.0/0) on any port (see the sketch after this list).
- I used the `--control-plane-endpoint IP:PORT` flag to advertise the kube master on its public IP, and the worker nodes join through that endpoint.
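Roughly the firewall rules in question (a sketch; the rule names here are placeholders, not the actual ones used):

```
gcloud compute firewall-rules create k8s-allow-ingress \
    --direction INGRESS --source-ranges 0.0.0.0/0 --allow all
gcloud compute firewall-rules create k8s-allow-egress \
    --direction EGRESS --destination-ranges 0.0.0.0/0 --allow all
```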
The problem: I am having trouble deploying applications; all pods on the local worker node get stuck in ContainerCreating, while containers on the GCE VM workers deploy fine. Does anyone know what is wrong with this setup, and how can I fix it?
- Here is the Events section of the `kubectl describe pod` output for one of my pods:
Events:
Normal Scheduled default-scheduler Successfully assigned social-network/home-timeline-redis-6f4c5d55fc-tql2l to volatile
Warning FailedCreatePodSandBox 3m14s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "32b64e6efcaff6401b7b0a6936f005a00a53c19a2061b0a14906b8bc3a81bf20" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Warning FailedCreatePodSandBox 102s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1e95fa10d49abf5edc8693345256b91e88c31d1b6414761de80e6038cd7696a4" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
Normal SandboxChanged 11s (x3 over 3m14s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 11s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8f5959966e4c25f94bd49b82e1fa6da33a114b1680eae8898ba6685f22e7d37f" network for pod "home-timeline-redis-6f4c5d55fc-tql2l": networkPlugin cni failed to set up pod "home-timeline-redis-6f4c5d55fc-tql2l_social-network" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get "http:///var/run/cilium/cilium.sock/v1/config": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Is the agent running?
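The error points at the missing Cilium agent socket. A few checks on volatile that narrow this down (a sketch, using the paths from the kubelet message above):

```
ls -l /var/run/cilium/cilium.sock          # does the Cilium agent socket exist at all?
ls /etc/cni/net.d/                         # which CNI configs does the kubelet see?
kubectl get pods -n kube-system -o wide    # is a Cilium agent pod running on this node?
```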
UPDATE

I reset kubeadm on all nodes, removed Cilium, and recreated the Calico CNI. I also changed the pod network CIDR by running `sudo kubeadm init --pod-network-cidr=20.96.0.0/12 --control-plane-endpoint "34.89.7.120:6443"`, which seems to have resolved the conflict with the host CIDR. But pods on volatile (the local machine) are still stuck in ContainerCreating:
> root@controller:~# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-744cfdf676-bh2nc 1/1 Running 0 12m 20.109.133.129 worker-2 <none> <none>
calico-node-frv5r 1/1 Running 0 12m 10.240.0.11 controller <none> <none>
calico-node-lplx6 1/1 Running 0 12m 10.240.0.20 worker-0 <none> <none>
calico-node-lwrdr 1/1 Running 0 12m 10.240.0.21 worker-1 <none> <none>
calico-node-ppczn 0/1 CrashLoopBackOff 7 12m 130.239.41.206 volatile <none> <none>
calico-node-zplwx 1/1 Running 0 12m 10.240.0.22 worker-2 <none> <none>
coredns-74ff55c5b-69mn2 1/1 Running 0 14m 20.105.55.194 controller <none> <none>
coredns-74ff55c5b-djczf 1/1 Running 0 14m 20.105.55.193 controller <none> <none>
etcd-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-apiserver-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-controller-manager-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-proxy-5vzdf 1/1 Running 0 13m 10.240.0.20 worker-0 <none> <none>
kube-proxy-d22q4 1/1 Running 0 13m 10.240.0.22 worker-2 <none> <none>
kube-proxy-hml5c 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
kube-proxy-hw8kl 1/1 Running 0 13m 10.240.0.21 worker-1 <none> <none>
kube-proxy-zb6t7 1/1 Running 0 13m 130.239.41.206 volatile <none> <none>
kube-scheduler-controller 1/1 Running 0 14m 10.240.0.11 controller <none> <none>
> root@controller:~# kubectl describe pod calico-node-ppczn -n kube-system
Name: calico-node-ppczn
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: volatile/130.239.41.206
Start Time: Mon, 04 Jan 2021 13:01:36 +0000
Labels: controller-revision-hash=89c447898
k8s-app=calico-node
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 130.239.41.206
IPs:
IP: 130.239.41.206
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://27f988847a484c5f74e000c4b8f473895b71ed49f27e0bf4fab4b425940951dc
Image: docker.io/calico/cni:v3.17.1
Image ID: docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Jan 2021 13:01:37 +0000
Finished: Mon, 04 Jan 2021 13:01:38 +0000
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
install-cni:
Container ID: docker://5629f6984cfe545864d187112a0c1f65e7bdb7dbfae9b4971579f420ab55b77b
Image: docker.io/calico/cni:v3.17.1
Image ID: docker-pullable://calico/cni@sha256:3dc2506632843491864ce73a6e73d5bba7d0dc25ec0df00c1baa91d17549b068
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/install
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Jan 2021 13:01:39 +0000
Finished: Mon, 04 Jan 2021 13:01:41 +0000
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
flexvol-driver:
Container ID: docker://3a4bf307a347926893aeb956717d84049af601fd4cc4aa7add6e182c85dc4e7c
Image: docker.io/calico/pod2daemon-flexvol:v3.17.1
Image ID: docker-pullable://calico/pod2daemon-flexvol@sha256:48f277d41c35dae051d7dd6f0ec8f64ac7ee6650e27102a41b0203a0c2ce6c6b
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 04 Jan 2021 13:01:43 +0000
Finished: Mon, 04 Jan 2021 13:01:43 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Containers:
calico-node:
Container ID: docker://2576b2426c2a3fc4b6a972839a94872160c7ac5efa5b1159817be8d4ad4ddf60
Image: docker.io/calico/node:v3.17.1
Image ID: docker-pullable://calico/node@sha256:25e0b0495c0df3a7a06b6f9e92203c53e5b56c143ac1c885885ee84bf86285ff
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 04 Jan 2021 13:18:48 +0000
Finished: Mon, 04 Jan 2021 13:19:57 +0000
Ready: False
Restart Count: 9
Requests:
cpu: 250m
Liveness: exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
CALICO_IPV4POOL_VXLAN: Never
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_VXLANMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_WIREGUARDMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/sys/fs/ from sysfs (rw)
/var/lib/calico from var-lib-calico (rw)
/var/log/calico/cni from cni-log-dir (ro)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-8r94c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
sysfs:
Type: HostPath (bare host directory volume)
Path: /sys/fs/
HostPathType: DirectoryOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
cni-log-dir:
Type: HostPath (bare host directory volume)
Path: /var/log/calico/cni
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
HostPathType: DirectoryOrCreate
calico-node-token-8r94c:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-8r94c
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoSchedule op=Exists
:NoExecute op=Exists
CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22m default-scheduler Successfully assigned kube-system/calico-node-ppczn to volatile
Normal Pulled 22m kubelet Container image "docker.io/calico/cni:v3.17.1" already present on machine
Normal Created 22m kubelet Created container upgrade-ipam
Normal Started 22m kubelet Started container upgrade-ipam
Normal Pulled 21m kubelet Container image "docker.io/calico/cni:v3.17.1" already present on machine
Normal Started 21m kubelet Started container install-cni
Normal Created 21m kubelet Created container install-cni
Normal Pulled 21m kubelet Container image "docker.io/calico/pod2daemon-flexvol:v3.17.1" already present on machine
Normal Created 21m kubelet Created container flexvol-driver
Normal Started 21m kubelet Started container flexvol-driver
Normal Pulled 21m kubelet Container image "docker.io/calico/node:v3.17.1" already present on machine
Normal Created 21m kubelet Created container calico-node
Normal Started 21m kubelet Started container calico-node
Warning Unhealthy 21m (x2 over 21m) kubelet Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused
Warning Unhealthy 11m (x51 over 21m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
Warning DNSConfigForming 115s (x78 over 22m) kubelet Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 130.239.40.2 130.239.40.3 2001:6b0:e:4040::2
calico-node-ppczn logs:
> root@controller:~# kubectl logs calico-node-ppczn -n kube-system
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 379: Early log level set to info
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 395: Using NODENAME environment for node name
2021-01-04 13:17:38.010 [INFO][8] startup/startup.go 407: Determined node name: volatile
2021-01-04 13:17:38.011 [INFO][8] startup/startup.go 439: Checking datastore connection
2021-01-04 13:18:08.011 [INFO][8] startup/startup.go 454: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
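That 10.96.0.1:443 is the in-cluster `kubernetes` Service ClusterIP, so calico-node on volatile cannot reach the API server through the service network. A quick reachability test (a sketch; a TLS or auth error is fine here, only a timeout confirms the routing problem):

```
kubectl get svc kubernetes -n default           # confirms the ClusterIP (10.96.0.1 here)
# on volatile itself:
curl -k --max-time 5 https://10.96.0.1:443/version
```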
On the local machine:
> root@volatile:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
39efaf54f558 k8s.gcr.io/pause:3.2 "/pause" 19 minutes ago Up 19 minutes k8s_POD_calico-node-ppczn_kube-system_7e98eb90-f581-4dbc-b877-da25bc2868f9_0
05bd9fa182e5 e3f6fcd87756 "/usr/local/bin/kube…" 20 minutes ago Up 20 minutes k8s_kube-proxy_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
ae11c77897b0 k8s.gcr.io/pause:3.2 "/pause" 20 minutes ago Up 20 minutes k8s_POD_kube-proxy-zb6t7_kube-system_90529aeb-d226-4061-a87f-d5b303207a2f_0
> root@volatile:~# docker logs 39efaf54f558
> root@volatile:~# docker logs 05bd9fa182e5
I0104 13:00:51.131737 1 node.go:172] Successfully retrieved node IP: 130.239.41.206
I0104 13:00:51.132027 1 server_others.go:142] kube-proxy node IP is an IPv4 address (130.239.41.206), assume IPv4 operation
W0104 13:00:51.162536 1 server_others.go:578] Unknown proxy mode "", assuming iptables proxy
I0104 13:00:51.162615 1 server_others.go:185] Using iptables Proxier.
I0104 13:00:51.162797 1 server.go:650] Version: v1.20.1
I0104 13:00:51.163080 1 conntrack.go:52] Setting nf_conntrack_max to 262144
I0104 13:00:51.163289 1 config.go:315] Starting service config controller
I0104 13:00:51.163300 1 config.go:224] Starting endpoint slice config controller
I0104 13:00:51.163304 1 shared_informer.go:240] Waiting for caches to sync for service config
I0104 13:00:51.163311 1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0104 13:00:51.263469 1 shared_informer.go:247] Caches are synced for endpoint slice config
I0104 13:00:51.263487 1 shared_informer.go:247] Caches are synced for service config
> root@volatile:~# docker logs ae11c77897b0
root@volatile:~# ls /etc/cni/net.d/
10-calico.conflist calico-kubeconfig
root@volatile:~# ls /var/lib/calico/
root@volatile:~#
Answer 1
On the host volatile, you appear to have Cilium configured in /etc/cni/net.d/*.conf. Cilium is a network plugin, one of many available for Kubernetes. One of those files probably contains something like:
{
"name": "cilium",
"type": "cilium-cni"
}
If that is unexpected, delete the file. You already appear to be running a competing network plugin, Project Calico, and that one seems sufficient. Then recreate the calico-kube-controllers pod in the kube-system namespace, let it come up successfully, and then recreate the other pods; a sketch of those steps follows.
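A minimal sketch of those steps (the Cilium config filename is hypothetical; list the directory first and remove whichever Cilium file is actually there):

```
# On volatile: inspect and remove the leftover Cilium CNI config
ls /etc/cni/net.d/
sudo rm /etc/cni/net.d/05-cilium.conf        # hypothetical name; use the file you see

# From the controller: recreate the Calico pods so they start from a clean config
kubectl -n kube-system delete pod -l k8s-app=calico-kube-controllers
kubectl -n kube-system delete pod calico-node-ppczn    # the DaemonSet recreates it
```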
If you do intend to use Cilium on that host, go back to the Cilium installation guide. If you redo the installation, you should see that /var/run/cilium/cilium.sock has been created for you.
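A quick sanity check after a (re)install might look like this (assuming the standard `k8s-app=cilium` pod label):

```
kubectl -n kube-system get pods -l k8s-app=cilium -o wide   # is the agent Running on volatile?
ls -l /var/run/cilium/cilium.sock                           # socket created by the agent
```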