To install a Kubernetes master node on CentOS 7 with containerd and Calico,
I followed these steps: https://computingforgeeks.com/install-kubernetes-cluster-on-centos-with-kubeadm/
After kubeadm init --pod-network-cidr=192.168.0.0/16 --upload-certs,
I installed Calico with:
- kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
- kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
(the proxy would not let me run them that way, so I first downloaded the two files and then ran the create against the local copies, as sketched below)
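For reference, the download-then-apply workaround looks roughly like this (a sketch; the proxy URL is the masked placeholder used later in this post):

# Fetch the manifests through the corporate proxy first
curl -x http://myproxy-XXXXXXXX.com:8080 -LO https://docs.projectcalico.org/manifests/tigera-operator.yaml
curl -x http://myproxy-XXXXXXXX.com:8080 -LO https://docs.projectcalico.org/manifests/custom-resources.yaml
# Then apply the local copies, so kubectl itself never needs the proxy
kubectl create -f ./tigera-operator.yaml
kubectl create -f ./custom-resources.yaml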
The installation succeeded, but coredns and calico-kube-controllers are stuck in ContainerCreating.
This setup sits behind the corporate DNS and proxy. I have been stuck here for days and cannot figure out why coredns stays in ContainerCreating.
[root@master-node system]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-system calico-kube-controllers-68884f975d-6qm5l 0/1 Terminating 0 16d
calico-system calico-kube-controllers-68884f975d-ckr2g 0/1 ContainerCreating 0 154m
calico-system calico-node-5n4nj 0/1 Running 7 (165m ago) 16d
calico-system calico-node-gp6d5 0/1 Running 1 (15d ago) 16d
calico-system calico-typha-77b6fb6f86-zc8jn 1/1 Running 7 (165m ago) 16d
kube-system coredns-6d4b75cb6d-2tqk9 0/1 ContainerCreating 0 4h46m
kube-system coredns-6d4b75cb6d-9dn5d 0/1 ContainerCreating 0 6h58m
kube-system coredns-6d4b75cb6d-vfchn 0/1 Terminating 32 15d
kube-system etcd-master-node 1/1 Running 14 (165m ago) 16d
kube-system kube-apiserver-master-node 1/1 Running 8 (165m ago) 16d
kube-system kube-controller-manager-master-node 1/1 Running 7 (165m ago) 16d
kube-system kube-proxy-c6l9s 1/1 Running 7 (165m ago) 16d
kube-system kube-proxy-pqrf8 1/1 Running 1 (15d ago) 16d
kube-system kube-scheduler-master-node 1/1 Running 8 (165m ago) 16d
tigera-operator tigera-operator-5fb55776df-955dj 1/1 Running 13 (164m ago) 16d
kubectl describe pod coredns:
[root@master-node system]# kubectl describe pod coredns-6d4b75cb6d-2tqk9 -n kube-system
Name: coredns-6d4b75cb6d-2tqk9
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: master-node/10.32.67.20
Start Time: Wed, 08 Jun 2022 11:59:59 +0200
Labels: k8s-app=kube-dns
pod-template-hash=6d4b75cb6d
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-6d4b75cb6d
Containers:
coredns:
Container ID:
Image: k8s.gcr.io/coredns/coredns:v1.8.6
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ch9xq (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-ch9xq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 114s (x65 over 143m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "de60ae0a286ad648a9691065e68fe03589b18a26adfafff0c089d5774b46c163": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp':
[root@master-node system]# kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp'
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
calico-system 5m52s Warning Unhealthy pod/calico-node-gp6d5 (combined from similar events): Readiness probe failed: 2022-06-08 14:50:45.231 [INFO][30872] confd/health.go 180: Number of node(s) with BGP peering established = 0...
calico-system 4m16s Warning FailedKillPod pod/calico-kube-controllers-68884f975d-6qm5l error killing pod: failed to "KillPodSandbox" for "c842d857-88f1-4dfa-b3e8-aad68f626c8c" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"5002f084e667a7e70654136b237ae2924c268337c1faf882972982e888784bb9\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": Service Unavailable"
kube-system 87s Warning FailedCreatePodSandBox pod/coredns-6d4b75cb6d-9dn5d (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "acd785aa916d2c97aa16ceeaa2f04e7967a1224cb437e50770f32a02b5a9ed3f": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
calico-system 13m Warning FailedKillPod pod/calico-kube-controllers-68884f975d-6qm5l error killing pod: failed to "KillPodSandbox" for "c842d857-88f1-4dfa-b3e8-aad68f626c8c" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"5002f084e667a7e70654136b237ae2924c268337c1faf882972982e888784bb9\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": context deadline exceeded"
kube-system 4m6s Warning FailedKillPod pod/coredns-6d4b75cb6d-vfchn error killing pod: failed to "KillPodSandbox" for "23c399a5-daa6-4f01-b7ee-7822b828d966" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"7621e8d64c84554d75030375b0355a67c60b62c8d240741aa78189ffabedc913\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": Service Unavailable"
calico-system 6s Warning Unhealthy pod/calico-node-5n4nj (combined from similar events): Readiness probe failed: 2022-06-08 14:56:31.871 [INFO][17966] confd/health.go 180: Number of node(s) with BGP peering established = 0...
calico-system 45m Warning DNSConfigForming pod/calico-kube-controllers-68884f975d-ckr2g Search Line limits were exceeded, some search paths have been omitted, the applied search line is: calico-system.svc.cluster.local svc.cluster.local cluster.local XXXXXX.com cs.XXXXX.com fr.XXXXXX.com
kube-system 2m49s Warning FailedCreatePodSandBox pod/coredns-6d4b75cb6d-2tqk9 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "529139e14dbb8c5917c72428600c5a8333aa21bf249face90048d1b344da5d9a": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
calico-system 3m42s Warning FailedCreatePodSandBox pod/calico-kube-controllers-68884f975d-ckr2g (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "45dd6ebfb53fd745b1ca41853bb7744e407b3439111a946b007752eb8f8f7abd": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
kube-system 9m6s Warning FailedKillPod pod/coredns-6d4b75cb6d-vfchn error killing pod: failed to "KillPodSandbox" for "23c399a5-daa6-4f01-b7ee-7822b828d966" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"7621e8d64c84554d75030375b0355a67c60b62c8d240741aa78189ffabedc913\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": context deadline exceeded"
The calico-node logs show:
(resync-filter-v4,resync-raw-v4)
2022-06-08 18:26:42.665 [INFO][69] felix/summary.go 100: Summarising 11 dataplane reconciliation loops over 1m2.6s: avg=3ms longest=6ms (resync-nat-v4)
2022-06-08 18:27:46.076 [INFO][69] felix/summary.go 100: Summarising 7 dataplane reconciliation loops over 1m3.4s: avg=2ms longest=4ms (resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-rules-v4,resync-wg)
And calico-typha:
2022-06-08 17:34:49.625 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/assignment/" error=too old resource version: 190422 (3180569)
2022-06-08 17:34:50.121 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
2022-06-08 18:10:27.377 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/host/" error=too old resource version: 190388 (3180569)
2022-06-08 18:10:27.874 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/host/"
Answer 1
I solved the problem this way:
The "Service Unavailable" errors above were the proxy answering for the in-cluster service IPs, so the fix was to add these IP ranges to no_proxy in both of the files below:
- 10.96.0.0/24 (Kubernetes API)
- 192.168.0.0/16 (Calico CIDR)
- 10.x.x.0 (cluster nodes)
In /etc/environment:
HTTP_PROXY=http://myproxy-XXXXXXXX.com:8080
HTTPS_PROXY=http://myproxy-XXXXXXXX.com:8080
NO_PROXY=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27
http_proxy=http://myproxy-XXXXXXXX.com:8080
https_proxy=http://myproxy-XXXXXXXX.com:8080
no_proxy=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27
Then:
source /etc/environment
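With NO_PROXY in place, a quick check that the bypass works is to hit the apiserver's ClusterIP directly (a sketch; through the proxy, this is the call that returned "Service Unavailable"):

# Expect a JSON reply or a 401/403 from the real apiserver, not a proxy error page
curl -vk https://10.96.0.1:443/version
# Force the bypass regardless of environment variables, for comparison
curl -vk --noproxy '*' https://10.96.0.1:443/version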
In /etc/systemd/system/containerd.service.d/http_proxy.conf:
[Service]
Environment="HTTP_PROXY=http://myproxy-XXXXXXXX:8080/"
Environment="HTTPS_PROXY=http://myproxy-XXXXXXXX:8080/"
Environment="NO_PROXY=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27"
Environment="http_proxy=http://myproxy-XXXXXXXX:8080/"
Environment="https_proxy=http://myproxy-XXXXXXXX:8080/"
Environment="no_proxy=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27"
Then:
systemctl daemon-reload
systemctl restart containerd
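To verify that containerd actually picked up the drop-in, something like the following helps (a sketch; assumes crictl is installed and pointed at containerd):

# Show the environment systemd passes to containerd
systemctl show containerd --property=Environment
# Try an image pull through the proxy
crictl pull k8s.gcr.io/coredns/coredns:v1.8.6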
I also edited the coredns ConfigMap (a sketch of the resulting Corefile follows this list):
kubectl -n kube-system edit cm coredns
- removed the max_concurrent 1000 line
- replaced proxy with forward
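After those two edits the relevant part of the Corefile looks roughly like this (a sketch based on the default kubeadm Corefile; upstream resolution still follows the node's /etc/resolv.conf):

.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}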
Then I ran kubectl delete on the pods that were stuck in error, and after that all the pods were Running.
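Deleting by label lets the Deployments recreate the stuck pods; the commands were along these lines (a sketch; the kube-dns label appears in the describe output above, the calico-kube-controllers label is an assumption):

# Recreate the coredns pods (label from the pod description above)
kubectl -n kube-system delete pod -l k8s-app=kube-dns
# Recreate the calico-kube-controllers pod (label assumed from a standard Calico install)
kubectl -n calico-system delete pod -l k8s-app=calico-kube-controllers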
Hope this helps.