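I ran into several problems while installing a multi-master k8s cluster with external etcd. I have done this installation twice before at other sites without trouble, but this time I need help.
Calico was installed from the YAML manifest recommended in the guide: https://docs.projectcalico.org/manifests/calico.yaml
The first problem was with Calico: when apiServer.extraArgs.advertise-address was set in the config, calico-node could not reach the API server.
After that, calico-kube-controllers was stuck in ContainerCreating. I managed to fix that by using calico-etcd.yaml instead of calico.yaml. Now the Calico pods are up and running, and calicoctl can see them in etcd.
But the coredns pods are stuck in ContainerCreating. This is what I see when I describe the pod: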
Warning  FailedScheduling        82s (x2 over 88s)  default-scheduler            0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Normal   Scheduled               80s                default-scheduler            Successfully assigned kube-system/coredns-6955765f44-clbhk to master01.<removed>
Warning  FailedCreatePodSandBox  18s                kubelet, master01.<removed>  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9ab9fe3bd3d4e145c218fe59f6578169fa09075c59718fbe2f7033d207c4ea4c" network for pod "coredns-6955765f44-clbhk": networkPlugin cni failed to set up pod "coredns-6955765f44-clbhk_kube-system" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config: dial unix /var/run/cilium/cilium.sock: connect: no such file or directory Is the agent running?
Normal   SandboxChanged          17s                kubelet, master01.<removed>  Pod sandbox changed, it will be killed and re-created.
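But I am not using Cilium. I am using Calico. I did try Cilium while debugging the first Calico problem, but I removed it, rebuilt the cluster several times, and wiped the etcd data after every attempt.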
Here is the kubeadm config:
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: "v1.17.2"
controlPlaneEndpoint: "192.168.10.100:7443" #balancer ip:port
etcd:
  external:
    endpoints:
    - http://192.168.20.1:2379
    - http://192.168.20.2:2379
    - http://192.168.40.1:2379
    - http://192.168.40.2:2379
    - http://192.168.40.3:2379
#controllerManager:
#  extraArgs:
#    node-monitor-period: "2s"
#    node-monitor-grace-period: "16s"
#    pod-eviction-timeout: "30s"
networking:
  dnsDomain: "cluster.local"
  podSubnet: "10.96.0.0/12"
  serviceSubnet: "172.16.0.0/12"
apiServer:
  timeoutForControlPlane: "60s"
#  extraArgs:
#    advertise-address: "192.168.10.100"
#    bind-address: "192.168.20.1"
#    secure-port: "6443"
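One quick thing to rule out with a config like this is an address clash between the pod subnet, the service subnet, and the host addresses. The following is only a minimal sketch using Python's standard ipaddress module; the function name conflicts is made up for illustration, and the host list is simply the balancer and etcd addresses copied from the config above (the real node addresses are not shown in the post, so that part is an assumption).

```python
import ipaddress


def conflicts(subnets, host_ips):
    """Return human-readable clashes between named CIDRs and host addresses."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []
    names = list(nets)
    # Check every pair of subnets for overlap.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                problems.append(f"{a} overlaps {b}")
    # Check whether any host address falls inside a cluster subnet.
    for ip in host_ips:
        addr = ipaddress.ip_address(ip)
        for name, net in nets.items():
            if addr in net:
                problems.append(f"{ip} is inside {name}")
    return problems


# Values copied from the ClusterConfiguration above.
subnets = {"podSubnet": "10.96.0.0/12", "serviceSubnet": "172.16.0.0/12"}
# Assumption: only the balancer and etcd endpoint addresses are known from the post.
hosts = ["192.168.10.100", "192.168.20.1", "192.168.20.2",
         "192.168.40.1", "192.168.40.2", "192.168.40.3"]

if __name__ == "__main__":
    print(conflicts(subnets, hosts) or "no overlaps found")
```

For the values in this post the check finds no clash, so the subnet choice by itself does not look like the cause.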
Kubernetes 1.17.2, etcd 3.3.11, CentOS 7 x64
It feels like the problem is somewhere between the API server pod and etcd, but I can't find it.
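Answer 1
Oh, never mind. I found it.
/opt/cni/bin/ contained cilium-cni and cilium-cni.old files. They were evidently installed by Cilium itself, so they survived reinstalling the kubernetes-cni RPM. I don't know why, but if Cilium is present, k8s prefers it. Is this a bug? Should I report it?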
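A possible explanation for the "preference", offered as a guess rather than a diagnosis: with the CNI network plugin, the kubelet of this era reads its network config from the CNI config directory (by default /etc/cni/net.d) and uses the file that sorts first by name, then runs the plugin binary named in that file from /opt/cni/bin. A leftover Cilium config file that sorts ahead of Calico's would therefore keep winning for as long as the cilium-cni binary also exists. The sketch below lists the config files in that order and flags anything with "cilium" in its name; the helper names and the suffix list are assumptions made for illustration.

```python
import os

# Suffixes treated as CNI network configs (an assumption for this sketch).
CONF_SUFFIXES = (".conf", ".conflist", ".json")


def load_order(names):
    """Return CNI config file names in lexicographic order; the first one wins."""
    return sorted(n for n in names if n.endswith(CONF_SUFFIXES))


def leftovers(names, needle="cilium"):
    """Return names that look like remnants of another CNI plugin."""
    return sorted(n for n in names if needle in n.lower())


def list_dir(path):
    """List a directory, returning an empty list if it does not exist."""
    return os.listdir(path) if os.path.isdir(path) else []


if __name__ == "__main__":
    conf_names = list_dir("/etc/cni/net.d")  # kubelet default config dir
    bin_names = list_dir("/opt/cni/bin")     # kubelet default plugin dir
    print("CNI configs in load order:", load_order(conf_names))
    print("Cilium leftovers in /etc/cni/net.d:", leftovers(conf_names))
    print("Cilium leftovers in /opt/cni/bin:", leftovers(bin_names))
```

Run on each node after removing a CNI plugin; anything reported as a leftover is a candidate for manual cleanup before the next kubeadm init.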