Installing Kubernetes with external etcd - Calico problem

2024-6-1

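I ran into several problems while installing a multi-master k8s cluster with external etcd. I have done this installation twice before at other sites, both times successfully, but this time I need help.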

Calico was installed from the YAML recommended in the guide: https://docs.projectcalico.org/manifests/calico.yaml

First, there was a problem installing Calico: when apiServer.extraArgs.advertise-address was set in the config, calico-node could not reach the API.

After that, calico-kube-controllers was stuck in the ContainerCreating state. I managed to fix it by using calico-etcd.yaml instead of calico.yaml. Now the Calico pods are up and running, and calicoctl can see them in etcd.

But the coredns pods are stuck in ContainerCreating. This is what I see when describing the pod:

  Warning  FailedScheduling        82s (x2 over 88s)  default-scheduler            0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled               80s                default-scheduler            Successfully assigned kube-system/coredns-6955765f44-clbhk to master01.<removed>
  Warning  FailedCreatePodSandBox  18s                kubelet, master01.<removed>  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9ab9fe3bd3d4e145c218fe59f6578169fa09075c59718fbe2f7033d207c4ea4c" network for pod "coredns-6955765f44-clbhk": networkPlugin cni failed to set up pod "coredns-6955765f44-clbhk_kube-system" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config: dial unix /var/run/cilium/cilium.sock: connect: no such file or directory Is the agent running?
  Normal   SandboxChanged          17s                kubelet, master01.<removed>  Pod sandbox changed, it will be killed and re-created.

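But I am not using Cilium. I am using Calico. I did try Cilium while debugging the first Calico problem, but I removed it, rebuilt the cluster several times, and wiped the etcd data after every attempt.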

Here is the kubeadm config:

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: "v1.17.2"
controlPlaneEndpoint: "192.168.10.100:7443" #balancer ip:port


etcd:
  external:
    endpoints:
    - http://192.168.20.1:2379
    - http://192.168.20.2:2379
    - http://192.168.40.1:2379
    - http://192.168.40.2:2379
    - http://192.168.40.3:2379

#controllerManager:
#  extraArgs:
#    node-monitor-period: "2s"
#    node-monitor-grace-period: "16s"
#    pod-eviction-timeout: "30s"

networking:
  dnsDomain: "cluster.local"
  podSubnet: "10.96.0.0/12"
  serviceSubnet: "172.16.0.0/12"

apiServer:
  timeoutForControlPlane: "60s"
#  extraArgs:
#    advertise-address: "192.168.10.100"
#    bind-address: "192.168.20.1"
#    secure-port: "6443"

Kubernetes 1.17.2, etcd 3.3.11, CentOS 7 x64

It feels like the problem is somewhere between the API pod and etcd, but I can't find it.

Answer 1

Oh, never mind. I found it.

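There were cilium-cni and cilium-cni.old files in /opt/cni/bin/. They were apparently installed by Cilium itself, so they were still there after the kubernetes-cni rpm was reinstalled. I don't know why, but if Cilium is present, k8s prefers it. Is this a bug? Should I report it?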
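
A possible explanation, offered as an assumption rather than something confirmed on this cluster: the kubelet does not choose a plugin by looking at /opt/cni/bin. It loads the first CNI config file, in filename order, from its config directory (by default /etc/cni/net.d) and then runs the binary that config names. If the Cilium install also left a config file there whose name sorts before Calico's (for example 05-cilium.conf ahead of 10-calico.conflist; both names are assumed here, not taken from the post), the kubelet would keep calling cilium-cni. A minimal Python sketch for checking this on a node; the function names are hypothetical helpers:

```python
import json
from pathlib import Path

# File extensions treated as CNI configs (assumption: the usual libcni set).
CNI_EXTENSIONS = {".conf", ".conflist", ".json"}


def cni_configs_in_load_order(conf_dir):
    """Return CNI config files sorted by filename; the first one wins."""
    directory = Path(conf_dir)
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir()
             if p.is_file() and p.suffix in CNI_EXTENSIONS]
    return sorted(files, key=lambda p: p.name)


def plugin_types(conf_path):
    """Return the plugin type(s) named in a .conf or .conflist file."""
    data = json.loads(Path(conf_path).read_text())
    if "plugins" in data:
        # .conflist: a chain of plugins, each with its own type.
        return [plugin.get("type") for plugin in data["plugins"]]
    # .conf: a single plugin.
    return [data.get("type")]


def leftover_binaries(bin_dir, keyword):
    """List files in the CNI binary directory whose name contains keyword."""
    directory = Path(bin_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if keyword in p.name)
```

On the master, `cni_configs_in_load_order("/etc/cni/net.d")` lists the candidates, and `plugin_types()` on the first entry shows which plugin would be invoked. `leftover_binaries("/opt/cni/bin", "cilium")` lists the stray binaries mentioned above. If the first config names cilium-cni, removing that file along with the leftover binaries and restarting the kubelet would be the thing to try.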
