CoreDNS fails with a loop: how to provide the correct resolvConf to kubelet?

This is where the investigation started: CoreDNS would not stay up for more than a few seconds, failing with the following errors:

$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                          READY   STATUS             RESTARTS      AGE
ingress-nginx   ingress-nginx-controller-8xcl9                1/1     Running            0             11h
ingress-nginx   ingress-nginx-controller-hwhvk                1/1     Running            0             11h
ingress-nginx   ingress-nginx-controller-xqdqx                1/1     Running            2 (10h ago)   11h
kube-system     calico-kube-controllers-684bcfdc59-cr7hr      1/1     Running            0             11h
kube-system     calico-node-62p58                             1/1     Running            2 (10h ago)   11h
kube-system     calico-node-btvdh                             1/1     Running            0             11h
kube-system     calico-node-q5bkr                             1/1     Running            0             11h
kube-system     coredns-8474476ff8-dnt6b                      0/1     CrashLoopBackOff   1 (3s ago)    5s
kube-system     coredns-8474476ff8-ftcbx                      0/1     Error              1 (2s ago)    5s
kube-system     dns-autoscaler-5ffdc7f89d-4tshm               1/1     Running            2 (10h ago)   11h
kube-system     kube-apiserver-hyzio                          1/1     Running            4 (10h ago)   11h
kube-system     kube-controller-manager-hyzio                 1/1     Running            4 (10h ago)   11h
kube-system     kube-proxy-2d8ls                              1/1     Running            0             11h
kube-system     kube-proxy-c6c4l                              1/1     Running            4 (10h ago)   11h
kube-system     kube-proxy-nzqdd                              1/1     Running            0             11h
kube-system     kube-scheduler-hyzio                          1/1     Running            5 (10h ago)   11h
kube-system     kubernetes-dashboard-548847967d-66dwz         1/1     Running            0             11h
kube-system     kubernetes-metrics-scraper-6d49f96c97-r6dz2   1/1     Running            0             11h
kube-system     nginx-proxy-dyzio                             1/1     Running            0             11h
kube-system     nginx-proxy-zyzio                             1/1     Running            0             11h
kube-system     nodelocaldns-g9wxh                            1/1     Running            0             11h
kube-system     nodelocaldns-j2qc9                            1/1     Running            4 (10h ago)   11h
kube-system     nodelocaldns-vk84j                            1/1     Running            0             11h
kube-system     registry-j5prk                                1/1     Running            0             11h
kube-system     registry-proxy-5wbhq                          1/1     Running            0             11h
kube-system     registry-proxy-77lqd                          1/1     Running            0             11h
kube-system     registry-proxy-s45p4                          1/1     Running            2 (10h ago)   11h

Running kubectl describe on that pod did not add much to the picture:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  67s                default-scheduler  Successfully assigned kube-system/coredns-8474476ff8-dnt6b to zyzio
  Normal   Pulled     25s (x4 over 68s)  kubelet            Container image "k8s.gcr.io/coredns/coredns:v1.8.0" already present on machine
  Normal   Created    25s (x4 over 68s)  kubelet            Created container coredns
  Normal   Started    25s (x4 over 68s)  kubelet            Started container coredns
  Warning  BackOff    6s (x11 over 66s)  kubelet            Back-off restarting failed container

But looking at the logs did:

$ kubectl logs coredns-8474476ff8-dnt6b -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae
[FATAL] plugin/loop: Loop (127.0.0.1:49048 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 2906344495550081187.9117452939332601176."

It is great that the troubleshooting documentation is linked right in the error message! I started going through that page and found that my /etc/resolv.conf indeed contained the problematic local IP: nameserver 127.0.0.53.
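
For reference, this is roughly what the check on the node looked like (the loopback entry is the local stub resolver that systemd-resolved installs):

$ cat /etc/resolv.conf
nameserver 127.0.0.53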

Furthermore, I found the real DNS IPs in /run/systemd/resolve/resolv.conf. The question now is how to do what the troubleshooting documentation describes:

Add the following to your kubelet config yaml: resolvConf: <path-to-your-real-resolv-conf-file> (or via command line flag --resolv-conf, deprecated in 1.10). Your "real" resolv.conf is the one that contains the actual IPs of your upstream servers, and no local/loopback address. This flag tells kubelet to pass an alternate resolv.conf to Pods. For systems using systemd-resolved, /run/systemd/resolve/resolv.conf is typically the location of the "real" resolv.conf, although this can be different depending on your distribution.

So the questions are:

  • how to find, or where to create, the kubelet configuration yaml mentioned there,
  • at what level I should specify the resolvConf value, and
  • can it accept multiple values? I have two nameservers defined. Should they be given as separate entries or as an array?

Answer 1

/etc/resolv.conf is located on every node. You can edit it by SSH-ing into the node.
Then you have to restart the kubelet for the changes to take effect:

sudo systemctl restart kubelet

(If that does not work, restart the node with sudo reboot.)
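
Side note: the location of the kubelet configuration file differs between installers, so if you are not sure where your kubelet reads its configuration from, you can inspect the systemd unit and the running process first (a generic sketch, not specific to any distribution):

$ systemctl cat kubelet          # prints the unit file plus any drop-ins
$ ps aux | grep '[k]ubelet'      # check the command line for a --config=<path> flag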


The file /home/kubernetes/kubelet-config.yaml (also located on every node) contains the kubelet's configuration. You can create a new resolv.conf file and point to it with the resolvConf field:

apiVersion: kubelet.config.k8s.io/v1beta1
...
kind: KubeletConfiguration
...
resolvConf: <location of the file>
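
As a minimal sketch, assuming a systemd-resolved host as in the question, the relevant fields could look like this (note that the file location is installer-specific; kubeadm-based setups, for example, typically keep it at /var/lib/kubelet/config.yaml):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf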

Important: the new configuration will only be applied to Pods created after the update. It is strongly recommended to drain the node before changing the configuration.
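
For example, draining the node and bringing it back afterwards (using the node name zyzio from the output above) would look something like this:

$ kubectl drain zyzio --ignore-daemonsets
$ sudo systemctl restart kubelet    # run on the node itself
$ kubectl uncordon zyzio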


Can it accept multiple values? I have two nameservers defined. Should they be given as separate entries or as an array?

The kubelet configuration documentation states that resolvConf is of type string, so it likely accepts only a single value.
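
Note, however, that the single value is a path to a file, and that file can itself list several servers: in standard resolv.conf syntax each nameserver goes on its own line, so two nameservers (placeholder IPs below) would be written as:

nameserver 192.0.2.1
nameserver 192.0.2.2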
