This is where the investigation started: CoreDNS would not stay up for more than a few seconds, showing the following errors:
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
ingress-nginx ingress-nginx-controller-8xcl9 1/1 Running 0 11h
ingress-nginx ingress-nginx-controller-hwhvk 1/1 Running 0 11h
ingress-nginx ingress-nginx-controller-xqdqx 1/1 Running 2 (10h ago) 11h
kube-system calico-kube-controllers-684bcfdc59-cr7hr 1/1 Running 0 11h
kube-system calico-node-62p58 1/1 Running 2 (10h ago) 11h
kube-system calico-node-btvdh 1/1 Running 0 11h
kube-system calico-node-q5bkr 1/1 Running 0 11h
kube-system coredns-8474476ff8-dnt6b 0/1 CrashLoopBackOff 1 (3s ago) 5s
kube-system coredns-8474476ff8-ftcbx 0/1 Error 1 (2s ago) 5s
kube-system dns-autoscaler-5ffdc7f89d-4tshm 1/1 Running 2 (10h ago) 11h
kube-system kube-apiserver-hyzio 1/1 Running 4 (10h ago) 11h
kube-system kube-controller-manager-hyzio 1/1 Running 4 (10h ago) 11h
kube-system kube-proxy-2d8ls 1/1 Running 0 11h
kube-system kube-proxy-c6c4l 1/1 Running 4 (10h ago) 11h
kube-system kube-proxy-nzqdd 1/1 Running 0 11h
kube-system kube-scheduler-hyzio 1/1 Running 5 (10h ago) 11h
kube-system kubernetes-dashboard-548847967d-66dwz 1/1 Running 0 11h
kube-system kubernetes-metrics-scraper-6d49f96c97-r6dz2 1/1 Running 0 11h
kube-system nginx-proxy-dyzio 1/1 Running 0 11h
kube-system nginx-proxy-zyzio 1/1 Running 0 11h
kube-system nodelocaldns-g9wxh 1/1 Running 0 11h
kube-system nodelocaldns-j2qc9 1/1 Running 4 (10h ago) 11h
kube-system nodelocaldns-vk84j 1/1 Running 0 11h
kube-system registry-j5prk 1/1 Running 0 11h
kube-system registry-proxy-5wbhq 1/1 Running 0 11h
kube-system registry-proxy-77lqd 1/1 Running 0 11h
kube-system registry-proxy-s45p4 1/1 Running 2 (10h ago) 11h
kubectl describe of that pod did not add much to the picture:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 67s default-scheduler Successfully assigned kube-system/coredns-8474476ff8-dnt6b to zyzio
Normal Pulled 25s (x4 over 68s) kubelet Container image "k8s.gcr.io/coredns/coredns:v1.8.0" already present on machine
Normal Created 25s (x4 over 68s) kubelet Created container coredns
Normal Started 25s (x4 over 68s) kubelet Started container coredns
Warning BackOff 6s (x11 over 66s) kubelet Back-off restarting failed container
Looking at the logs, however, did:
$ kubectl logs coredns-8474476ff8-dnt6b -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae
[FATAL] plugin/loop: Loop (127.0.0.1:49048 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 2906344495550081187.9117452939332601176."
How nice of them to link the troubleshooting docs! I started browsing that page and found that my /etc/resolv.conf indeed contained the problematic local IP: nameserver 127.0.0.53.

Moreover, I found the real DNS IPs in /run/systemd/resolve/resolv.conf. The question now was how to perform what the troubleshooting docs describe:

Add the following to your kubelet config yaml: resolvConf: <path-to-your-real-resolv-conf-file> (or via command line flag --resolv-conf, deprecated in 1.10). Your "real" resolv.conf is the one that contains the actual IPs of your upstream servers and no local/loopback address. This flag tells kubelet to pass an alternate resolv.conf to Pods. For systems using systemd-resolved, /run/systemd/resolve/resolv.conf is typically the location of the "real" resolv.conf, although this can be different depending on your distribution.
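Comparing the two files side by side makes the issue visible; a quick check (the upstream addresses shown for the second file are placeholders, the real ones depend on your network):

$ cat /etc/resolv.conf
nameserver 127.0.0.53
$ cat /run/systemd/resolve/resolv.conf
nameserver 192.168.1.1
nameserver 8.8.8.8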
So, the questions are:
- how to find, or where to create, the mentioned kubelet config yaml,
- at what level should I specify the resolvConf value, and
- can it accept multiple values? I have two nameservers defined. Should they be given as separate entries or as an array?
Answer 1
/etc/resolv.conf is located on every node. You can edit it by SSH-ing into the node.
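A minimal sketch of that step, with <node> standing in for each node's address:

$ ssh <node>
$ sudoedit /etc/resolv.conf   # replace nameserver 127.0.0.53 with the real upstream IPs

Note that on systems running systemd-resolved, /etc/resolv.conf is often a symlink to the stub resolver file, so a direct edit may be overwritten; pointing the kubelet at an alternate file (described below) avoids this.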
Then you have to restart kubelet for the changes to take effect:

sudo systemctl restart kubelet

(If that does not work, reboot the node with sudo reboot.)
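After the restart, a quick way to confirm that CoreDNS has stopped crash-looping (this assumes the conventional k8s-app=kube-dns label on the CoreDNS pods):

$ kubectl get pods -n kube-system -l k8s-app=kube-dns -w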
The /home/kubernetes/kubelet-config.yaml file (also located on every node) holds the kubelet's configuration. You can create a new resolv.conf file and point to it with the resolvConf field:
apiVersion: kubelet.config.k8s.io/v1beta1
...
kind: KubeletConfiguration
...
resolvConf: <location of the file>
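The location of that yaml varies by installer: /home/kubernetes/kubelet-config.yaml is what GKE nodes use, while kubeadm-provisioned nodes typically keep it at /var/lib/kubelet/config.yaml. If in doubt, the kubelet's --config flag reveals the path; a sketch of looking it up on a node:

$ systemctl cat kubelet | grep -- --config
$ ps -ef | grep kubelet | grep -o -- '--config=[^ ]*'

With the file located, setting resolvConf: /run/systemd/resolve/resolv.conf matches what the troubleshooting docs recommend for systemd-resolved systems.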
Important: the new configuration will only apply to Pods created after the update. It is strongly recommended to drain the node before changing the configuration.
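A minimal drain-and-restore sequence, using the node name zyzio from the events above:

$ kubectl drain zyzio --ignore-daemonsets
# apply the config change and restart kubelet on the node, then:
$ kubectl uncordon zyzio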
Can it accept multiple values? I have two nameservers defined. Should they be given as separate entries or as an array?
The Kubelet Configuration documentation states that resolvConf is of type string, so it most likely accepts only a single value.
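That single value is a file path rather than a nameserver address, though, so this is not a practical limitation: the resolv.conf format takes one nameserver per line, and both servers go into the one file that resolvConf points to (addresses below are placeholders):

# contents of the file referenced by resolvConf
nameserver 192.168.1.1
nameserver 8.8.8.8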