Kubernetes kube-dns fails to resolve

I noticed unusually high latency when resolving both local and external DNS names inside Kubernetes. nslookup requests do sometimes succeed, but on average they take 40 seconds to complete.

These slow lookups cause systems such as Kafka and MariaDB to fail with: Timed out waiting for connection while in state: CONNECTING

Below are a failed and a successful nslookup of kubernetes.default:

$ kubectl exec -ti busybox -- nslookup kubernetes.default
0.1851s   Server:    10.39.240.10
0.0004s   Address 1: 10.39.240.10 kube-dns.kube-system.svc.cluster.local
35.2081s  command terminated with exit code 1
0.0075s   nslookup: can't resolve 'kubernetes.default'
0.0004s   

Total   35.4025s
$ kubectl exec -ti busybox -- nslookup kubernetes.default
0.1362s    Server:    10.39.240.10
0.1370s    Address 1: 10.39.240.10 kube-dns.kube-system.svc.cluster.local
25.1734s                                                                              
40.2251s   Name:      kubernetes.default
40.2288s   Address 1: 10.39.240.1 kubernetes.default.svc.cluster.local
40.2291s   

Total   40.2292s
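
For reference, the busybox test pod used above can be created along these lines; busybox 1.28 is a common choice here because nslookup in newer busybox images is known to misbehave:

$ kubectl run busybox --image=busybox:1.28 --restart=Never -- sleep 3600
$ kubectl exec -ti busybox -- nslookup kubernetes.default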

I tried tweaking the kube-dns ConfigMap, rebuilding nodes and adding new nodes to the cluster, and scaling up the number of available kube-dns replicas, all without success.
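
For context, the changes I attempted looked roughly like the following; the deployment and ConfigMap names are the kube-dns defaults and may differ in other setups:

$ kubectl -n kube-system edit configmap kube-dns                  # adjust stubDomains / upstreamNameservers
$ kubectl -n kube-system scale deployment kube-dns --replicas=3
$ kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide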

Only when inspecting the kube-dns logs did I find something interesting. The kubedns container inside the kube-dns pod was throwing warnings such as:

Could not find endpoints for service "mariadb" in namespace "mysql". DNS records will be created once endpoints show up.

$ kubectl logs -n kube-system kube-dns-b46cc9485-qjm25 -c kubedns
I0603 22:23:48.889404       1 dns.go:48] version: 1.14.13
I0603 22:23:48.890627       1 server.go:69] Using configuration read from directory: /kube-dns-config with period 10s
I0603 22:23:48.890762       1 server.go:121] FLAG: --alsologtostderr="false"
I0603 22:23:48.890774       1 server.go:121] FLAG: --config-dir="/kube-dns-config"
I0603 22:23:48.890794       1 server.go:121] FLAG: --config-map=""
I0603 22:23:48.890798       1 server.go:121] FLAG: --config-map-namespace="kube-system"
I0603 22:23:48.890802       1 server.go:121] FLAG: --config-period="10s"
I0603 22:23:48.890809       1 server.go:121] FLAG: --dns-bind-address="0.0.0.0"
I0603 22:23:48.890820       1 server.go:121] FLAG: --dns-port="10053"
I0603 22:23:48.890827       1 server.go:121] FLAG: --domain="cluster.local."
I0603 22:23:48.890839       1 server.go:121] FLAG: --federations=""
I0603 22:23:48.890851       1 server.go:121] FLAG: --healthz-port="8081"
I0603 22:23:48.890856       1 server.go:121] FLAG: --initial-sync-timeout="1m0s"
I0603 22:23:48.890865       1 server.go:121] FLAG: --kube-master-url=""
I0603 22:23:48.890875       1 server.go:121] FLAG: --kubecfg-file=""
I0603 22:23:48.890880       1 server.go:121] FLAG: --log-backtrace-at=":0"
I0603 22:23:48.890893       1 server.go:121] FLAG: --log-dir=""
I0603 22:23:48.890898       1 server.go:121] FLAG: --log-flush-frequency="5s"
I0603 22:23:48.890910       1 server.go:121] FLAG: --logtostderr="true"
I0603 22:23:48.890914       1 server.go:121] FLAG: --nameservers=""
I0603 22:23:48.890918       1 server.go:121] FLAG: --stderrthreshold="2"
I0603 22:23:48.890922       1 server.go:121] FLAG: --v="2"
I0603 22:23:48.890932       1 server.go:121] FLAG: --version="false"
I0603 22:23:48.890944       1 server.go:121] FLAG: --vmodule=""
I0603 22:23:48.891109       1 server.go:169] Starting SkyDNS server (0.0.0.0:10053)
I0603 22:23:48.891471       1 server.go:179] Skydns metrics enabled (/metrics:10055)
I0603 22:23:48.891514       1 dns.go:188] Starting endpointsController
I0603 22:23:48.891523       1 dns.go:191] Starting serviceController
I0603 22:23:48.891765       1 sync.go:167] Updated stubDomains to map[acme.local:[1.2.3.4]]
I0603 22:23:48.891794       1 sync.go:177] Updated upstreamNameservers to [8.8.8.8 8.8.4.4]
I0603 22:23:48.891836       1 dns.go:184] Configuration updated: {TypeMeta:{Kind: APIVersion:} Federations:map[] StubDomains:map[acme.local:[1.2.3.4]] UpstreamNameservers:[8.8.8.8 8.8.4.4]}
I0603 22:23:48.892030       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0603 22:23:48.892059       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0603 22:23:48.905592       1 dns.go:601] Could not find endpoints for service "broker" in namespace "kafka". DNS records will be created once endpoints show up.
I0603 22:23:48.905640       1 dns.go:601] Could not find endpoints for service "prometheus-alertmanager-operated" in namespace "metrics". DNS records will be created once endpoints show up.
I0603 22:23:48.905825       1 dns.go:601] Could not find endpoints for service "pzoo" in namespace "kafka". DNS records will be created once endpoints show up.
I0603 22:23:48.905981       1 dns.go:601] Could not find endpoints for service "mariadb" in namespace "mysql". DNS records will be created once endpoints show up.
I0603 22:23:48.906386       1 dns.go:601] Could not find endpoints for service "zoo" in namespace "kafka". DNS records will be created once endpoints show up.
I0603 22:23:49.392152       1 dns.go:222] Initialized services and endpoints from apiserver
I0603 22:23:49.392298       1 server.go:137] Setting up Healthz Handler (/readiness)
I0603 22:23:49.392319       1 server.go:142] Setting up cache handler (/cache)
I0603 22:23:49.392335       1 server.go:128] Status HTTP port 8081
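
The "Could not find endpoints" warnings only mean those Services had no ready endpoints at that moment; whether that is still the case can be checked with something like (namespaces and names taken from the log above):

$ kubectl -n mysql get endpoints mariadb
$ kubectl -n kafka get endpoints broker pzoo zoo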

I also noticed, with dnsmasq query logging enabled, that queries were being forwarded but no replies came back. Here is a small snapshot of the log:

I0604 07:03:45.386985       1 nanny.go:116] dnsmasq[11]: forwarded prometheus-kube-state-metrics.metrics.svc.svc.cluster.local to 127.0.0.1
I0604 07:03:45.419486       1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.weave.svc.cluster.local from 10.36.20.1
I0604 07:03:45.419666       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.weave.svc.cluster.local to 127.0.0.1
I0604 07:03:45.421149       1 nanny.go:116] dnsmasq[11]: query[AAAA] weave-scope-app.weave.svc.cluster.local.svc.cluster.local from 10.36.20.1
I0604 07:03:45.421266       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.svc.cluster.local to 127.0.0.1
I0604 07:03:45.421333       1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.svc.cluster.local from 10.36.20.1
I0604 07:03:45.421437       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.svc.cluster.local to 127.0.0.1
I0604 07:03:45.423253       1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.cluster.local from 10.36.20.1
I0604 07:03:45.423375       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.cluster.local to 127.0.0.1
I0604 07:03:45.424088       1 nanny.go:116] dnsmasq[11]: query[AAAA] weave-scope-app.weave.svc.cluster.local.europe-west4-a.c.suzuka.internal from 10.36.20.1
I0604 07:03:45.424270       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.europe-west4-a.c.suzuka.internal to 169.254.169.254
I0604 07:03:45.425873       1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.c.suzuka.internal from 10.36.20.1
I0604 07:03:45.426047       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.c.suzuka.internal to 169.254.169.254
I0604 07:03:45.427186       1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.google.internal from 10.36.20.1
I0604 07:03:45.427356       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.google.internal to 169.254.169.254
I0604 07:03:45.432498       1 nanny.go:116] dnsmasq[11]: query[AAAA] weave-scope-app.weave.svc.cluster.local from 10.36.20.1
I0604 07:03:45.432665       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local to 127.0.0.1
I0604 07:03:45.679157       1 nanny.go:116] dnsmasq[11]: query[SRV] kubernetes.default.svc.cluster.local from 127.0.0.1
I0604 07:03:45.679353       1 nanny.go:116] dnsmasq[11]: forwarded kubernetes.default.svc.cluster.local to 127.0.0.1
I0604 07:03:46.544879       1 nanny.go:116] dnsmasq[11]: query[PTR] 82.119.177.108.in-addr.arpa from 10.36.20.1
I0604 07:03:46.544929       1 nanny.go:116] dnsmasq[11]: forwarded 82.119.177.108.in-addr.arpa to 127.0.0.1
I0604 07:03:47.613025       1 nanny.go:116] dnsmasq[11]: query[TXT] hits.bind from 127.0.0.1
I0604 07:03:47.613050       1 nanny.go:116] dnsmasq[11]: config hits.bind is <TXT>
I0604 07:03:47.613191       1 nanny.go:116] dnsmasq[11]: query[TXT] misses.bind from 127.0.0.1
I0604 07:03:47.613245       1 nanny.go:116] dnsmasq[11]: config misses.bind is <TXT>
I0604 07:03:47.613477       1 nanny.go:116] dnsmasq[11]: query[TXT] evictions.bind from 127.0.0.1
I0604 07:03:47.613547       1 nanny.go:116] dnsmasq[11]: config evictions.bind is <TXT>
I0604 07:03:47.613823       1 nanny.go:116] dnsmasq[11]: query[TXT] insertions.bind from 127.0.0.1
I0604 07:03:47.613879       1 nanny.go:116] dnsmasq[11]: config insertions.bind is <TXT>
I0604 07:03:47.614112       1 nanny.go:116] dnsmasq[11]: query[TXT] cachesize.bind from 127.0.0.1
I0604 07:03:47.614180       1 nanny.go:116] dnsmasq[11]: config cachesize.bind is <TXT>
I0604 07:03:49.544862       1 nanny.go:116] dnsmasq[11]: query[PTR] 25.121.18.104.in-addr.arpa from 10.36.20.1
I0604 07:03:49.545061       1 nanny.go:116] dnsmasq[11]: forwarded 25.121.18.104.in-addr.arpa to 127.0.0.1
I0604 07:03:50.679458       1 nanny.go:116] dnsmasq[11]: query[SRV] kubernetes.default.svc.cluster.local from 127.0.0.1
I0604 07:03:50.679606       1 nanny.go:116] dnsmasq[11]: forwarded kubernetes.default.svc.cluster.local to 127.0.0.1
I0604 07:03:52.614897       1 nanny.go:116] dnsmasq[11]: query[TXT] hits.bind from 127.0.0.1
I0604 07:03:52.614975       1 nanny.go:116] dnsmasq[11]: config hits.bind is <TXT>
I0604 07:03:52.615283       1 nanny.go:116] dnsmasq[11]: query[TXT] misses.bind from 127.0.0.1
I0604 07:03:52.615332       1 nanny.go:116] dnsmasq[11]: config misses.bind is <TXT>
I0604 07:03:52.615569       1 nanny.go:116] dnsmasq[11]: query[TXT] evictions.bind from 127.0.0.1
I0604 07:03:52.615615       1 nanny.go:116] dnsmasq[11]: config evictions.bind is <TXT>
I0604 07:03:52.615870       1 nanny.go:116] dnsmasq[11]: query[TXT] insertions.bind from 127.0.0.1
I0604 07:03:52.615915       1 nanny.go:116] dnsmasq[11]: config insertions.bind is <TXT>
I0604 07:03:52.616158       1 nanny.go:116] dnsmasq[11]: query[TXT] cachesize.bind from 127.0.0.1
I0604 07:03:52.616213       1 nanny.go:116] dnsmasq[11]: config cachesize.bind is <TXT>
I0604 07:03:55.281882       1 nanny.go:116] dnsmasq[11]: query[A] prometheus-kube-state-metrics.metrics.svc.metrics.svc.cluster.local from 10.36.20.8
I0604 07:03:55.282681       1 nanny.go:116] dnsmasq[11]: forwarded prometheus-kube-state-metrics.metrics.svc.metrics.svc.cluster.local to 127.0.0.1
I0604 07:03:55.679869       1 nanny.go:116] dnsmasq[11]: query[SRV] kubernetes.default.svc.cluster.local from 127.0.0.1
I0604 07:03:55.679903       1 nanny.go:116] dnsmasq[11]: forwarded kubernetes.default.svc.cluster.local to 127.0.0.1
I0604 07:03:57.616880       1 nanny.go:116] dnsmasq[11]: query[TXT] hits.bind from 127.0.0.1
I0604 07:03:57.616973       1 nanny.go:116] dnsmasq[11]: config hits.bind is <TXT>
I0604 07:03:57.617280       1 nanny.go:116] dnsmasq[11]: query[TXT] misses.bind from 127.0.0.1
I0604 07:03:57.617354       1 nanny.go:116] dnsmasq[11]: config misses.bind is <TXT>
I0604 07:03:57.617600       1 nanny.go:116] dnsmasq[11]: query[TXT] evictions.bind from 127.0.0.1
I0604 07:03:57.617656       1 nanny.go:116] dnsmasq[11]: config evictions.bind is <TXT>
I0604 07:03:57.617898       1 nanny.go:116] dnsmasq[11]: query[TXT] insertions.bind from 127.0.0.1
I0604 07:03:57.617957       1 nanny.go:116] dnsmasq[11]: config insertions.bind is <TXT>
I0604 07:03:57.618192       1 nanny.go:116] dnsmasq[11]: query[TXT] cachesize.bind from 127.0.0.1
I0604 07:03:57.618251       1 nanny.go:116] dnsmasq[11]: config cachesize.bind is <TXT>
I0604 07:03:57.976779       1 nanny.go:116] dnsmasq[11]: query[PTR] 10.240.39.10.in-addr.arpa from 10.36.20.21
I0604 07:03:57.976823       1 nanny.go:116] dnsmasq[11]: forwarded 10.240.39.10.in-addr.arpa to 127.0.0.1
I0604 07:03:57.977075       1 nanny.go:116] dnsmasq[11]: reply 10.39.240.10 is kube-dns.kube-system.svc.cluster.local
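
The repeated suffixes in these queries (for example weave-scope-app.weave.svc.cluster.local.weave.svc.cluster.local) are the pod resolver walking its search domains before trying the name verbatim, which multiplies every lookup; the search list and the ndots setting driving that behaviour can be inspected with something like:

$ kubectl exec -ti busybox -- cat /etc/resolv.conf    # typically shows "options ndots:5" plus the cluster search domains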

Answer 1

I ran into a similar problem with KubeDNS. I never found the root cause, but installing CoreDNS resolved it for me.

Since Kubernetes 1.11, CoreDNS has been the default cluster DNS server, so I strongly recommend migrating from KubeDNS to CoreDNS.

If you decide to go ahead, consider this documentation: https://kubernetes.io/docs/tasks/administer-cluster/coredns/#installing-kube-dns-instead-of-coredns-with-kubeadm
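
If you are unsure which DNS add-on your cluster is currently running, something along these lines can tell kube-dns and CoreDNS apart (CoreDNS deployments usually keep the k8s-app=kube-dns label for compatibility):

$ kubectl -n kube-system get deployments | grep -E 'coredns|kube-dns'
$ kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide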

Answer 2

I fixed this by upgrading both the master and the node pools to the latest Kubernetes version. Unfortunately, I was not able to pin down exactly what the underlying problem was.
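
In case the cluster is on GKE (the question's dnsmasq log, with metadata lookups forwarded to 169.254.169.254 and *.c.*.internal search domains, points that way), such an upgrade would look roughly like the following; cluster and node-pool names are placeholders:

$ gcloud container clusters upgrade my-cluster --master --zone europe-west4-a
$ gcloud container clusters upgrade my-cluster --node-pool default-pool --zone europe-west4-a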
