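I've noticed unusually high latency when resolving both cluster-local and external DNS names from inside Kubernetes. nslookup requests sometimes resolve, but take about 40 seconds on average to complete.
These slow lookups are causing systems such as Kafka and MariaDB to crash with: Timed out waiting for connection while in state: CONNECTING
Below are a failed and a successful nslookup of kubernetes.default, with timings: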
$ kubectl exec -ti busybox -- nslookup kubernetes.default
0.1851s Server: 10.39.240.10
0.0004s Address 1: 10.39.240.10 kube-dns.kube-system.svc.cluster.local
35.2081s command terminated with exit code 1
0.0075s nslookup: can't resolve 'kubernetes.default'
0.0004s
Total 35.4025s
$ kubectl exec -ti busybox -- nslookup kubernetes.default
0.1362s Server: 10.39.240.10
0.1370s Address 1: 10.39.240.10 kube-dns.kube-system.svc.cluster.local
25.1734s
40.2251s Name: kubernetes.default
40.2288s Address 1: 10.39.240.1 kubernetes.default.svc.cluster.local
40.2291s
Total 40.2292s
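To put numbers on how often the lookup fails and how slow it is, the measurement above can be repeated programmatically. The following is only a minimal sketch: it assumes it runs in a pod that has Python available (the busybox image used above does not), and the function names `time_lookup` and `summarize` are made up for illustration.

```python
import socket
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Resolve `name` once and return (elapsed_seconds, succeeded)."""
    start = time.monotonic()
    try:
        resolver(name, None)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return time.monotonic() - start, ok


def summarize(name, attempts=5, resolver=socket.getaddrinfo):
    """Repeat the lookup and report failure count plus worst and mean latency."""
    samples = [time_lookup(name, resolver) for _ in range(attempts)]
    durations = [elapsed for elapsed, _ in samples]
    return {
        "name": name,
        "attempts": attempts,
        "failures": sum(1 for _, ok in samples if not ok),
        "max_s": max(durations),
        "mean_s": sum(durations) / len(durations),
    }
```

Calling `summarize("kubernetes.default")` from inside a pod would then give a rough failure rate and latency figure to compare before and after any change to the DNS setup. The `resolver` parameter exists only so the function can be exercised without a network.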
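I tried tweaking the kube-dns ConfigMap, rebuilding nodes and adding new ones to the cluster, and scaling up the number of kube-dns replicas, all without success.
Only when I inspected the kube-dns logs did I find something interesting. The kubedns container inside the kube-dns pod emitted the following warnings: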
Could not find endpoints for service "mariadb" in namespace "mysql". DNS records will be created once endpoints show up.
$ kubectl logs -n kube-system kube-dns-b46cc9485-qjm25 -c kubedns
I0603 22:23:48.889404 1 dns.go:48] version: 1.14.13
I0603 22:23:48.890627 1 server.go:69] Using configuration read from directory: /kube-dns-config with period 10s
I0603 22:23:48.890762 1 server.go:121] FLAG: --alsologtostderr="false"
I0603 22:23:48.890774 1 server.go:121] FLAG: --config-dir="/kube-dns-config"
I0603 22:23:48.890794 1 server.go:121] FLAG: --config-map=""
I0603 22:23:48.890798 1 server.go:121] FLAG: --config-map-namespace="kube-system"
I0603 22:23:48.890802 1 server.go:121] FLAG: --config-period="10s"
I0603 22:23:48.890809 1 server.go:121] FLAG: --dns-bind-address="0.0.0.0"
I0603 22:23:48.890820 1 server.go:121] FLAG: --dns-port="10053"
I0603 22:23:48.890827 1 server.go:121] FLAG: --domain="cluster.local."
I0603 22:23:48.890839 1 server.go:121] FLAG: --federations=""
I0603 22:23:48.890851 1 server.go:121] FLAG: --healthz-port="8081"
I0603 22:23:48.890856 1 server.go:121] FLAG: --initial-sync-timeout="1m0s"
I0603 22:23:48.890865 1 server.go:121] FLAG: --kube-master-url=""
I0603 22:23:48.890875 1 server.go:121] FLAG: --kubecfg-file=""
I0603 22:23:48.890880 1 server.go:121] FLAG: --log-backtrace-at=":0"
I0603 22:23:48.890893 1 server.go:121] FLAG: --log-dir=""
I0603 22:23:48.890898 1 server.go:121] FLAG: --log-flush-frequency="5s"
I0603 22:23:48.890910 1 server.go:121] FLAG: --logtostderr="true"
I0603 22:23:48.890914 1 server.go:121] FLAG: --nameservers=""
I0603 22:23:48.890918 1 server.go:121] FLAG: --stderrthreshold="2"
I0603 22:23:48.890922 1 server.go:121] FLAG: --v="2"
I0603 22:23:48.890932 1 server.go:121] FLAG: --version="false"
I0603 22:23:48.890944 1 server.go:121] FLAG: --vmodule=""
I0603 22:23:48.891109 1 server.go:169] Starting SkyDNS server (0.0.0.0:10053)
I0603 22:23:48.891471 1 server.go:179] Skydns metrics enabled (/metrics:10055)
I0603 22:23:48.891514 1 dns.go:188] Starting endpointsController
I0603 22:23:48.891523 1 dns.go:191] Starting serviceController
I0603 22:23:48.891765 1 sync.go:167] Updated stubDomains to map[acme.local:[1.2.3.4]]
I0603 22:23:48.891794 1 sync.go:177] Updated upstreamNameservers to [8.8.8.8 8.8.4.4]
I0603 22:23:48.891836 1 dns.go:184] Configuration updated: {TypeMeta:{Kind: APIVersion:} Federations:map[] StubDomains:map[acme.local:[1.2.3.4]] UpstreamNameservers:[8.8.8.8 8.8.4.4]}
I0603 22:23:48.892030 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0603 22:23:48.892059 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0603 22:23:48.905592 1 dns.go:601] Could not find endpoints for service "broker" in namespace "kafka". DNS records will be created once endpoints show up.
I0603 22:23:48.905640 1 dns.go:601] Could not find endpoints for service "prometheus-alertmanager-operated" in namespace "metrics". DNS records will be created once endpoints show up.
I0603 22:23:48.905825 1 dns.go:601] Could not find endpoints for service "pzoo" in namespace "kafka". DNS records will be created once endpoints show up.
I0603 22:23:48.905981 1 dns.go:601] Could not find endpoints for service "mariadb" in namespace "mysql". DNS records will be created once endpoints show up.
I0603 22:23:48.906386 1 dns.go:601] Could not find endpoints for service "zoo" in namespace "kafka". DNS records will be created once endpoints show up.
I0603 22:23:49.392152 1 dns.go:222] Initialized services and endpoints from apiserver
I0603 22:23:49.392298 1 server.go:137] Setting up Healthz Handler (/readiness)
I0603 22:23:49.392319 1 server.go:142] Setting up cache handler (/cache)
I0603 22:23:49.392335 1 server.go:128] Status HTTP port 8081
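To see at a glance which services kubedns is complaining about, the log can be filtered for that message. This is a small sketch that relies only on the message format shown in the log above; the function name `services_without_endpoints` is hypothetical. Note that in this log the messages appear about half a second before "Initialized services and endpoints from apiserver", so they may simply reflect startup ordering rather than the cause of the slow lookups.

```python
import re

# Matches the kubedns message quoted in the log above.
MISSING = re.compile(
    r'Could not find endpoints for service "(?P<service>[^"]+)" '
    r'in namespace "(?P<namespace>[^"]+)"'
)


def services_without_endpoints(log_text):
    """Return the sorted, de-duplicated (namespace, service) pairs reported."""
    found = {(m["namespace"], m["service"]) for m in MISSING.finditer(log_text)}
    return sorted(found)
```

The pairs it returns can then be checked one by one with `kubectl get endpoints <service> -n <namespace>` to see whether endpoints have appeared since the log was written.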
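I also noticed, after enabling query logging in dnsmasq, that queries were being executed but no replies were given. Here is a small snapshot of the log: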
I0604 07:03:45.386985 1 nanny.go:116] dnsmasq[11]: forwarded prometheus-kube-state-metrics.metrics.svc.svc.cluster.local to 127.0.0.1
I0604 07:03:45.419486 1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.weave.svc.cluster.local from 10.36.20.1
I0604 07:03:45.419666 1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.weave.svc.cluster.local to 127.0.0.1
I0604 07:03:45.421149 1 nanny.go:116] dnsmasq[11]: query[AAAA] weave-scope-app.weave.svc.cluster.local.svc.cluster.local from 10.36.20.1
I0604 07:03:45.421266 1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.svc.cluster.local to 127.0.0.1
I0604 07:03:45.421333 1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.svc.cluster.local from 10.36.20.1
I0604 07:03:45.421437 1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.svc.cluster.local to 127.0.0.1
I0604 07:03:45.423253 1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.cluster.local from 10.36.20.1
I0604 07:03:45.423375 1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.cluster.local to 127.0.0.1
I0604 07:03:45.424088 1 nanny.go:116] dnsmasq[11]: query[AAAA] weave-scope-app.weave.svc.cluster.local.europe-west4-a.c.suzuka.internal from 10.36.20.1
I0604 07:03:45.424270 1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.europe-west4-a.c.suzuka.internal to 169.254.169.254
I0604 07:03:45.425873 1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.c.suzuka.internal from 10.36.20.1
I0604 07:03:45.426047 1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.c.suzuka.internal to 169.254.169.254
I0604 07:03:45.427186 1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.google.internal from 10.36.20.1
I0604 07:03:45.427356 1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.google.internal to 169.254.169.254
I0604 07:03:45.432498 1 nanny.go:116] dnsmasq[11]: query[AAAA] weave-scope-app.weave.svc.cluster.local from 10.36.20.1
I0604 07:03:45.432665 1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local to 127.0.0.1
I0604 07:03:45.679157 1 nanny.go:116] dnsmasq[11]: query[SRV] kubernetes.default.svc.cluster.local from 127.0.0.1
I0604 07:03:45.679353 1 nanny.go:116] dnsmasq[11]: forwarded kubernetes.default.svc.cluster.local to 127.0.0.1
I0604 07:03:46.544879 1 nanny.go:116] dnsmasq[11]: query[PTR] 82.119.177.108.in-addr.arpa from 10.36.20.1
I0604 07:03:46.544929 1 nanny.go:116] dnsmasq[11]: forwarded 82.119.177.108.in-addr.arpa to 127.0.0.1
I0604 07:03:47.613025 1 nanny.go:116] dnsmasq[11]: query[TXT] hits.bind from 127.0.0.1
I0604 07:03:47.613050 1 nanny.go:116] dnsmasq[11]: config hits.bind is <TXT>
I0604 07:03:47.613191 1 nanny.go:116] dnsmasq[11]: query[TXT] misses.bind from 127.0.0.1
I0604 07:03:47.613245 1 nanny.go:116] dnsmasq[11]: config misses.bind is <TXT>
I0604 07:03:47.613477 1 nanny.go:116] dnsmasq[11]: query[TXT] evictions.bind from 127.0.0.1
I0604 07:03:47.613547 1 nanny.go:116] dnsmasq[11]: config evictions.bind is <TXT>
I0604 07:03:47.613823 1 nanny.go:116] dnsmasq[11]: query[TXT] insertions.bind from 127.0.0.1
I0604 07:03:47.613879 1 nanny.go:116] dnsmasq[11]: config insertions.bind is <TXT>
I0604 07:03:47.614112 1 nanny.go:116] dnsmasq[11]: query[TXT] cachesize.bind from 127.0.0.1
I0604 07:03:47.614180 1 nanny.go:116] dnsmasq[11]: config cachesize.bind is <TXT>
I0604 07:03:49.544862 1 nanny.go:116] dnsmasq[11]: query[PTR] 25.121.18.104.in-addr.arpa from 10.36.20.1
I0604 07:03:49.545061 1 nanny.go:116] dnsmasq[11]: forwarded 25.121.18.104.in-addr.arpa to 127.0.0.1
I0604 07:03:50.679458 1 nanny.go:116] dnsmasq[11]: query[SRV] kubernetes.default.svc.cluster.local from 127.0.0.1
I0604 07:03:50.679606 1 nanny.go:116] dnsmasq[11]: forwarded kubernetes.default.svc.cluster.local to 127.0.0.1
I0604 07:03:52.614897 1 nanny.go:116] dnsmasq[11]: query[TXT] hits.bind from 127.0.0.1
I0604 07:03:52.614975 1 nanny.go:116] dnsmasq[11]: config hits.bind is <TXT>
I0604 07:03:52.615283 1 nanny.go:116] dnsmasq[11]: query[TXT] misses.bind from 127.0.0.1
I0604 07:03:52.615332 1 nanny.go:116] dnsmasq[11]: config misses.bind is <TXT>
I0604 07:03:52.615569 1 nanny.go:116] dnsmasq[11]: query[TXT] evictions.bind from 127.0.0.1
I0604 07:03:52.615615 1 nanny.go:116] dnsmasq[11]: config evictions.bind is <TXT>
I0604 07:03:52.615870 1 nanny.go:116] dnsmasq[11]: query[TXT] insertions.bind from 127.0.0.1
I0604 07:03:52.615915 1 nanny.go:116] dnsmasq[11]: config insertions.bind is <TXT>
I0604 07:03:52.616158 1 nanny.go:116] dnsmasq[11]: query[TXT] cachesize.bind from 127.0.0.1
I0604 07:03:52.616213 1 nanny.go:116] dnsmasq[11]: config cachesize.bind is <TXT>
I0604 07:03:55.281882 1 nanny.go:116] dnsmasq[11]: query[A] prometheus-kube-state-metrics.metrics.svc.metrics.svc.cluster.local from 10.36.20.8
I0604 07:03:55.282681 1 nanny.go:116] dnsmasq[11]: forwarded prometheus-kube-state-metrics.metrics.svc.metrics.svc.cluster.local to 127.0.0.1
I0604 07:03:55.679869 1 nanny.go:116] dnsmasq[11]: query[SRV] kubernetes.default.svc.cluster.local from 127.0.0.1
I0604 07:03:55.679903 1 nanny.go:116] dnsmasq[11]: forwarded kubernetes.default.svc.cluster.local to 127.0.0.1
I0604 07:03:57.616880 1 nanny.go:116] dnsmasq[11]: query[TXT] hits.bind from 127.0.0.1
I0604 07:03:57.616973 1 nanny.go:116] dnsmasq[11]: config hits.bind is <TXT>
I0604 07:03:57.617280 1 nanny.go:116] dnsmasq[11]: query[TXT] misses.bind from 127.0.0.1
I0604 07:03:57.617354 1 nanny.go:116] dnsmasq[11]: config misses.bind is <TXT>
I0604 07:03:57.617600 1 nanny.go:116] dnsmasq[11]: query[TXT] evictions.bind from 127.0.0.1
I0604 07:03:57.617656 1 nanny.go:116] dnsmasq[11]: config evictions.bind is <TXT>
I0604 07:03:57.617898 1 nanny.go:116] dnsmasq[11]: query[TXT] insertions.bind from 127.0.0.1
I0604 07:03:57.617957 1 nanny.go:116] dnsmasq[11]: config insertions.bind is <TXT>
I0604 07:03:57.618192 1 nanny.go:116] dnsmasq[11]: query[TXT] cachesize.bind from 127.0.0.1
I0604 07:03:57.618251 1 nanny.go:116] dnsmasq[11]: config cachesize.bind is <TXT>
I0604 07:03:57.976779 1 nanny.go:116] dnsmasq[11]: query[PTR] 10.240.39.10.in-addr.arpa from 10.36.20.21
I0604 07:03:57.976823 1 nanny.go:116] dnsmasq[11]: forwarded 10.240.39.10.in-addr.arpa to 127.0.0.1
I0604 07:03:57.977075 1 nanny.go:116] dnsmasq[11]: reply 10.39.240.10 is kube-dns.kube-system.svc.cluster.local
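One way to back up the "queries but no replies" observation is to count how many lines of each kind dnsmasq logged. The sketch below assumes only the line format visible in the snapshot above (`dnsmasq[pid]: query[...]`, `forwarded`, `reply`, `config`); the function name `tally_dnsmasq_events` is made up for illustration.

```python
import re
from collections import Counter

# Captures the first word after the "dnsmasq[pid]: " prefix,
# e.g. "query", "forwarded", "reply" or "config".
EVENT = re.compile(r"dnsmasq\[\d+\]: (?P<kind>[a-z]+)")


def tally_dnsmasq_events(log_text):
    """Count dnsmasq log lines by event kind."""
    return Counter(m["kind"] for m in EVENT.finditer(log_text))
```

A large gap between the `forwarded` and `reply` counts over a longer capture would suggest that the upstream being forwarded to (127.0.0.1, i.e. kubedns, or 169.254.169.254) is not answering in time.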
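Answer 1
I ran into a similar problem with KubeDNS. I never found the root cause, but installing CoreDNS resolved it.
CoreDNS has been the default cluster DNS server since Kubernetes 1.11, so I strongly recommend migrating from KubeDNS to CoreDNS.
If you decide to go ahead, take a look at this document - https://kubernetes.io/docs/tasks/administer-cluster/coredns/#installing-kube-dns-instead-of-coredns-with-kubeadm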
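When migrating, the stubDomains and upstreamNameservers settings that kube-dns reported in the question's log would need to be carried over to the CoreDNS Corefile. The following sketch renders the corresponding `forward` plugin lines. It is not a complete Corefile: the helper name `forward_stanzas` is hypothetical, and the `errors` and `cache 30` lines are an assumption modeled on the layout used in the Kubernetes documentation on customizing DNS.

```python
def forward_stanzas(stub_domains, upstream_nameservers):
    """Render Corefile fragments for kube-dns style forwarding settings.

    stub_domains: dict mapping a domain to a list of nameserver IPs.
    upstream_nameservers: list of nameserver IPs for everything else.
    """
    lines = []
    if upstream_nameservers:
        # This line belongs inside the default ".:53 { ... }" server block.
        lines.append("# inside the default .:53 server block:")
        lines.append("forward . " + " ".join(upstream_nameservers))
    for domain, servers in sorted(stub_domains.items()):
        # Each stub domain becomes its own server block.
        lines += [
            f"{domain}:53 {{",
            "    errors",
            "    cache 30",
            "    forward . " + " ".join(servers),
            "}",
        ]
    return "\n".join(lines)
```

For the configuration shown in the question, the call would be `forward_stanzas({"acme.local": ["1.2.3.4"]}, ["8.8.8.8", "8.8.4.4"])`. The result should be reviewed by hand before it is merged into the coredns ConfigMap.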
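Answer 2
I solved this by upgrading both the master and the node pools to the latest version of Kubernetes. Unfortunately, I was not able to pin down exactly what the problem was.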