调试 Kubernetes 中的 DNS 解析问题

调试 Kubernetes 中的 DNS 解析问题

我已经在 Ubuntu 18.04 上使用 Kubespray 构建了一个 Kubernetes 集群,并且面临 DNS 问题,因此容器基本上无法通过其主机名进行通信。

正在起作用的事情:

  • 容器通过IP地址进行通信
  • 互联网正在通过容器运行
  • 能够解决kubernetes.default

Kubernetes 主节点:

root@k8s-1:~# cat /etc/resolv.conf | grep -v ^\\#
nameserver 127.0.0.53
search home
root@k8s-1:~# 

荚:

root@k8s-1:~# kubectl exec dnsutils cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local home
options ndots:5
root@k8s-1:~# 

CoreDNS pod 是健康的:

root@k8s-1:~# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns        
NAME                       READY   STATUS    RESTARTS   AGE
coredns-58687784f9-8rmlw   1/1     Running   0          35m
coredns-58687784f9-hp8hp   1/1     Running   0          35m
root@k8s-1:~#

CoreDNS pod 的日志:

root@k8s-1:~# kubectl describe pods --namespace=kube-system -l k8s-app=kube-dns | tail -n 2
  Normal   Started           35m                 kubelet, k8s-2     Started container coredns
  Warning  DNSConfigForming  12s (x33 over 35m)  kubelet, k8s-2     Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 4.2.2.1 4.2.2.2 208.67.220.220

root@k8s-1:~# kubectl logs --namespace=kube-system coredns-58687784f9-8rmlw
.:53
2020-02-09T22:56:14.390Z [INFO] plugin/reload: Running configuration MD5 = b9d55fc86b311e1d1a0507440727efd2
2020-02-09T22:56:14.391Z [INFO] CoreDNS-1.6.0
2020-02-09T22:56:14.391Z [INFO] linux/amd64, go1.12.7, 0a218d3
CoreDNS-1.6.0
linux/amd64, go1.12.7, 0a218d3
root@k8s-1:~#

root@k8s-1:~# kubectl logs --namespace=kube-system coredns-58687784f9-hp8hp
.:53
2020-02-09T22:56:20.388Z [INFO] plugin/reload: Running configuration MD5 = b9d55fc86b311e1d1a0507440727efd2
2020-02-09T22:56:20.388Z [INFO] CoreDNS-1.6.0
2020-02-09T22:56:20.388Z [INFO] linux/amd64, go1.12.7, 0a218d3
CoreDNS-1.6.0
linux/amd64, go1.12.7, 0a218d3
root@k8s-1:~#

CoreDNS似乎暴露了:

root@k8s-1:~# kubectl get svc --namespace=kube-system | grep coredns
coredns                ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP,9153/TCP   37m
root@k8s-1:~#

root@k8s-1:~# kubectl get ep coredns --namespace=kube-system
NAME      ENDPOINTS                                                  AGE
coredns   10.233.64.2:53,10.233.65.3:53,10.233.64.2:53 + 3 more...   37m
root@k8s-1:~#

这些是我的问题 pod — 所有集群都因该问题而受到影响:

root@k8s-1:~# kubectl get pods -o wide -n default
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          17m   10.233.66.7   k8s-3   <none>           <none>
dnsutils                 1/1     Running   0          50m   10.233.66.5   k8s-3   <none>           <none>
nginx-86c57db685-p8zhc   1/1     Running   0          43m   10.233.64.3   k8s-1   <none>           <none>
nginx-86c57db685-st7rw   1/1     Running   0          47m   10.233.66.6   k8s-3   <none>           <none>
root@k8s-1:~# 

能够通过 IP 地址使用 DNS 和容器访问互联网:

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping 10.233.64.3"
PING 10.233.64.3 (10.233.64.3) 56(84) bytes of data.
64 bytes from 10.233.64.3: icmp_seq=1 ttl=62 time=0.481 ms
64 bytes from 10.233.64.3: icmp_seq=2 ttl=62 time=0.551 ms
...

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping google.com"
PING google.com (172.217.21.174) 56(84) bytes of data.
64 bytes from fra07s64-in-f174.1e100.net (172.217.21.174): icmp_seq=1 ttl=61 time=77.9 ms
...

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping kubernetes.default"
PING kubernetes.default.svc.cluster.local (10.233.0.1) 56(84) bytes of data.
64 bytes from kubernetes.default.svc.cluster.local (10.233.0.1): icmp_seq=1 ttl=64 time=0.030 ms
64 bytes from kubernetes.default.svc.cluster.local (10.233.0.1): icmp_seq=2 ttl=64 time=0.069 ms
...

实际问题:

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping nginx-86c57db685-p8zhc"
ping: nginx-86c57db685-p8zhc: Name or service not known
command terminated with exit code 2
root@k8s-1:~#

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping dnsutils"
ping: dnsutils: Name or service not known
command terminated with exit code 2
root@k8s-1:~#

oot@k8s-1:~# kubectl exec -ti busybox -- nslookup nginx-86c57db685-p8zhc
Server:     169.254.25.10
Address:    169.254.25.10:53

** server can't find nginx-86c57db685-p8zhc.default.svc.cluster.local: NXDOMAIN

*** Can't find nginx-86c57db685-p8zhc.svc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.home: No answer
*** Can't find nginx-86c57db685-p8zhc.default.svc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.svc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.home: No answer

command terminated with exit code 1
root@k8s-1:~#

我是否遗漏了某些东西或如何修复使用主机名的容器之间的通信?

非常感谢

更新

更多检查:

root@k8s-1:~# kubectl exec -ti dnsutils -- nslookup kubernetes.default
Server:     169.254.25.10
Address:    169.254.25.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.233.0.1

我已经创建了 StatefulSet:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/application/web/web.yaml

我可以 ping 服务“nginx”:

root@k8s-1:~/kplay# k exec dnsutils -it nslookup nginx
Server:     169.254.25.10
Address:    169.254.25.10#53

Name:   nginx.default.svc.cluster.local
Address: 10.233.66.8
Name:   nginx.default.svc.cluster.local
Address: 10.233.64.3
Name:   nginx.default.svc.cluster.local
Address: 10.233.65.5
Name:   nginx.default.svc.cluster.local
Address: 10.233.66.6

使用 FQDN 时也可以联系 statefulset 成员

root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-0.nginx.default.svc.cluster.local
Server:     169.254.25.10
Address:    169.254.25.10#53

Name:   web-0.nginx.default.svc.cluster.local
Address: 10.233.65.5

root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-1.nginx.default.svc.cluster.local
Server:     169.254.25.10
Address:    169.254.25.10#53

Name:   web-1.nginx.default.svc.cluster.local
Address: 10.233.66.8

但不只是使用主机名:

root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-0
Server:     169.254.25.10
Address:    169.254.25.10#53

** server can't find web-0: NXDOMAIN

command terminated with exit code 1
root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-1
Server:     169.254.25.10
Address:    169.254.25.10#53

** server can't find web-1: NXDOMAIN

command terminated with exit code 1
root@k8s-1:~/kplay#

它们都位于同一个命名空间中:

root@k8s-1:~/kplay# k get pods -n default
NAME                     READY   STATUS    RESTARTS   AGE
busybox                  1/1     Running   22         22h
dnsutils                 1/1     Running   22         22h
nginx-86c57db685-p8zhc   1/1     Running   0          22h
nginx-86c57db685-st7rw   1/1     Running   0          22h
web-0                    1/1     Running   0          11m
web-1                    1/1     Running   0          10m

另一项测试确认我能够 ping 服务:

kubectl create deployment --image nginx some-nginx
kubectl scale deployment --replicas 2 some-nginx
kubectl expose deployment some-nginx --port=12345 --type=NodePort

root@k8s-1:~/kplay# k exec dnsutils -it nslookup some-nginx
Server:     169.254.25.10
Address:    169.254.25.10#53

Name:   some-nginx.default.svc.cluster.local
Address: 10.233.63.137

最后的想法

有趣的事实,但也许这就是 Kubernetes 的工作方式?如果想单独访问某个 pod,我可以访问服务主机名和 statefulset 成员。如果不是 statefulset,那么访问单个 pod 似乎并不重要,至少在我的 k8s 使用中是这样(对每个人来说都是如此)。

答案1


我建议你关注因此我们可以隔离您 CoreDNS 中可能存在的问题,并且正如您所见,它运行良好。

如果不是 statefulset,那么到达单个 pod 似乎并不重要,至少在我的 k8s 使用中是这样(对每个人来说都是如此)。

可以使用 DNS 记录访问 pod,但正如您所说,这在常规 K8s 实现中并不重要。

启用后,Pod 将被分配一个 DNS A 记录,形式为 pod-ip-address.my-namespace.pod.cluster.local

例如, 1.2.3.4 命名空间中 IP 为default 且 DNS 名称为 的 Podcluster.local 将具有以下条目: 1-2-3-4.default.pod.cluster.local来源

例子

$ kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP          NODE                                 NOMINATED NODE   READINESS GATES
dnsutils     1/1     Running   20         20h     10.28.2.3   gke-lab-default-pool-87c6b085-wcp8   <none>           <none>
sample-pod   1/1     Running   0          2m11s   10.28.2.4   gke-lab-default-pool-87c6b085-wcp8   <none>           <none>

$ kubectl exec -ti dnsutils -- nslookup 10-28-2-4.default.pod.cluster.local
Server:     10.31.240.10
Address:    10.31.240.10#53

Name:   10-28-2-4.default.pod.cluster.local
Address: 10.28.2.4

这是一个有趣的事实,但也许这就是 Kubernetes 应该工作的方式?

是的,您的 CoreDNS 工作正常,您描述的一切都符合预期。

相关内容