Kubernetes: cannot reach the kube-dns service through its ClusterIP

After upgrading to v1.19.7 with kubeadm, my pods can no longer reach kube-dns through the service's ClusterIP. DNS resolution works fine when I use a kube-dns pod IP address instead.

The kube-dns pods are up and running:

$ kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-7674cdb774-2m58h   1/1     Running   0          33m
coredns-7674cdb774-x44b9   1/1     Running   0          33m

The logs are clean:

$ kubectl logs coredns-7674cdb774-2m58h -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 7442f38ca24670d4af368d447670ad91
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] 127.0.0.1:40705 - 31415 "HINFO IN 7224361654609676299.2243479664305694168. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.003954173s

The kube-dns service is exposed:

$ kubectl get svc  -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   301d

The endpoints are configured as well:

$ kubectl describe endpoints kube-dns --namespace=kube-system
Name:         kube-dns
Namespace:    kube-system
Labels:       k8s-app=kube-dns
              kubernetes.io/cluster-service=true
              kubernetes.io/name=KubeDNS
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-01-19T14:23:13Z
Subsets:
  Addresses:          10.44.0.1,10.47.0.2
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    dns-tcp  53    TCP
    dns      53    UDP
    metrics  9153  TCP

Events:  <none>

Here is my coredns ConfigMap:

$ kubectl describe cm -n kube-system coredns
Name:         coredns
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
Corefile:
----
.:53 {
    log
    errors
    ready
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

Events:  <none>

On the worker node, kube-proxy is running:

$ kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=ccqserv202
NAME               READY   STATUS    RESTARTS   AGE    IP              NODE         NOMINATED NODE   READINESS GATES
kube-proxy-8r65s   1/1     Running   0          78m    10.158.37.202   ccqserv202   <none>           <none>
weave-net-kvnzg    2/2     Running   0          6h3m   10.158.37.202   ccqserv202   <none>           <none>

Pod-to-pod networking works, since I can communicate between pods running on different nodes (here, dnsutils runs on node ccqserv202, and 10.44.0.1 is the IP address of pod coredns-7674cdb774-x44b9, which runs on node ccqserv223).

$ kubectl exec -i -t dnsutils -- ping 10.44.0.1
PING 10.44.0.1 (10.44.0.1): 56 data bytes
64 bytes from 10.44.0.1: seq=0 ttl=64 time=2.101 ms
64 bytes from 10.44.0.1: seq=1 ttl=64 time=1.184 ms
64 bytes from 10.44.0.1: seq=2 ttl=64 time=1.107 ms
^C
--- 10.44.0.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 1.107/1.464/2.101 ms

I am using "ipvs" as the kube-proxy mode (although I can confirm that exactly the same behavior occurs with the "iptables" or "userspace" modes).

Here is the output of ipvsadm -Ln on node ccqserv202:

$ ipvsadm -Ln 
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.46.128.0:30040 rr
TCP  10.96.0.1:443 rr
  -> 10.158.37.223:6443           Masq    1      0          0         
  -> 10.158.37.224:6443           Masq    1      0          0         
  -> 10.158.37.225:6443           Masq    1      1          0         
TCP  10.96.0.10:53 rr
TCP  10.96.0.10:9153 rr
TCP  10.97.147.126:2746 rr
TCP  10.100.162.140:9000 rr
TCP  10.101.126.110:5432 rr
TCP  10.109.184.125:4040 rr
TCP  10.110.163.112:9090 rr
TCP  10.110.215.252:8443 rr
TCP  10.158.37.202:30040 rr
TCP  127.0.0.1:30040 rr
TCP  134.158.237.2:30040 rr
UDP  10.96.0.10:53 rr

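As you can see, no realserver is configured under the virtual address 10.96.0.10, whereas some are configured under 10.96.0.1 (which corresponds to the kubernetes API service).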
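The same check can be done mechanically instead of by eye. The following is only a sketch: the function name ipvs_empty_services is made up for illustration, and it assumes the ipvsadm -Ln layout shown above (a protocol line per virtual service, followed by "->" lines for its real servers).

```shell
# List IPVS virtual services that have no real server behind them.
# Reads `ipvsadm -Ln` output on stdin. The function name is hypothetical.
ipvs_empty_services() {
  awk '
    # A virtual service line has the protocol in its first field.
    $1 == "TCP" || $1 == "UDP" || $1 == "SCTP" {
      # Report the previous service if it collected no real server.
      if (svc != "" && backends == 0) print svc
      svc = $1 " " $2
      backends = 0
      next
    }
    # A real server line starts with "->"; skip the column header line.
    $1 == "->" && $2 != "RemoteAddress:Port" { backends++ }
    # Report the last service as well if it is empty.
    END { if (svc != "" && backends == 0) print svc }
  '
}
```

On the node this would be run as `ipvsadm -Ln | ipvs_empty_services`. Applied to the listing above, it should print every virtual service except TCP 10.96.0.1:443, including both 10.96.0.10:53 entries.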

I can open a connection to 10.96.0.1 on port 443:

$ kubectl exec -i -t dnsutils -- nc -vz 10.96.0.1 443
10.96.0.1 (10.96.0.1:443) open

I can open a connection to 10.44.0.1 on port 53:

$ kubectl exec -i -t dnsutils -- nc -vz 10.44.0.1 53
10.44.0.1 (10.44.0.1:53) open

It even resolves!

$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.44.0.1
Server:     10.44.0.1
Address:    10.44.0.1#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

But when I use the kube-dns ClusterIP 10.96.0.10, it does not work:

$ kubectl exec -i -t dnsutils -- nc -vz 10.96.0.10 53
command terminated with exit code 1
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.96.0.10
;; connection timed out; no servers could be reached
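To repeat the comparison between the ClusterIP and the pod IPs in one pass, something like the following sketch could be used. The function name check_dns_servers and the LOOKUP variable are hypothetical. By default it runs nslookup inside the dnsutils pod, as in the commands above.

```shell
# Try one DNS lookup per server and report which servers answer.
# LOOKUP is a hypothetical override for the command that performs the
# lookup; the default mirrors the nslookup calls shown in this post.
check_dns_servers() {
  name=$1
  shift
  for server in "$@"; do
    # The lookup command receives the name and the server as arguments.
    if ${LOOKUP:-kubectl exec dnsutils -- nslookup} "$name" "$server" >/dev/null 2>&1; then
      echo "$server ok"
    else
      echo "$server FAILED"
    fi
  done
}
```

For example, `check_dns_servers kubernetes.default 10.96.0.10 10.44.0.1 10.47.0.2` would test the ClusterIP and both endpoint addresses listed earlier.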

Here is the dnsutils resolv.conf file:

$ kubectl exec -i -t dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local xxxxx.fr
nameserver 10.96.0.10
options ndots:5

Finally, when I try to manually add a realserver to ipvs on the node,

$ ipvsadm -a -u 10.96.0.10:53 -r 10.44.0.1:53 -m

kube-proxy detects it and immediately removes it:

I0119 16:17:27.062890       1 proxier.go:2076] Using graceful delete to delete: 10.96.0.10:53/UDP/10.44.0.1:53
I0119 16:17:27.062906       1 graceful_termination.go:159] Trying to delete rs: 10.96.0.10:53/UDP/10.44.0.1:53
I0119 16:17:27.062974       1 graceful_termination.go:173] Deleting rs: 10.96.0.10:53/UDP/10.44.0.1:53

Also, tcpdump shows that DNS requests from dnsutils to 10.96.0.10 are not rewritten to 10.44.0.1 or 10.47.0.2, as they should be by ipvs:

    10.46.128.8.53140 > 10.96.0.10.domain: [bad udp cksum 0x94f7 -> 0x12c4!] 4628+ A? kubernetes.default.default.svc.cluster.local. (62)
16:27:56.950950 IP (tos 0x0, ttl 64, id 45349, offset 0, flags [none], proto UDP (17), length 90)
    10.46.128.8.53140 > 10.96.0.10.domain: [bad udp cksum 0x94f7 -> 0x12c4!] 4628+ A? kubernetes.default.default.svc.cluster.local. (62)
16:27:56.951321 IP (tos 0x0, ttl 64, id 59811, offset 0, flags [DF], proto UDP (17), length 70)

A tcpdump on the kube-dns pod at the other end shows that these requests never arrive.

I have spent a whole day trying to understand what is going on and how to fix it, but I have run out of ideas. Any help would be very welcome.

Unfortunately, https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ did not help.

Thank you!

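TL;DR: DNS resolution in my Kubernetes cluster does not work when using the kube-dns service ClusterIP, although it works when using the kube-dns pod IP addresses. I think something is wrong with my kube-proxy configuration, but I cannot find what.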

Answer 1

I have run into a similar phenomenon. When I repeatedly run netcat against the DNS service ClusterIP on port 53, the connection sometimes fails:

root@cluster0:~# nc -vzw 3  10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3  10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3  10.96.0.10 53
nc: connect to 10.96.0.10 port 53 (tcp) timed out: Operation now in progress
root@cluster0:~# nc -vzw 3  10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3  10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3  10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3  10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3  10.96.0.10 53
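An intermittent failure like this is easier to quantify with a loop than by rerunning the command by hand. The sketch below uses hypothetical names (probe_repeat and PROBE). By default it uses the same nc invocation as above, with its 3-second timeout.

```shell
# Probe a TCP port repeatedly and count how many attempts fail.
# PROBE is a hypothetical override for the probe command; the default is
# the nc call used in this answer.
probe_repeat() {
  host=$1
  port=$2
  attempts=${3:-20}
  failed=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # A non-zero exit status from the probe counts as one failure.
    ${PROBE:-nc -vzw 3} "$host" "$port" >/dev/null 2>&1 || failed=$((failed + 1))
    i=$((i + 1))
  done
  echo "$failed/$attempts probes failed"
}
```

Running `probe_repeat 10.96.0.10 53 50` would then print a single summary line. A failure rate close to one over the number of endpoints might hint that a single backend is unreachable, though that is only a guess and would need to be confirmed per endpoint.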
