After upgrading to v1.19.7 with kubeadm, my pods can no longer reach the kube-dns service through its ClusterIP. DNS resolution works fine when I use the kube-dns pod IP addresses directly.

The kube-dns pods are up and running:
$ kubectl get pods -n kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-7674cdb774-2m58h 1/1 Running 0 33m
coredns-7674cdb774-x44b9 1/1 Running 0 33m
The logs are clean:
$ kubectl logs coredns-7674cdb774-2m58h -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 7442f38ca24670d4af368d447670ad91
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] 127.0.0.1:40705 - 31415 "HINFO IN 7224361654609676299.2243479664305694168. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.003954173s
The kube-dns service is exposed:
$ kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 301d
The endpoints are configured as well:
$ kubectl describe endpoints kube-dns --namespace=kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=KubeDNS
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2021-01-19T14:23:13Z
Subsets:
Addresses: 10.44.0.1,10.47.0.2
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
dns-tcp 53 TCP
dns 53 UDP
metrics 9153 TCP
Events: <none>
Here is my coredns ConfigMap:
$ kubectl describe cm -n kube-system coredns
Name: coredns
Namespace: kube-system
Labels: <none>
Annotations: <none>
Data
====
Corefile:
----
.:53 {
log
errors
ready
health
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
Events: <none>
On the worker nodes, kube-proxy is running:
$ kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=ccqserv202
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-proxy-8r65s 1/1 Running 0 78m 10.158.37.202 ccqserv202 <none> <none>
weave-net-kvnzg 2/2 Running 0 6h3m 10.158.37.202 ccqserv202 <none> <none>
Pod-to-pod network connectivity is fine, since I can communicate between pods running on different nodes (here, dnsutils runs on node ccqserv202, while 10.44.0.1 is the pod IP address of coredns-7674cdb774-x44b9, which runs on node ccqserv223).
$ kubectl exec -i -t dnsutils -- ping 10.44.0.1
PING 10.44.0.1 (10.44.0.1): 56 data bytes
64 bytes from 10.44.0.1: seq=0 ttl=64 time=2.101 ms
64 bytes from 10.44.0.1: seq=1 ttl=64 time=1.184 ms
64 bytes from 10.44.0.1: seq=2 ttl=64 time=1.107 ms
^C
--- 10.44.0.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 1.107/1.464/2.101 ms
I am using "ipvs" as the kube-proxy mode (though I can confirm that the exact same behavior occurs with the "iptables" and "userspace" modes).
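For reference, in a kubeadm cluster the active proxy mode can be double-checked against both the kube-proxy ConfigMap and the proxier that kube-proxy actually started with (a diagnostic sketch; the pod name is taken from the listing above):

```shell
# Show the configured proxy mode (an empty value means the default, iptables)
kubectl -n kube-system get configmap kube-proxy -o yaml | grep 'mode:'

# kube-proxy logs which proxier it actually started with
kubectl -n kube-system logs kube-proxy-8r65s | grep -i 'proxier'
```

If the ConfigMap says ipvs but the logs show kube-proxy falling back to iptables (e.g. because kernel ipvs modules are missing), the two proxiers can end up fighting over stale rules.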
Here is the ipvsadm -Ln output on node ccqserv202:
$ ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.46.128.0:30040 rr
TCP 10.96.0.1:443 rr
-> 10.158.37.223:6443 Masq 1 0 0
-> 10.158.37.224:6443 Masq 1 0 0
-> 10.158.37.225:6443 Masq 1 1 0
TCP 10.96.0.10:53 rr
TCP 10.96.0.10:9153 rr
TCP 10.97.147.126:2746 rr
TCP 10.100.162.140:9000 rr
TCP 10.101.126.110:5432 rr
TCP 10.109.184.125:4040 rr
TCP 10.110.163.112:9090 rr
TCP 10.110.215.252:8443 rr
TCP 10.158.37.202:30040 rr
TCP 127.0.0.1:30040 rr
TCP 134.158.237.2:30040 rr
UDP 10.96.0.10:53 rr
As you can see, no real server is configured under the virtual address 10.96.0.10, but there is one under 10.96.0.1 (which corresponds to the kubernetes API service).
I can open a connection to 10.96.0.1 on port 443:
$ kubectl exec -i -t dnsutils -- nc -vz 10.96.0.1 443
10.96.0.1 (10.96.0.1:443) open
I can open a connection to 10.44.0.1 on port 53:
$ kubectl exec -i -t dnsutils -- nc -vz 10.44.0.1 53
10.44.0.1 (10.44.0.1:53) open
It even resolves!
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.44.0.1
Server: 10.44.0.1
Address: 10.44.0.1#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
But it does not work when I use the kube-dns ClusterIP 10.96.0.10:
$ kubectl exec -i -t dnsutils -- nc -vz 10.96.0.10 53
command terminated with exit code 1
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.96.0.10
;; connection timed out; no servers could be reached
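To narrow down where the packets die, it can help to look at the connection tracking state on the worker node while repeating the failing query (a hedged diagnostic sketch; both tools must be installed on the node):

```shell
# List ipvs connection entries involving the DNS VIP:
# if nothing shows up, ipvs never accepted the flow
sudo ipvsadm -Lnc | grep 10.96.0.10

# Check the kernel conntrack table for attempted flows to the VIP
sudo conntrack -L -d 10.96.0.10
```

With ipvs, a healthy flow should appear in `ipvsadm -Lnc` with the ClusterIP as the virtual address and a pod IP as the destination; its complete absence is consistent with the empty real-server list seen above.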
Here is the dnsutils pod's resolv.conf file:
$ kubectl exec -i -t dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local xxxxx.fr
nameserver 10.96.0.10
options ndots:5
Finally, when I try to manually add a real server to the node's ipvs table,
$ ipvsadm -a -u 10.96.0.10:53 -r 10.44.0.1:53 -m
kube-proxy detects it and immediately removes it:
I0119 16:17:27.062890 1 proxier.go:2076] Using graceful delete to delete: 10.96.0.10:53/UDP/10.44.0.1:53
I0119 16:17:27.062906 1 graceful_termination.go:159] Trying to delete rs: 10.96.0.10:53/UDP/10.44.0.1:53
I0119 16:17:27.062974 1 graceful_termination.go:173] Deleting rs: 10.96.0.10:53/UDP/10.44.0.1:53
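Since kube-proxy is clearly watching the ipvs table but never programs real servers for 10.96.0.10 itself, one thing worth trying (a hedged remediation sketch, not a guaranteed fix) is forcing a full resync by clearing the ipvs state on the affected node and restarting the kube-proxy pods:

```shell
# On the affected node: flush all ipvs virtual servers
# (kube-proxy recreates them on its next sync)
sudo ipvsadm --clear

# Restart the kube-proxy pods so they rebuild their state from the API server
kubectl -n kube-system delete pod -l k8s-app=kube-proxy
```

The `k8s-app=kube-proxy` label is the one kubeadm applies to its kube-proxy DaemonSet pods; adjust it if your deployment labels them differently.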
Also, with tcpdump we can see that the DNS requests from dnsutils to 10.96.0.10 are not rewritten by ipvs to 10.44.0.1 or 10.47.0.2, as they should be:
16:27:56.950950 IP (tos 0x0, ttl 64, id 45349, offset 0, flags [none], proto UDP (17), length 90)
10.46.128.8.53140 > 10.96.0.10.domain: [bad udp cksum 0x94f7 -> 0x12c4!] 4628+ A? kubernetes.default.default.svc.cluster.local. (62)
16:27:56.951321 IP (tos 0x0, ttl 64, id 59811, offset 0, flags [DF], proto UDP (17), length 70)
A tcpdump on the kube-dns pods at the other end shows that these requests never arrive.
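A note on the `[bad udp cksum ...]` flags in the capture: on the sending side these are usually benign, because checksum computation is offloaded to the NIC and tcpdump sees the packet before the checksum is filled in. That said, kernel checksum-offload bugs have been known to silently drop VXLAN-encapsulated UDP traffic on overlay networks such as Weave's fastdp. If that is suspected, a hedged workaround sketch (the interface name `weave` is an assumption; check your node's interfaces first) is:

```shell
# Disable generic IP checksum offload on the overlay interface (workaround sketch)
sudo ethtool -K weave tx-checksum-ip-generic off

# Verify the resulting offload settings
ethtool -k weave | grep checksum
```

This is a per-node, non-persistent change, so it is cheap to test and easy to revert with `on` instead of `off`.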
I have spent a whole day trying to understand what is going on and how to fix it, but I am out of ideas now. Any help is very welcome.
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ unfortunately did not help.
Thank you!
TL;DR: DNS resolution in my Kubernetes cluster does not work when using the kube-dns service ClusterIP, although it does work when using the kube-dns pod IP addresses directly. I believe something is wrong with my kube-proxy configuration, but I cannot find what.
Answer 1
I have run into a similar phenomenon: repeatedly netcat-ing the DNS service ClusterIP on port 53 sometimes fails to connect.
root@cluster0:~# nc -vzw 3 10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3 10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3 10.96.0.10 53
nc: connect to 10.96.0.10 port 53 (tcp) timed out: Operation now in progress
root@cluster0:~# nc -vzw 3 10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3 10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3 10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3 10.96.0.10 53
Connection to 10.96.0.10 53 port [tcp/domain] succeeded!
root@cluster0:~# nc -vzw 3 10.96.0.10 53