K8s has no DNS resolution on any of the test images used

Hello community members and K8s experts,

I installed a clean K8s cluster based on virtual machines (Debian 10). After installing it and integrating it into my environment, my first step was to get CoreDNS resolution working. I ran some further tests and found the following. The test setup consists of an nslookup of google.com and a lookup of a local pod against the K8s DNS address.

Basic setup:

  • K8s version: 1.19.0
  • K8s setup: 1 master node + 2 worker nodes
  • Based on: Debian 10 VMs
  • CNI: Flannel

Status of the CoreDNS pods

kube-system            coredns-xxxx 1/1     Running   1          26h
kube-system            coredns-yyyy 1/1     Running   1          26h

CoreDNS logs:

.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7

CoreDNS configuration:

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: ""
  name: coredns
  namespace: kube-system
  resourceVersion: "219"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: xxx
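One line in this Corefile worth noting is `forward . /etc/resolv.conf`: CoreDNS forwards every non-cluster query to the nameservers listed in the resolv.conf its own pod sees, which with the default dnsPolicy for kube-system pods is the host's file. A minimal sketch, using a copy of the host resolv.conf shown later in this post:

```shell
#!/bin/sh
# Sketch: with "forward . /etc/resolv.conf", CoreDNS sends all
# non-cluster queries to the nameservers in the file its pod sees
# (with the default dnsPolicy, that is the host's resolv.conf).
# Using a local copy of the host file from this post:
cat > host-resolv.conf <<'EOF'
nameserver 213.136.95.11
nameserver 213.136.95.10
search invalid
EOF
# These are the upstreams CoreDNS will use for names like google.com:
grep '^nameserver' host-resolv.conf | awk '{print $2}'
```

So if external lookups such as google.com are slow or flaky, the upstreams 213.136.95.x (or the path to them from the CoreDNS pods) are also worth checking.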

CoreDNS service

kubectl -n kube-system get svc -o wide
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   15d   k8s-app=kube-dns
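A note on the `nslookup development.default` test used below: that name only resolves if a Service named `development` actually exists in the `default` namespace. The canonical form a Service resolves under is `<service>.<namespace>.svc.<clusterDomain>`. A tiny sketch (assuming `development` is meant to be a Service name; the clusterDomain matches the kubelet config below):

```shell
#!/bin/sh
# Sketch: build the FQDN a Service resolves under. "development" is an
# assumed Service name for illustration.
svc=development; ns=default; domain=cluster.local
echo "$svc.$ns.svc.$domain"
# → development.default.svc.cluster.local
```

If no such Service exists, an NXDOMAIN answer for `development.default` is expected and is not by itself a sign of broken DNS.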

Kubelet configuration YAML

apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

Output of the pod's resolv.conf

/ # cat /etc/resolv.conf 
nameserver 10.96.0.10
search development.svc.cluster.local svc.cluster.local cluster.local invalid
options ndots:5
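
The `ndots:5` line matters for the slow external lookups: any name with fewer than five dots is first tried against every entry in the search list before being queried as an absolute name, so a single `nslookup google.com` can trigger up to five separate lookups, each of which can fail or time out independently. A sketch of the query order the resolver walks through:

```shell
#!/bin/sh
# Sketch: query order for "google.com" under ndots:5 with the search
# list from the pod resolv.conf above.
name="google.com"
search="development.svc.cluster.local svc.cluster.local cluster.local invalid"
dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
if [ "$dots" -lt 5 ]; then
  for d in $search; do
    echo "try: $name.$d"   # each attempt can time out on its own
  done
fi
echo "try: $name"          # the literal name is only tried last
```

This is why an unreliable upstream shows up as "answers take a long time" rather than a clean failure.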

Output of the host's resolv.conf

cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 213.136.95.11
nameserver 213.136.95.10
search invalid

Output of the host's /run/flannel/subnet.env

cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
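
For context, these values mean this node hands out pod IPs from 10.244.0.1/24 inside the cluster-wide 10.244.0.0/16, and flannel's VXLAN backend (MTU 1450) tunnels pod traffic between nodes over UDP port 8472, which therefore also has to be allowed by the firewall. A sketch that parses a copy of the file (on a node you would source /run/flannel/subnet.env directly):

```shell
#!/bin/sh
# Sketch: read flannel's environment file and report this node's slice
# of the pod network. Uses a local copy so it runs anywhere.
cat > subnet.env <<'EOF'
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
. ./subnet.env
echo "node subnet $FLANNEL_SUBNET within $FLANNEL_NETWORK (mtu $FLANNEL_MTU)"
```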

Test setup

kubectl exec -i -t busybox -n development -- nslookup google.com
kubectl exec -i -t busybox -n development -- nslookup development.default

Busybox v1.28 image

  • google.com nslookup works, but the answer takes a long time
  • the local pod DNS address fails, and the answer also takes a long time

Test setup

kubectl exec -i -t dnsutils -- nslookup google.com
kubectl exec -i -t busybox -n development -- nslookup development.default

K8s dnsutils test image

  • google.com nslookup works intermittently; it feels as if the address is sometimes served from the cache, while at other times it doesn't work at all
  • the local pod DNS address also works only intermittently, with the same cache-like behavior

Test setup

kubectl exec -i -t dnsutilsalpine -n development -- nslookup google.com
kubectl exec -i -t dnsutilsalpine -n development -- nslookup development.default

Alpine image v3.12

  • google.com nslookup works intermittently, again with the same cache-like behavior
  • the local pod DNS address fails

The logs are empty. Do you have any idea where the problem might be?

IP routes on the master node

default via X.X.X.X dev eth0 onlink 
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1 
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
X.X.X.X via X.X.X.X dev eth0 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown

Update

I reinstalled the cluster and am now using Calico as the CNI, and I ran into the same problem.

Update 2

After a detailed error analysis under Calico, I found that the corresponding pods were not working properly. Digging into the errors, I discovered that I had not opened the required port 179 in the firewall. After fixing this, I could confirm that the pods work properly and that name resolution is now working correctly as well.
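
Since the root cause turned out to be a blocked port (TCP 179 is Calico's BGP peering port), a quick reachability check between nodes can save a lot of DNS-level debugging. A minimal sketch using bash's /dev/tcp (the peer IP below is a placeholder):

```shell
#!/bin/sh
# Sketch: check whether a TCP port on a peer node is reachable.
# /dev/tcp is a bash feature, so bash is invoked explicitly.
check_port() {
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 closed"
  fi
}
# Example (replace 10.0.0.2 with another node's IP):
# check_port 10.0.0.2 179   # Calico BGP peering
```

If the port reports closed from one node to another, fix the firewall before looking any further at CoreDNS itself.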

Answer 1

This won't fit into a comment, so I'm posting it as an answer.

I checked the guide you have been referring to and set up my own test cluster (GCP, 3x Debian 10 VMs).

The difference is that in ~/kube-cluster/master.yml I set a different link for kube-flannel.yml (and the content of that file differs from the one in the guide :))

$ grep http master.yml 
      shell: kubectl apply -f  https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml >> pod_network_setup.txt

On my cluster:

$ kubectl get nodes
NAME         STATUS   ROLES    AGE     VERSION
instance-1   Ready    master   2m48s   v1.19.0
instance-2   Ready    <none>   38s     v1.19.0
instance-3   Ready    <none>   38s     v1.19.0

kubectl get pods -o wide -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
coredns-f9fd979d6-8sxg7              1/1     Running   0          4m48s   10.244.0.2    instance-1   <none>           <none>
coredns-f9fd979d6-z5gdl              1/1     Running   0          4m48s   10.244.0.3    instance-1   <none>           <none>

kube-flannel-ds-4khll                1/1     Running   0          2m58s   10.156.0.21   instance-3   <none>           <none>
kube-flannel-ds-h8d9l                1/1     Running   0          2m58s   10.156.0.20   instance-2   <none>           <none>
kube-flannel-ds-zhzbf                1/1     Running   0          4m49s   10.156.0.19   instance-1   <none>           <none>

$ kubectl -n kube-system get svc -o wide
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   6m15s   k8s-app=kube-dns

sammy@instance-1:~$ ip route
default via 10.156.0.1 dev ens4 
10.156.0.1 dev ens4 scope link 
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1 
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown


I didn't observe any DNS latency issues.

kubectl create deployment busybox --image=nkolchenko/enea:server_go_latest
deployment.apps/busybox created

sammy@instance-1:~$ time kubectl exec -it busybox-6f744547bf-hkxnk -- nslookup default.default
Server:         10.96.0.10
Address:        10.96.0.10:53

** server can't find default.default: NXDOMAIN

** server can't find default.default: NXDOMAIN

command terminated with exit code 1

real    0m0.227s
user    0m0.106s
sys     0m0.012s


sammy@instance-1:~$ time kubectl exec -it busybox-6f744547bf-hkxnk -- nslookup google.com
Server:         10.96.0.10
Address:        10.96.0.10:53

Non-authoritative answer:
Name:   google.com
Address: 172.217.22.78

Non-authoritative answer:
Name:   google.com
Address: 2a00:1450:4001:820::200e


real    0m0.223s
user    0m0.102s
sys     0m0.012s

Let me know if you want me to run any other tests; I'll keep this cluster up over the weekend and then tear it down.

Update:

$ cat ololo 
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always

$ kubectl create -f ololo 
pod/dnsutils created


$ kubectl get -A all  -o wide | grep dns
default       pod/dnsutils                             1/1     Running   0          63s     10.244.2.8    instance-2   <none>           <none>
kube-system   pod/coredns-cc8845745-jtvlh              1/1     Running   0          10m     10.244.1.3    instance-3   <none>           <none>
kube-system   pod/coredns-cc8845745-xxh28              1/1     Running   0          10m     10.244.0.4    instance-1   <none>           <none>
kube-system   pod/coredns-cc8845745-zlv84              1/1     Running   0          10m     10.244.2.6    instance-2   <none>           <none>

instance-1:~$ kubectl exec -i -t dnsutils -- time nslookup google.com
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   google.com
Address: 172.217.21.206
Name:   google.com
Address: 2a00:1450:4001:818::200e

real    0m 0.01s
user    0m 0.00s
sys     0m 0.00s




Answer 2

After installing Calico and setting up the appropriate firewall rules (opening port 179 on all nodes), I can see that the CoreDNS pods run smoothly. As a result, the various images can resolve DNS addresses, and forwarding works correctly.
