Hello community members and K8s experts,
I set up a clean K8s cluster on virtual machines (Debian 10). After installing it and integrating it into my environment, my first step was to look at CoreDNS name resolution. Further testing turned up the following. The test setup consists of an nslookup of google.com and a lookup of a local pod against the K8s DNS address.
Basic setup:
- K8s version: 1.19.0
- K8s setup: 1 master node + 2 worker nodes
- Based on: Debian 10 VMs
- CNI: Flannel
Status of the CoreDNS pods:
kube-system coredns-xxxx 1/1 Running 1 26h
kube-system coredns-yyyy 1/1 Running 1 26h
CoreDNS logs:
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
CoreDNS config:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: ""
  name: coredns
  namespace: kube-system
  resourceVersion: "219"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: xxx
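One thing worth noting in this Corefile: the line "forward . /etc/resolv.conf" means every name outside the cluster domain is forwarded to whatever nameservers the CoreDNS pod sees in its resolv.conf, i.e. the host resolvers shown further below. A quick way to rule out a problem with those upstream resolvers (not something from the original post, just a diagnostic sketch) is to temporarily forward to public resolvers instead:

kubectl -n kube-system edit configmap coredns
# change the forward line inside the Corefile, for example:
#   forward . 8.8.8.8 1.1.1.1
# with the reload plugin enabled (as above), CoreDNS picks the change up
# automatically after a short while; otherwise restart the pods:
kubectl -n kube-system rollout restart deployment coredns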
CoreDNS service:
kubectl -n kube-system get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 15d k8s-app=kube-dns
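Before digging deeper, a standard sanity check from the Kubernetes DNS debugging docs is to confirm that this service is actually backed by the CoreDNS pods; the endpoint IPs below are only an illustration:

kubectl -n kube-system get endpoints kube-dns
# expected output along these lines (pod IPs will differ):
# NAME       ENDPOINTS                                              AGE
# kube-dns   10.244.0.2:53,10.244.0.3:53,10.244.0.2:53 + 3 more...  15d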
Kubelet config YAML:
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
Output of the pod's resolv.conf:
/ # cat /etc/resolv.conf
nameserver 10.96.0.10
search development.svc.cluster.local svc.cluster.local cluster.local invalid
options ndots:5
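The combination of "ndots:5" and the search domains above matters for the slow lookups: any name with fewer than five dots, such as google.com, is first tried against every search suffix (google.com.development.svc.cluster.local, google.com.svc.cluster.local, google.com.cluster.local, google.com.invalid) before the bare name, so a single flaky hop multiplies into several slow or failing queries. A trailing dot makes the name fully qualified and skips the search list, which helps separate a slow upstream from slow search-list expansion (a general diagnostic, not something from the original post):

kubectl exec -i -t dnsutils -- nslookup google.com.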
Output of the host's resolv.conf:
cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 213.136.95.11
nameserver 213.136.95.10
search invalid
Output of /run/flannel/subnet.env on the host:
cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
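Flannel appears to be running its VXLAN backend here (the flannel.1 device in the routes and the MTU of 1450 point to that), which carries pod-to-pod traffic between nodes as UDP on port 8472. Since the root cause in the updates below turned out to be a firewall, the analogous check for flannel would be to make sure this traffic is not being dropped; a rough sketch, assuming eth0 is the node uplink:

# on the receiving node, watch for encapsulated packets while a pod on the
# other node runs a DNS query:
sudo tcpdump -ni eth0 udp port 8472
# if nothing arrives while the queries time out, allow the traffic, e.g.:
sudo iptables -A INPUT -p udp --dport 8472 -j ACCEPT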
Test setup:
kubectl exec -i -t busybox -n development -- nslookup google.com
kubectl exec -i -t busybox -n development -- nslookup development.default
Busybox v1.28 image:
- google.com nslookup works, but the answer takes a long time
- local pod DNS address fails, and the answer takes a long time
Test setup:
kubectl exec -i -t dnsutils -- nslookup google.com
kubectl exec -i -t busybox -n development -- nslookup development.default
K8s dnsutils test image:
- google.com nslookup works occasionally; it feels as if the address is sometimes served from the cache and sometimes the lookup simply does not work.
- local pod DNS address works occasionally; same impression, sometimes the answer seems to come from the cache and sometimes it does not work at all.
Test setup:
kubectl exec -i -t dnsutilsalpine -n development -- nslookup google.com
kubectl exec -i -t dnsutilsalpine -n development -- nslookup development.default
Alpine image v3.12:
- google.com nslookup works occasionally; it feels as if the address is sometimes served from the cache and sometimes the lookup simply does not work.
- local pod DNS address fails
The logs are empty. Do you have any idea where the problem might be?
IP routes on the master node:
default via X.X.X.X dev eth0 onlink
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
X.X.X.X via X.X.X.X dev eth0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
Update
I reinstalled the cluster, this time with Calico as the CNI, and ran into the same problem.
Update 2
After a detailed error analysis under Calico, I found that the corresponding pods were not working correctly. Digging further into the errors, I discovered that I had not opened port 179 in the firewall. After fixing this, the pods work correctly and name resolution is working again as well.
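For reference, Calico's node-to-node BGP peering runs over TCP port 179, so that port has to be reachable between all nodes. A minimal sketch of the firewall change, assuming ufw on Debian (adjust for plain iptables or a provider firewall):

sudo ufw allow 179/tcp
# or directly with iptables:
sudo iptables -A INPUT -p tcp --dport 179 -j ACCEPT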
Answer 1
I can't post this much in a comment, so I'm posting it as an answer.
I went through the guide you have been referring to and set up my own test cluster (GCP, 3x Debian 10 VMs). The difference is that in ~/kube-cluster/master.yml I set a different link for kube-flannel.yml (and the contents of that file differ from the one in the guide :)).
$ grep http master.yml
shell: kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml >> pod_network_setup.txt
On my cluster:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
instance-1 Ready master 2m48s v1.19.0
instance-2 Ready <none> 38s v1.19.0
instance-3 Ready <none> 38s v1.19.0
kubectl get pods -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-f9fd979d6-8sxg7 1/1 Running 0 4m48s 10.244.0.2 instance-1 <none> <none>
coredns-f9fd979d6-z5gdl 1/1 Running 0 4m48s 10.244.0.3 instance-1 <none> <none>
kube-flannel-ds-4khll 1/1 Running 0 2m58s 10.156.0.21 instance-3 <none> <none>
kube-flannel-ds-h8d9l 1/1 Running 0 2m58s 10.156.0.20 instance-2 <none> <none>
kube-flannel-ds-zhzbf 1/1 Running 0 4m49s 10.156.0.19 instance-1 <none> <none>
$ kubectl -n kube-system get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 6m15s k8s-app=kube-dns
sammy@instance-1:~$ ip route
default via 10.156.0.1 dev ens4
10.156.0.1 dev ens4 scope link
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
I am not seeing any DNS delay problems.
kubectl create deployment busybox --image=nkolchenko/enea:server_go_latest
deployment.apps/busybox created
sammy@instance-1:~$ time kubectl exec -it busybox-6f744547bf-hkxnk -- nslookup default.default
Server: 10.96.0.10
Address: 10.96.0.10:53
** server can't find default.default: NXDOMAIN
** server can't find default.default: NXDOMAIN
command terminated with exit code 1
real 0m0.227s
user 0m0.106s
sys 0m0.012s
sammy@instance-1:~$ time kubectl exec -it busybox-6f744547bf-hkxnk -- nslookup google.com
Server: 10.96.0.10
Address: 10.96.0.10:53
Non-authoritative answer:
Name: google.com
Address: 172.217.22.78
Non-authoritative answer:
Name: google.com
Address: 2a00:1450:4001:820::200e
real 0m0.223s
user 0m0.102s
sys 0m0.012s
Let me know if you need me to run any other tests; I'll keep this cluster up over the weekend and then tear it down.
Update:
$ cat ololo
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
$ kubectl create -f ololo
pod/dnsutils created
$ kubectl get -A all -o wide | grep dns
default pod/dnsutils 1/1 Running 0 63s 10.244.2.8 instance-2 <none> <none>
kube-system pod/coredns-cc8845745-jtvlh 1/1 Running 0 10m 10.244.1.3 instance-3 <none> <none>
kube-system pod/coredns-cc8845745-xxh28 1/1 Running 0 10m 10.244.0.4 instance-1 <none> <none>
kube-system pod/coredns-cc8845745-zlv84 1/1 Running 0 10m 10.244.2.6 instance-2 <none> <none>
instance-1:~$ kubectl exec -i -t dnsutils -- time nslookup google.com
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: google.com
Address: 172.217.21.206
Name: google.com
Address: 2a00:1450:4001:818::200e
real 0m 0.01s
user 0m 0.00s
sys 0m 0.00s
Answer 2
After installing Calico and setting up the appropriate firewall rules (opening port 179 on all nodes), I can see the CoreDNS pods running smoothly. As a result, the different images can resolve DNS addresses and forwarding works correctly.
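As a follow-up check, if calicoctl is installed on a node (it is not shown in the post, so this is only a suggestion), its node status command shows whether the BGP sessions to the other nodes are actually established:

sudo calicoctl node status
# healthy output lists every peer with State "up" and Info "Established"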