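We upgraded our Kubernetes cluster from v1.24 to v1.25. The cluster was created with kubespray (version v1.2.21), and the upgrade itself completed successfully. However, as soon as we deploy pods, they cannot reach external hosts (such as google.com) from inside the cluster. The lookup fails with the following error: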
user@vm-util-mtm-wes-k8-upgrade-rnd:~$ kubectl exec -i -t dnsutils -- nslookup google.com
Server: 169.254.25.10
Address: 169.254.25.10#53
** server can't find google.com.reddog.microsoft.com: SERVFAIL
command terminated with exit code 1
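The error names google.com.reddog.microsoft.com rather than google.com, which suggests the resolver appended a search domain before the query failed. The following is a minimal sketch of how a stub resolver typically expands a short name; the `candidates` function is hypothetical, and the search list and `ndots:5` used in the example are assumptions based on common pod defaults, not values taken from this cluster.

```shell
#!/bin/sh
# candidates NAME NDOTS SEARCH_DOMAIN...
# Hypothetical helper: prints, in order, the fully qualified names a
# typical stub resolver would try for NAME. This approximates the
# resolv.conf(5) search/ndots rules; it performs no DNS queries.
candidates() {
    name=$1
    ndots=$2
    shift 2

    # A trailing dot marks the name as absolute: the search list is skipped.
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;
    esac

    # Count the dots in the name (arithmetic expansion strips whitespace).
    dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))

    # With at least ndots dots, the name is tried as-is first.
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s.\n' "$name"
    fi

    # Each search domain is appended in turn.
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"
    done

    # With fewer than ndots dots, the bare name is tried last.
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
}

# Assumed pod settings: ndots:5 and a search list ending in the node's
# search domain (reddog.microsoft.com).
candidates google.com 5 \
    default.svc.cluster.local svc.cluster.local cluster.local \
    reddog.microsoft.com
```

If the expansion is the trigger, querying the absolute name with a trailing dot (`kubectl exec -i -t dnsutils -- nslookup google.com.`) should skip the search list, and `kubectl exec -i -t dnsutils -- cat /etc/resolv.conf` shows the pod's actual search list and ndots value.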
We have already tried the steps described in the "Debugging DNS Resolution" guide, but the issue persists. Any suggestions?
Cluster information:
- Kubernetes version: v1.25
- Cloud being used: Azure VMs
- Installation method: using kubespray
- Host OS: ubuntu 20.04 LTS
- CNI and version: Weave, v2.8.1
- CRI and version: docker, v20.10
Here are the steps we have tried so far:
- In the coredns ConfigMap, the Corefile forwards to 8.8.8.8 and 8.8.4.4 by default:
Corefile: |
  .:53 {
      errors
      health {
          lameduck 5s
      }
      ready
      kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . 8.8.8.8 8.8.4.4 {
          prefer_udp
          max_concurrent 1000
      }
      cache 30
      loop
      reload
      loadbalance
  }
We changed it to forward to the /etc/resolv.conf file instead:
Corefile: |
  .:53 {
      errors
      health {
          lameduck 5s
      }
      ready
      kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
      }
      prometheus :9153
      forward . /etc/resolv.conf {
          prefer_udp
          max_concurrent 1000
      }
      cache 30
      loop
      reload
      loadbalance
  }
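To confirm which upstreams are actually in effect after an edit like this, the `forward` targets can be pulled out of the live ConfigMap. This is a rough sketch: `forward_targets` is a hypothetical helper, and the `nodelocaldns` ConfigMap name in the usage comment is an assumption (the 169.254.25.10 server in the nslookup output looks like a node-local DNS cache, which would have its own forward configuration separate from coredns).

```shell
#!/bin/sh
# forward_targets
# Hypothetical helper: reads Corefile text on stdin and prints every
# upstream listed on a "forward FROM TO..." line, one per line.
forward_targets() {
    awk '$1 == "forward" {
        # $2 is the FROM zone; the remaining fields are upstreams,
        # except for the "{" that opens the options block.
        for (i = 3; i <= NF; i++) {
            if ($i != "{") {
                print $i
            }
        }
    }'
}

# Usage against a live cluster (not run here):
#   kubectl -n kube-system get configmap coredns \
#       -o jsonpath='{.data.Corefile}' | forward_targets
# Assumption: if a node-local cache is deployed, its ConfigMap may be
# named "nodelocaldns"; check with `kubectl -n kube-system get configmap`:
#   kubectl -n kube-system get configmap nodelocaldns -o yaml | forward_targets
```

If the node-local cache forwards external names itself, changing only the coredns ConfigMap may have no effect on lookups made from pods.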
- Contents of the /etc/resolv.conf file on the master node:
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 10.23.64.41
nameserver 10.23.64.42
nameserver 10.23.0.41
search reddog.microsoft.com
Three nameservers are defined there, but when we run the resolvectl command
resolvectl | grep "Current DNS Server"
it shows the following output:
Current DNS Server: 10.23.64.41
- We tried keeping only one nameserver entry (10.23.64.41) in /etc/resolv.conf, then ran daemon-reload and restarted kubelet:
systemctl daemon-reload
systemctl restart kubelet
But the issue persists.
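One way to narrow this down further is to query each upstream nameserver directly from the node, bypassing CoreDNS entirely. The sketch below assumes `nslookup` is installed on the node; `nameservers` and `check_upstreams` are hypothetical helper names, and the default path is simply the file shown above.

```shell
#!/bin/sh
# nameservers [FILE]
# Hypothetical helper: prints the address from every "nameserver" line
# in a resolv.conf-style file (default: /etc/resolv.conf).
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# check_upstreams NAME [FILE]
# Hypothetical helper: asks each listed nameserver for NAME directly and
# reports which ones answer. Assumes nslookup is available on the node.
check_upstreams() {
    name=$1
    file=${2:-/etc/resolv.conf}
    for ns in $(nameservers "$file"); do
        printf '== %s ==\n' "$ns"
        if nslookup "$name" "$ns" >/dev/null 2>&1; then
            printf 'ok\n'
        else
            printf 'FAILED\n'
        fi
    done
}

# Example (run on the node, not inside a pod):
#   check_upstreams google.com.
#   check_upstreams google.com.reddog.microsoft.com.
```

If a nameserver fails here as well, the problem is likely upstream of the cluster. If every nameserver answers from the node but pods still fail, comparing the file CoreDNS sees (`kubectl -n kube-system exec` into a CoreDNS pod is usually not possible without a shell, so checking the pod's dnsPolicy and the kubelet `--resolv-conf` setting may be more practical) with the node's file is a reasonable next step.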