DNS resolution in k8s pods not working after upgrading Kubernetes to v1.25

2024-6-2

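We upgraded Kubernetes from v1.24 to v1.25. The cluster was created with kubespray (version v1.2.21), and the upgrade to v1.25 completed successfully. However, as soon as we deploy a Pod, it cannot connect to external hosts (such as google.com) from inside the Pod. It fails with the following error.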

user@vm-util-mtm-wes-k8-upgrade-rnd:~$ kubectl exec -i -t dnsutils -- nslookup google.com
Server: 169.254.25.10
Address: 169.254.25.10#53

** server can't find google.com.reddog.microsoft.com: SERVFAIL

command terminated with exit code 1
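Note that the name reported as failing is google.com.reddog.microsoft.com rather than google.com, which suggests the resolver search list was applied to the query. As a minimal sketch (the function name failed_lookups is made up for illustration), the failing name and response code can be pulled out of nslookup output like this:

```shell
#!/usr/bin/env bash
# failed_lookups (hypothetical helper): read nslookup output on stdin and
# print "<name> <rcode>" for every "** server can't find ..." line.
failed_lookups() {
  # "can[^ ]*t" tolerates both a plain and a typographic apostrophe.
  sed -n 's/^\*\* server can[^ ]*t find \([^:]*\): \([A-Z]*\).*/\1 \2/p'
}
```

It could be fed from the same command as above, for example `kubectl exec -i -t dnsutils -- nslookup google.com | failed_lookups`. Querying `google.com.` with a trailing dot marks the name as fully qualified, so the search list should not be applied, which helps separate a search-domain problem from an upstream problem.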

We have already tried the steps described in this link: Debugging DNS Resolution, but the issue persists. Any suggestions?

Cluster information:


 - Kubernetes version: v1.25
 - Cloud being used: Azure VMs
 - Installation method: using kubespray
 - Host OS: ubuntu 20.04 LTS
 - CNI and version: Weave , v2.8.1
 - CRI and version: docker, v20.10

Here are the steps we have tried so far.

In the Corefile of the coredns ConfigMap, the forward directive points to 8.8.8.8 8.8.4.4 by default:

Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . 8.8.8.8 8.8.4.4 {
          prefer_udp
          max_concurrent 1000
        }
        cache 30

        loop
        reload
        loadbalance
    }

We changed it to point to the /etc/resolv.conf file instead:

Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf {
          prefer_udp
          max_concurrent 1000
        }
        cache 30

        loop
        reload
        loadbalance
    }
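To confirm which upstreams a Corefile actually forwards to after an edit like this, the forward lines can be extracted from the live ConfigMap. The following is a rough sketch only; forward_upstreams is a hypothetical helper name, and it assumes each forward directive sits on a single line as in the Corefile above.

```shell
#!/usr/bin/env bash
# forward_upstreams (hypothetical helper): read a Corefile on stdin and print
# the upstreams of every "forward" directive, one directive per output line.
forward_upstreams() {
  awk '$1 == "forward" {
    line = ""
    # Field 2 is the zone; the following fields are upstreams up to the "{".
    for (i = 3; i <= NF && $i != "{"; i++) {
      line = line (line != "" ? " " : "") $i
    }
    print line
  }'
}
```

For example: `kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' | forward_upstreams`. The nslookup output earlier shows the Pod querying 169.254.25.10, a link-local address that kubespray typically assigns to its node-local DNS cache. If that cache is enabled, it has its own ConfigMap (assumed here to be named nodelocaldns in kube-system) with its own forward settings, so the same check may be worth running against it as well.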

Contents of the /etc/resolv.conf file on the master node:

# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 10.23.64.41
nameserver 10.23.64.42
nameserver 10.23.0.41
search reddog.microsoft.com

As you can see, three nameservers are defined there, but when we run the resolvectl command

resolvectl | grep "Current DNS Server"

it shows the following output:

 Current DNS Server: 10.23.64.41
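Because `forward . /etc/resolv.conf` makes CoreDNS read its upstreams from a resolv.conf file, it can help to check that file for entries a Pod cannot use. The sketch below is illustrative only (check_resolv_conf is a hypothetical helper): it lists the nameserver entries, flags loopback addresses such as the systemd-resolved stub, and warns when more than three are present, since glibc only uses the first three.

```shell
#!/usr/bin/env bash
# check_resolv_conf (hypothetical helper): print each nameserver found in the
# resolv.conf-style file given as $1, flag loopback entries, and warn when
# more than three nameservers are listed.
check_resolv_conf() {
  awk '
    $1 == "nameserver" {
      n++
      note = ($2 ~ /^127\./ || $2 == "::1") ? " (loopback: not reachable from a pod)" : ""
      print "nameserver " $2 note
    }
    END {
      if (n > 3) print "warning: " n " nameservers listed; glibc uses only the first 3"
    }
  ' "$1"
}
```

On a node this might be run as `check_resolv_conf /etc/resolv.conf` and, on systemd-resolved hosts, also against /run/systemd/resolve/resolv.conf, to compare the file the kubelet hands to Pods with the uplink servers shown by resolvectl.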

We tried keeping only one nameserver entry (10.23.64.41) in the /etc/resolv.conf file, then ran daemon-reload and restarted the kubelet.

systemctl daemon-reload
systemctl restart kubelet

But the issue persists.
