Caching, forwarding BIND 9.9.4 server runs for weeks, then suddenly returns SERVFAIL for all queries (a restart fixes it)

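I have BIND 9.9.4 running on two servers (CentOS 6 and 7) to cache and forward DNS queries for mail servers. The servers run for weeks, then suddenly respond to every query with SERVFAIL. The first time this happened, both servers started failing on the same day. Now, a week later, it has happened again, but on only one server. Restarting named fixes the problem.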

Here are the important parts of /etc/named.conf (irrelevant parts omitted):

acl "trusted" {
    localhost;
    localnets;
    10.128.0.0/9;
};
options {
    listen-on port 53 { 127.0.0.1; 10.128.0.0/9; };
    listen-on-v6 port 53 { ::1; };
    directory               "/var/named";
    dump-file               "/var/named/data/cache_dump.db";
    statistics-file         "/var/named/data/named_stats.txt";
    memstatistics-file      "/var/named/data/named_mem_stats.txt";
    bindkeys-file           "/etc/named.iscdlv.key";
    managed-keys-directory  "/var/named/dynamic";
    auth-nxdomain no;
    version "asdf";

    dnssec-enable       yes;
    dnssec-validation   yes;
    dnssec-lookaside    auto;

    recursion yes;
    forward only;
    forwarders { 169.254.169.254; };

    allow-query     { trusted; };
    allow-recursion { trusted; };
};

When a server is in the failing state, a dig query returns:

[q@oak3] dig @10.128.0.9 apple.com a

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.el6_10.1 <<>> @10.128.0.9 apple.com a
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 44811
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;apple.com.         IN  A

;; Query time: 3 msec
;; SERVER: 10.128.0.9#53(10.128.0.9)
;; WHEN: Fri Mar 15 19:22:06 2019
;; MSG SIZE  rcvd: 27

and the following log entries appear:

==> /var/named/chroot/var/log/queries.log <==
15-Mar-2019 19:22:06.983 client 10.128.0.4#55092 (apple.com): query: apple.com IN A + (10.128.0.9)

==> /var/named/chroot/var/log/dnssec.log <==
15-Mar-2019 19:22:06.984 validating apple.com/A: bad cache hit (com/DS)

==> /var/named/chroot/var/log/lame-servers.log <==
15-Mar-2019 19:22:06.984 broken trust chain resolving 'apple.com/A/IN': 169.254.169.254#53

After restarting named, the same query (dig @10.128.0.9 apple.com a) returns a correct response, and no errors are logged.

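Nothing relevant is logged under /var/logs at the time the queries start failing. The servers have not been rebooted recently, and no updates have been installed recently.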
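Since nothing is logged when the failures begin, one way to pin down the onset would be to check the dig header mechanically. This is only a rough sketch, assuming Python 3 is available on the box; the names `dig_status` and `is_servfail` are made up for illustration, and the sample text is the header from the dig output above.

```python
import re

# Matches the header line that dig prints, for example:
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 44811"
HEADER_RE = re.compile(r"->>HEADER<<-.*?status:\s*(\w+)")


def dig_status(dig_output):
    """Return the status name from dig's header line, or None if absent."""
    match = HEADER_RE.search(dig_output)
    return match.group(1) if match else None


def is_servfail(dig_output):
    """True when the response status in the dig output is SERVFAIL."""
    return dig_status(dig_output) == "SERVFAIL"


# Header lines copied from the failing query shown earlier in this post.
sample = """\
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 44811
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
"""

print(dig_status(sample))
```

Fed with the output of `dig @10.128.0.9 apple.com a` on a schedule, something like this could at least timestamp when a server flips into the failing state, so the moment can be lined up against the dnssec and lame-servers logs.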

Is something wrong with my configuration? What could cause a working BIND server to suddenly start failing like this?
