DNS lookups fail even though tcpdump shows the name server responding

2024-6-1

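DNS lookups fail intermittently on some of my EC2 instances. A reboot fixes the problem, but after a few hours (or days) the instance returns to the same failing state and stays there until it is rebooted again.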

While the failure was occurring, I tried resolving www.google.com using 8.8.8.8. The output was:

# dig @8.8.8.8 www.google.com

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.5.2 <<>> @8.8.8.8 www.google.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

I ran tcpdump in parallel with dig. In its output I can see the name server sending a response, so I assume the operating system is dropping the response.

# tcpdump -i eth0 udp and port 53 -vvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
07:02:18.312658 IP (tos 0x0, ttl 254, id 36162, offset 0, flags [DF], proto UDP (17), length 76)

(I've removed additional lines from output)

    my_hostname.54159 > 8.8.8.8.domain: [udp sum ok] 12088+ [1au] A? www.google.com. ar: . OPT UDPsize=4096 (43)
07:03:29.274714 IP (tos 0x0, ttl 255, id 8454, offset 0, flags [DF], proto UDP (17), length 66)
    my_hostname.35356 > 10.210.148.199.domain: [udp sum ok] 28668+ PTR? 8.8.8.8.in-addr.arpa. (38)
07:03:29.277401 IP (tos 0x0, ttl 128, id 7424, offset 0, flags [DF], proto UDP (17), length 90)
    10.210.148.199.domain > my_hostname.35356: [udp sum ok] 28668 q: PTR? 8.8.8.8.in-addr.arpa. 1/0/0 8.8.8.8.in-addr.arpa. [5m] PTR dns.google. (62)
07:03:29.279305 IP (tos 0x0, ttl 115, id 5157, offset 0, flags [none], proto UDP (17), length 167)
    8.8.8.8.domain > my_hostname.54159: [udp sum ok] 12088 q: A? www.google.com. 6/0/1 www.google.com. [5m] A 172.253.122.104, www.google.com. [5m] A 172.253.122.106, www.google.com. [5m] A 172.253.122.99, www.google.com. [5m] A 172.253.122.103, www.google.com. [5m] A 172.253.122.105, www.google.com. [5m] A 172.253.122.147 ar: . OPT UDPsize=512 (139)

(I've removed additional lines from output)


^C
3547 packets captured
4276 packets received by filter
729 packets dropped by kernel

Running dig over TCP works as expected:

# dig +tries=1 @8.8.8.8 www.google.com +short +vc
172.253.122.103
172.253.122.106
172.253.122.105
172.253.122.147
172.253.122.104
172.253.122.99
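
To tell quickly whether an instance is in the failing state, the UDP and TCP lookups above can be run side by side. This is a minimal sketch, assuming dig is installed; the function name `probe_dns` and the 3-second timeout are illustrative choices, not part of the original diagnosis.

```shell
#!/bin/sh
# probe_dns SERVER NAME
# Runs one lookup over UDP and one over TCP and reports which succeeded.
# probe_dns is a hypothetical helper; the 3 s timeout is an arbitrary choice.
probe_dns() {
    server=$1
    name=$2

    # +notcp keeps the query on UDP; dig exits non-zero when no reply arrives.
    if dig +notcp +tries=1 +time=3 +short "@$server" "$name" >/dev/null 2>&1; then
        udp=ok
    else
        udp=fail
    fi

    # +tcp is the long form of +vc used above.
    if dig +tcp +tries=1 +time=3 +short "@$server" "$name" >/dev/null 2>&1; then
        tcp=ok
    else
        tcp=fail
    fi

    echo "udp=$udp tcp=$tcp"
}
```

Calling `probe_dns 8.8.8.8 www.google.com` on an affected instance would be expected to show the UDP lookup failing while the TCP lookup succeeds, matching the behaviour described above.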

I checked iptables. The service is inactive, and nothing in the rules suggests it is the problem:

# iptables -S
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
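
Note that `iptables -S` without `-t` lists only the `filter` table. A sketch for listing the other tables as well follows; it needs root, and the helper name `dump_all_tables` is hypothetical. It only widens the check and does not imply that any of these tables is at fault.

```shell
#!/bin/sh
# dump_all_tables
# Prints the rules of every standard iptables table, not just "filter".
# A table may be unavailable on some kernels; in that case the error
# message from iptables is shown inline instead of being hidden.
dump_all_tables() {
    for table in filter nat mangle raw security; do
        echo "== $table =="
        iptables -t "$table" -S 2>&1
    done
}
```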

I also looked at a few other network-related things:

# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 10.210.151.34  netmask 255.255.255.128  broadcast 10.210.151.127
        ether 0e:6a:ac:cb:2a:f9  txqueuelen 1000  (Ethernet)
        RX packets 4529869  bytes 552070750 (526.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4475269  bytes 756543406 (721.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# netstat -suna
IcmpMsg:
    InType0: 25
    InType3: 1615
    OutType3: 1910
    OutType8: 33
Udp:
    367075 packets received
    1918 packets to unknown port received.
    466742 packet receive errors
    838332 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
IpExt:
    InOctets: 928068155
    OutOctets: 1678778245
    InNoECTPkts: 7967209
    InECT0Pkts: 10918

# sysctl net.core.rmem_max
net.core.rmem_max = 16777216

# sysctl net.ipv4.udp_mem
net.ipv4.udp_mem = 382056   509411  764112

# uptime
 08:02:13 up 3 days, 18 min,  1 user,  load average: 0.00, 0.00, 0.00
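
The three `net.ipv4.udp_mem` values are thresholds measured in memory pages, and `/proc/net/sockstat` reports how many pages UDP sockets currently hold. As a rough sketch (the helper name `udp_mem_usage` is hypothetical, and whether memory accounting is involved here is an open assumption), the two can be printed next to each other:

```shell
#!/bin/sh
# udp_mem_usage [SOCKSTAT_FILE] [UDP_MEM_FILE]
# Prints the pages currently used by UDP sockets next to the three
# udp_mem thresholds (min, pressure, max), all in pages.
udp_mem_usage() {
    sockstat=${1:-/proc/net/sockstat}
    limits=${2:-/proc/sys/net/ipv4/udp_mem}

    # The UDP line looks like "UDP: inuse 12 mem 3"; take the value after "mem".
    used=$(awk '$1 == "UDP:" {
        for (i = 2; i < NF; i++) if ($i == "mem") print $(i + 1)
    }' "$sockstat")

    # udp_mem holds three whitespace-separated numbers.
    read -r min pressure max < "$limits"

    echo "used=$used min=$min pressure=$pressure max=$max"
}
```

If `used` sits close to `max` while lookups are failing, that would be worth mentioning alongside the counters above; if it is far below, this path can probably be ruled out.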

Is "packet receive errors" the smoking gun?
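
One way to test this is to snapshot the UDP counters immediately before and after a single failing lookup and see which ones move. The "packet receive errors" line printed by netstat corresponds to the `InErrors` field on the `Udp:` lines of `/proc/net/snmp`. The sketch below assumes that file layout (a header line followed by a value line); the helper names `udp_counters` and `udp_counter_delta` are hypothetical.

```shell
#!/bin/sh
# udp_counters [SNMP_FILE]
# Prints "name value" pairs from the two "Udp:" lines of /proc/net/snmp:
# the first line carries the field names, the second the values.
udp_counters() {
    awk '$1 == "Udp:" {
        if (!seen) { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1 }
        else       { for (i = 2; i <= NF; i++) print name[i], $i }
    }' "${1:-/proc/net/snmp}"
}

# udp_counter_delta BEFORE AFTER
# Prints every counter whose value changed between two snapshots,
# together with the size of the change.
udp_counter_delta() {
    awk 'NR == FNR { before[$1] = $2; next }
         $2 != before[$1] { print $1, $2 - before[$1] }' "$1" "$2"
}

# Example use on an affected instance:
#   udp_counters > /tmp/before
#   dig +tries=1 @8.8.8.8 www.google.com
#   udp_counters > /tmp/after
#   udp_counter_delta /tmp/before /tmp/after
```

If `InErrors` rises by one for each lookup that times out, that ties the counter to the lost responses. Other UDP traffic on the instance also moves these counters, so the comparison is only indicative.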

How can I fix the DNS lookup failures (without rebooting)?
