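DNS lookups on some of my EC2 instances fail intermittently. Rebooting fixes the problem, but after a few hours (or a few days) the instance returns to the same failed state and stays there until the next reboot.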
While an instance was in the failed state, I tried to resolve www.google.com using 8.8.8.8. The output is as follows:
# dig @8.8.8.8 www.google.com
; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.5.2 <<>> @8.8.8.8 www.google.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
I ran tcpdump in parallel while running dig. From the output, I can see that the name server is sending a response, so I assume the OS is dropping the response.
# tcpdump -i eth0 udp and port 53 -vvv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
07:02:18.312658 IP (tos 0x0, ttl 254, id 36162, offset 0, flags [DF], proto UDP (17), length 76)
(I've removed additional lines from output)
my_hostname.54159 > 8.8.8.8.domain: [udp sum ok] 12088+ [1au] A? www.google.com. ar: . OPT UDPsize=4096 (43)
07:03:29.274714 IP (tos 0x0, ttl 255, id 8454, offset 0, flags [DF], proto UDP (17), length 66)
my_hostname.35356 > 10.210.148.199.domain: [udp sum ok] 28668+ PTR? 8.8.8.8.in-addr.arpa. (38)
07:03:29.277401 IP (tos 0x0, ttl 128, id 7424, offset 0, flags [DF], proto UDP (17), length 90)
10.210.148.199.domain > my_hostname.35356: [udp sum ok] 28668 q: PTR? 8.8.8.8.in-addr.arpa. 1/0/0 8.8.8.8.in-addr.arpa. [5m] PTR dns.google. (62)
07:03:29.279305 IP (tos 0x0, ttl 115, id 5157, offset 0, flags [none], proto UDP (17), length 167)
8.8.8.8.domain > my_hostname.54159: [udp sum ok] 12088 q: A? www.google.com. 6/0/1 www.google.com. [5m] A 172.253.122.104, www.google.com. [5m] A 172.253.122.106, www.google.com. [5m] A 172.253.122.99, www.google.com. [5m] A 172.253.122.103, www.google.com. [5m] A 172.253.122.105, www.google.com. [5m] A 172.253.122.147 ar: . OPT UDPsize=512 (139)
(I've removed additional lines from output)
^C
3547 packets captured
4276 packets received by filter
729 packets dropped by kernel
Using dig over TCP works as expected:
# dig +tries=1 @8.8.8.8 www.google.com +short +vc
172.253.122.103
172.253.122.106
172.253.122.105
172.253.122.147
172.253.122.104
172.253.122.99
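Since UDP fails while TCP succeeds, it may help to log both side by side so the moment an instance flips into the failed state can be pinned down. The following is only a sketch: probe and probe_both are hypothetical helper names, DIG is a hypothetical override variable that defaults to dig, and the 2-second timeout is an arbitrary choice.

```shell
# Classify one lookup as "ok" or "fail" from the resolver command's exit status.
# dig exits non-zero (9) when no server replies.
# DIG is a hypothetical override for the command to run; it defaults to dig.
probe() {  # $1 = server, $2 = name, remaining args = extra dig options
    server=$1; name=$2; shift 2
    if "${DIG:-dig}" +tries=1 +time=2 +short "$@" "@$server" "$name" >/dev/null 2>&1; then
        echo ok
    else
        echo fail
    fi
}

# Print one log line: UTC timestamp, UDP result, TCP result (+vc forces TCP).
probe_both() {  # $1 = server, $2 = name
    printf '%s udp=%s tcp=%s\n' "$(date -u +%FT%TZ)" \
        "$(probe "$1" "$2")" "$(probe "$1" "$2" +vc)"
}
```

Running something like probe_both 8.8.8.8 www.google.com once a minute (from cron, for example) would show the first timestamp at which udp=fail appears while tcp=ok still holds.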
I checked iptables. The service is inactive, and nothing in the rules suggests it is the problem:
# iptables -S
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
I also looked at a few other network-related things:
# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
inet 10.210.151.34 netmask 255.255.255.128 broadcast 10.210.151.127
ether 0e:6a:ac:cb:2a:f9 txqueuelen 1000 (Ethernet)
RX packets 4529869 bytes 552070750 (526.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 4475269 bytes 756543406 (721.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
# netstat -suna
IcmpMsg:
InType0: 25
InType3: 1615
OutType3: 1910
OutType8: 33
Udp:
367075 packets received
1918 packets to unknown port received.
466742 packet receive errors
838332 packets sent
0 receive buffer errors
0 send buffer errors
UdpLite:
IpExt:
InOctets: 928068155
OutOctets: 1678778245
InNoECTPkts: 7967209
InECT0Pkts: 10918
# sysctl net.core.rmem_max
net.core.rmem_max = 16777216
# sysctl net.ipv4.udp_mem
net.ipv4.udp_mem = 382056 509411 764112
# uptime
08:02:13 up 3 days, 18 min, 1 user, load average: 0.00, 0.00, 0.00
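One thing not checked above is how much memory UDP sockets are currently holding compared with the net.ipv4.udp_mem limits (min, pressure, max, all in pages). A rough sketch of that comparison follows. udp_mem_pages and udp_mem_state are hypothetical helper names, and whether UDP memory accounting is actually involved here is an assumption to test, not a conclusion.

```shell
# Extract the UDP "mem" figure (in pages) from /proc/net/sockstat-style input.
udp_mem_pages() {
    awk '$1 == "UDP:" { for (i = 2; i < NF; i += 2) if ($i == "mem") print $(i + 1) }'
}

# Compare pages in use against the udp_mem thresholds.
# $1 = pages in use, $2 $3 $4 = udp_mem min, pressure, max
udp_mem_state() {
    if [ "$1" -ge "$4" ]; then
        echo "over max"
    elif [ "$1" -ge "$3" ]; then
        echo "under pressure"
    else
        echo "ok"
    fi
}
```

On a live instance this could be run as: udp_mem_state "$(udp_mem_pages < /proc/net/sockstat)" $(sysctl -n net.ipv4.udp_mem)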
Is the "packet receive errors" counter the smoking gun?
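The "packet receive errors" line in netstat -su corresponds to the Udp InErrors counter in /proc/net/snmp, and "receive buffer errors" to RcvbufErrors. One way to see whether a failed lookup actually moves one of these counters is to snapshot them before and after a single dig. The sketch below assumes the usual two-row Udp: layout of /proc/net/snmp (a header row followed by a value row); udp_counters and udp_delta are hypothetical helper names.

```shell
# Print "name value" pairs for the Udp: rows of /proc/net/snmp-style input.
# The first Udp: row holds the counter names, the second holds the values.
udp_counters() {
    awk '$1 == "Udp:" {
        if (!seen) { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1 }
        else       { for (i = 2; i <= NF; i++) print name[i], $i }
    }'
}

# Print "name delta" for each counter.
# $1 = earlier udp_counters output, $2 = later udp_counters output
udp_delta() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NF < 2 { next }
        !($1 in first) { first[$1] = $2; order[++n] = $1; next }
        { last[$1] = $2 }
        END { for (i = 1; i <= n; i++) { k = order[i]; print k, last[k] - first[k] } }'
}
```

For example: before=$(udp_counters < /proc/net/snmp), then run the failing dig, then after=$(udp_counters < /proc/net/snmp) and udp_delta "$before" "$after". Other UDP traffic on the host also changes these counters, so the deltas are only indicative.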
How can I fix the DNS lookup failures (without rebooting)?