I am running 350 containers, and they are experiencing DNS timeouts.
The following entries show up in the docker.service log:
Oct 20 22:18:54 node1 dockerd[22149]: time="2023-10-20T22:18:54.208340628Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;us1.storj.io.\tIN\t A" error="read udp 10.1.1.248:59201->1.1.1.1:53: i/o timeout"
Oct 20 22:20:14 node1 dockerd[22149]: time="2023-10-20T22:20:14.302824917Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;us1.storj.io.\tIN\t A" error="read udp 10.1.1.35:52059->1.1.1.1:53: i/o timeout"
Oct 20 22:20:59 node1 dockerd[22149]: time="2023-10-20T22:20:59.359135519Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;us1.storj.io.\tIN\t A" error="read udp 10.1.2.97:39852->1.1.1.1:53: i/o timeout"
Oct 20 22:23:01 node1 dockerd[22149]: time="2023-10-20T22:23:01.080541412Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;collectora.storj.io.\tIN\t A" error="read udp 10.1.1.169:46591->1.1.1.1:53: i/o timeout"
Oct 20 22:23:48 node1 dockerd[22149]: time="2023-10-20T22:23:48.319370297Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;us1.storj.io.\tIN\t AAAA" error="read udp 10.1.1.211:48580->1.1.1.1:53: i/o timeout"
Oct 20 22:24:25 node1 dockerd[22149]: time="2023-10-20T22:24:25.994840345Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;version.storj.io.\tIN\t A" error="read udp 10.1.1.32:39979->1.1.1.1:53: i/o timeout"
Oct 20 22:26:18 node1 dockerd[22149]: time="2023-10-20T22:26:18.656264587Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;us1.storj.io.\tIN\t A" error="read udp 10.1.1.146:57250->1.1.1.1:53: i/o timeout"
Oct 20 22:29:51 node1 dockerd[22149]: time="2023-10-20T22:29:51.320022392Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;version.storj.io.\tIN\t AAAA" error="read udp 10.1.1.233:47655->1.1.1.1:53: i/o timeout"
Oct 20 22:29:51 node1 dockerd[22149]: time="2023-10-20T22:29:51.320369184Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;version.storj.io.\tIN\t A" error="read udp 10.1.1.233:42511->1.1.1.1:53: i/o timeout"
Oct 20 22:32:31 node1 dockerd[22149]: time="2023-10-20T22:32:31.558775625Z" level=error msg="[resolver] failed to query DNS server: 1.1.1.1:53, query: ;version.storj.io.\tIN\t AAAA" error="read udp 10.1.1.15:34058->1.1.1.1:53: i/o timeout"
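To see whether the failures cluster on particular containers or names, entries like the ones above could be tallied by source address and by query. The following is a minimal Python sketch, not a definitive tool: it assumes the log lines keep exactly the format shown here (including the literal `\t` escapes), and that the local address in `read udp <addr>:<port>->` identifies the container whose lookup timed out. The names `LINE_RE` and `tally_timeouts` are made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches dockerd "[resolver] failed to query DNS server" entries in the
# format shown above. The log contains a literal backslash followed by "t",
# hence the escaped backslashes in the pattern.
LINE_RE = re.compile(
    r'failed to query DNS server: (?P<server>\S+), '
    r'query: ;(?P<name>\S+?)\\tIN\\t\s*(?P<qtype>\w+)"'
    r'.*?read udp (?P<src>[\d.]+):\d+->'
)


def tally_timeouts(lines):
    """Count failed queries per source address and per (name, record type)."""
    by_source = Counter()
    by_query = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a resolver failure entry
        by_source[match["src"]] += 1
        by_query[(match["name"], match["qtype"])] += 1
    return by_source, by_query


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        sources, queries = tally_timeouts(handle)
    for src, count in sources.most_common(10):
        print(f"{count:6d}  {src}")
    for (name, qtype), count in queries.most_common(10):
        print(f"{count:6d}  {name} {qtype}")
```

The input could be a file produced with `journalctl -u docker.service > docker.log`. If the counts are spread evenly over all source addresses, that would point away from a single misbehaving container and toward something shared, such as the host or the upstream resolver.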
As a test, I ran a container on the same network, and it resolved names without any problems.
root@node1:~# docker run -it --rm --network my_custom_network alpine:latest
/ # cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0
/ # ping google.com
PING google.com (142.250.64.78): 56 data bytes
64 bytes from 142.250.64.78: seq=0 ttl=118 time=10.165 ms
^C
--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 10.165/10.165/10.165 ms
/ # ping version.storj.io
PING version.storj.io (34.173.164.90): 56 data bytes
64 bytes from 34.173.164.90: seq=0 ttl=59 time=32.469 ms
64 bytes from 34.173.164.90: seq=1 ttl=59 time=32.599 ms
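A handful of pings only exercises a few lookups, while the logged timeouts are minutes apart, so an intermittent failure could easily be missed this way. One option is to repeat the lookups many times from inside a container and record the slow or failed ones. The following is a rough Python sketch under stated assumptions: it needs an image that ships Python (the alpine image used above does not), and the function names `probe` and `slow_or_failed` as well as the 1-second threshold are illustrative choices, not anything Docker defines.

```python
import socket
import sys
import time


def probe(names, rounds=5, resolve=socket.getaddrinfo, clock=time.monotonic):
    """Resolve every name `rounds` times; return (name, seconds, error) tuples."""
    results = []
    for _ in range(rounds):
        for name in names:
            start = clock()
            try:
                resolve(name, None)
                error = None
            except OSError as exc:  # socket.gaierror is a subclass of OSError
                error = str(exc)
            results.append((name, clock() - start, error))
    return results


def slow_or_failed(results, threshold=1.0):
    """Keep lookups that raised an error or took at least `threshold` seconds."""
    return [r for r in results if r[2] is not None or r[1] >= threshold]


# Only runs when host names are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, seconds, error in slow_or_failed(probe(sys.argv[1:], rounds=20)):
        print(f"{name}: {seconds:.3f}s {error or 'slow'}")
```

Running it with the names from the log (`us1.storj.io`, `version.storj.io`, `collectora.storj.io`) on `my_custom_network` while watching the dockerd log would show whether the timeouts are visible from the container side at all, or whether a retry hides them.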
The networking section of my docker compose file looks like this:
networks:
  my_custom_network:
    driver: bridge
    ipam:
      config:
        - subnet: 10.1.0.0/22
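DNS appears to work fine from inside the containers, but for some reason dockerd times out on some requests. How can I troubleshoot this and find the cause of these timeouts? Could the virtual network be congested?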