我在 Ubuntu 服务器上运行一个 Ruby 应用程序,该应用程序经常需要向其他服务发出 HTTP 请求。其中一些服务位于内部域中,一些可通过公共 DNS 访问。我有一个 Consul 代理与我的 Ruby 应用程序共置,其工作是解析内部组件的 DNS 查询。
最终,Ruby 应用程序变慢,导致 DNS 查询外部域需要 5 秒以上才能解决。我需要帮助来理解为什么会发生下面描述的行为。
澄清一下,我的目标是理解对于以下行为,我已经考虑了几种修复/解决方法。
我已将问题归结为以下几点:
最小可重现示例:
/etc/resolv.conf
nameserver 127.0.0.1
nameserver 8.8.8.8
我用以下监听端口 53 的 always-SERVFAIL DNS 解析器替换了 Consul:
#!/usr/bin/env ruby
require "rubydns"
RubyDNS.run_server(listen: [[:udp, "0.0.0.0", 53], [:tcp, "0.0.0.0", 53]]) do
otherwise do |transaction|
transaction.fail!(:ServFail)
end
end
我将以上内容放在一个文件中(/var/vcap/data/tmp/dns.rb
),然后运行它:
ruby /var/vcap/data/tmp/dns.rb
这不是很重要,但上面的代码是在 ruby 系统上运行的,即 1.9。然后我运行以下命令(同样,可能并不重要,但这使用了不同的 ruby 版本):
var/vcap/packages/ruby-2.1.6/bin/ruby -r'net/protocol' -e \
'1.step do |i|; t=Time.now; TCPSocket.open("SOME.REAL.DOMAIN",80).close; p Time.now-t; end'
以上是轻微确实,打印出来的内容格式稍微复杂一些,但为了让它适合这篇文章而不需要左右滚动,我稍微简化了它。
观察到的行为:
我看到的行为是,一段时间内,循环的每次迭代大约需要 10 毫秒,但最终它开始花费 5000 毫秒的时间。
行为迟缓的持续时间
有一次,我修改/etc/resolv.conf
并options timeout:4
重新运行了这个实验,每次迭代大约需要 4000 毫秒。因此,5000 毫秒似乎来自默认的 resolvconf 超时 5 秒。
减速前的迭代次数
可能需要几百到几万次迭代才能让行为浮现出来,但总是会这样。如果您停止循环并再次运行命令,结果还是一样,数百或数千次 ~10ms 迭代,然后持续 ~5000ms 迭代。通常,中间有一两次迭代需要中等时间,即 2000-4000ms。
长寿命的 ruby 循环与重复的一次性 ruby 命令
我相信这与长寿命的 ruby 进程有关,因为如果我运行一个 bash 循环,并且在 bash 循环的每次迭代中执行一些 ruby,我不会看到这种最终的减速。例如以下才不是表现出不良行为:
while true; do time /var/vcap/packages/ruby-VERSION/bin/ruby -r'net/protocol' -e \
'TCPSocket.open("SOME.REAL.DOMAIN, 80).close'; done
仅本地主机名称服务器与两个名称服务器
此外,这与拥有多个名称服务器有关。如果我编辑/etc/resolv.conf
为仅包含127.0.0.1
名称服务器,并将添加rescue nil
到 ruby 迭代(以防止脚本在第一次迭代时崩溃),它似乎可以运行很长时间而不会变慢。
调试信息:
我做过strace
上述循环,其中有 7873 次快速迭代(~10 毫秒),然后一次迭代耗时 3013 毫秒,然后所有后续迭代耗时略超过 5000 毫秒。所有快速迭代的 strace 输出看起来基本相同,所有 ~5000 毫秒迭代的 strace 输出看起来基本相同。因此,我将在此处包含来自最后的快速迭代、3013ms 迭代和第一的~5000ms 迭代。当然,在 strace 输出中,它们都是连续的,但是为了便于查看,我将其分成了单独的块。
最后一次快速迭代:9ms
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=305, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e70992000
read(7, "127.0.0.1 localhost af0a3ebe-91a"..., 4096) = 305
read(7, "", 4096) = 0
close(7) = 0
munmap(0x7f4e70992000, 4096) = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7
connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}])
sendmmsg(7, {{{msg_name(0)=NULL, msg_iov(1)=[{"A\5\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"YH\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2
poll([{fd=7, events=POLLIN}], 1, 5000) = 1 ([{fd=7, revents=POLLIN}])
ioctl(7, FIONREAD, [45]) = 0
recvfrom(7, "A\5\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45
poll([{fd=7, events=POLLIN}], 1, 4999) = 1 ([{fd=7, revents=POLLIN}])
ioctl(7, FIONREAD, [45]) = 0
recvfrom(7, "YH\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8
connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}])
sendmmsg(8, {{{msg_name(0)=NULL, msg_iov(1)=[{"A\5\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"YH\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2
poll([{fd=8, events=POLLIN}], 1, 3000) = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [61]) = 0
recvfrom(8, "A\5\201\200\0\1\0\1\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 61
poll([{fd=8, events=POLLIN}], 1, 2996) = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [109]) = 0
recvfrom(8, "YH\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 1987, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109
close(7) = 0
close(8) = 0
socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 7
fcntl(7, F_GETFD) = 0x1 (flags FD_CLOEXEC)
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("63.251.167.126")}, 16) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
close(7) = 0
write(1, "7873: 9ms", 9) = 9
write(1, "\n", 1) = 1
中间迭代:3013ms
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=305, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e70992000
read(7, "127.0.0.1 localhost af0a3ebe-91a"..., 4096) = 305
read(7, "", 4096) = 0
close(7) = 0
munmap(0x7f4e70992000, 4096) = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7
connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}])
sendmmsg(7, {{{msg_name(0)=NULL, msg_iov(1)=[{"u>\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"e?\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2
poll([{fd=7, events=POLLIN}], 1, 5000) = 1 ([{fd=7, revents=POLLIN}])
ioctl(7, FIONREAD, [45]) = 0
recvfrom(7, "u>\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45
poll([{fd=7, events=POLLIN}], 1, 4999) = 1 ([{fd=7, revents=POLLIN}])
ioctl(7, FIONREAD, [45]) = 0
recvfrom(7, "e?\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8
connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}])
sendmmsg(8, {{{msg_name(0)=NULL, msg_iov(1)=[{"u>\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"e?\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2
poll([{fd=8, events=POLLIN}], 1, 3000) = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [109]) = 0
recvfrom(8, "e?\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109
poll([{fd=8, events=POLLIN}], 1, 2997) = 0 (Timeout)
poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}])
sendto(8, "u>\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=8, events=POLLIN}], 1, 3000) = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [61]) = 0
recvfrom(8, "u>\201\200\0\1\0\1\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 61
poll([{fd=8, events=POLLOUT}], 1, 2997) = 1 ([{fd=8, revents=POLLOUT}])
sendto(8, "e?\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=8, events=POLLIN}], 1, 2997) = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [109]) = 0
recvfrom(8, "e?\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 1987, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109
close(7) = 0
close(8) = 0
socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 7
fcntl(7, F_GETFD) = 0x1 (flags FD_CLOEXEC)
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("63.251.167.126")}, 16) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
close(7) = 0
write(1, "7874: 3013ms", 12) = 12
write(1, "\n", 1) = 1
第一次慢速迭代:5014ms
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=305, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e70992000
read(7, "127.0.0.1 localhost af0a3ebe-91a"..., 4096) = 305
read(7, "", 4096) = 0
close(7) = 0
munmap(0x7f4e70992000, 4096) = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7
connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
poll([{fd=7, events=POLLOUT}], 1, 0) = 1 ([{fd=7, revents=POLLOUT}])
sendto(7, "\r\337\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=7, events=POLLIN}], 1, 5000) = 1 ([{fd=7, revents=POLLIN}])
ioctl(7, FIONREAD, [45]) = 0
recvfrom(7, "\r\337\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45
poll([{fd=7, events=POLLIN}], 1, 4995) = 0 (Timeout)
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8
connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
poll([{fd=8, events=POLLOUT}], 1, 0) = 1 ([{fd=8, revents=POLLOUT}])
sendto(8, "\r\337\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=8, events=POLLIN}], 1, 3000) = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [61]) = 0
recvfrom(8, "\r\337\201\200\0\1\0\1\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 61
poll([{fd=8, events=POLLOUT}], 1, 2996) = 1 ([{fd=8, revents=POLLOUT}])
sendto(8, "}\217\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=8, events=POLLIN}], 1, 2996) = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [109]) = 0
recvfrom(8, "}\217\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 1987, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109
close(7) = 0
close(8) = 0
socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 7
fcntl(7, F_GETFD) = 0x1 (flags FD_CLOEXEC)
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("63.251.167.126")}, 16) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
close(7) = 0
write(1, "7875: 5014ms", 12) = 12
write(1, "\n", 1) = 1
其他观察结果:
上面有几个奇怪的事情。请注意,~3000ms 迭代超时轮询文件描述符 8,它用于与8.8.8.8
名称服务器通信,而 ~5000ms 迭代(以及所有后续迭代)超时轮询文件描述符 7,它用于名称服务器127.0.0.1
。还请注意,所有快速迭代和中间迭代都用于sendmmsg
通过套接字进行通信,但所有 ~5000ms 迭代都使用sendto
。
答案1
也许我理解了这个问题(我希望如此)。我认为您可以通过使用自定义绑定 DNS 配置来解决这个问题,同时使用私有“假”区域和互联网区域,并修复适当的 TTL,以便使用缓存 DNS 引擎提高响应时间。希望它能有所帮助!问候。
答案2
只需在 localhost 上配置绑定即可。您必须为 .consul 区域添加一个转发区域(默认为端口 8600),请参阅:https://www.consul.io/docs/agent/dns.html
确保您的服务器可以访问互联网(udp 和 tcp 端口 53 以及 icmp)。
然后您可以安全地删除 /etc/resolv.conf 中的 8.8.8.8 部分。