在长期运行的 ruby​​ 进程中,DNS 查找最终会变得非常缓慢

在长期运行的 ruby​​ 进程中,DNS 查找最终会变得非常缓慢

我在 Ubuntu 服务器上运行一个 Ruby 应用程序,该应用程序经常需要向其他服务发出 HTTP 请求。其中一些服务位于内部域中,一些可通过公共 DNS 访问。我有一个 Consul 代理与我的 Ruby 应用程序共置,其工作是解析内部组件的 DNS 查询。

最终,Ruby 应用程序变慢,导致 DNS 查询外部域需要 5 秒以上才能解决。我需要帮助来理解为什么会发生下面描述的行为。

澄清一下,我的目标是理解对于以下行为,我已经考虑了几种修复/解决方法。

我已将问题归结为以下几点:

最小可重现示例:

/etc/resolv.conf

nameserver 127.0.0.1
nameserver 8.8.8.8

我用以下监听端口 53 的 always-SERVFAIL DNS 解析器替换了 Consul:

#!/usr/bin/env ruby

require "rubydns"

RubyDNS.run_server(listen: [[:udp, "0.0.0.0", 53], [:tcp, "0.0.0.0", 53]]) do
  otherwise do |transaction|
    transaction.fail!(:ServFail)
  end
end

我将以上内容放在一个文件中(/var/vcap/data/tmp/dns.rb),然后运行它:

ruby /var/vcap/data/tmp/dns.rb

这不是很重要,但上面的代码是在 ruby​​ 系统上运行的,即 1.9。然后我运行以下命令(同样,可能并不重要,但这使用了不同的 ruby​​ 版本):

var/vcap/packages/ruby-2.1.6/bin/ruby -r'net/protocol' -e \
'1.step do |i|; t=Time.now; TCPSocket.open("SOME.REAL.DOMAIN",80).close; p Time.now-t; end'

以上是轻微确实,打印出来的内容格式稍微复杂一些,但为了让它适合这篇文章而不需要左右滚动,我稍微简化了它。

观察到的行为:

我看到的行为是,一段时间内,循环的每次迭代大约需要 10 毫秒,但最终它开始花费 5000 毫秒的时间。

行为迟缓的持续时间

有一次,我修改/etc/resolv.confoptions timeout:4重新运行了这个实验,每次迭代大约需要 4000 毫秒。因此,5000 毫秒似乎来自默认的 resolvconf 超时 5 秒。

减速前的迭代次数

可能需要几百到几万次迭代才能让行为浮现出来,但总是会这样。如果您停止循环并再次运行命令,结果还是一样,数百或数千次 ~10ms 迭代,然后持续 ~5000ms 迭代。通常,中间有一两次迭代需要中等时间,即 2000-4000ms。

长寿命的 ruby​​ 循环与重复的一次性 ruby​​ 命令

我相信这与长寿命的 ruby​​ 进程有关,因为如果我运行一个 bash 循环,并且在 bash 循环的每次迭代中执行一些 ruby​​,我不会看到这种最终的减速。例如以下才不是表现出不良行为:

while true; do time /var/vcap/packages/ruby-VERSION/bin/ruby -r'net/protocol' -e \
'TCPSocket.open("SOME.REAL.DOMAIN, 80).close'; done

仅本地主机名称服务器与两个名称服务器

此外,这与拥有多个名称服务器有关。如果我编辑/etc/resolv.conf为仅包含127.0.0.1名称服务器,并将添加rescue nil到 ruby​​ 迭代(以防止脚本在第一次迭代时崩溃),它似乎可以运行很长时间而不会变慢。

调试信息:

我做过strace上述循环,其中有 7873 次快速迭代(~10 毫秒),然后一次迭代耗时 3013 毫秒,然后所有后续迭代耗时略超过 5000 毫秒。所有快速迭代的 strace 输出看起来基本相同,所有 ~5000 毫秒迭代的 strace 输出看起来基本相同。因此,我将在此处包含来自最后的快速迭代、3013ms 迭代和第一的~5000ms 迭代。当然,在 strace 输出中,它们都是连续的,但是为了便于查看,我将其分成了单独的块。

最后一次快速迭代:9ms

stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=305, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e70992000
read(7, "127.0.0.1 localhost af0a3ebe-91a"..., 4096) = 305
read(7, "", 4096)                       = 0
close(7)                                = 0
munmap(0x7f4e70992000, 4096)            = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7
connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
poll([{fd=7, events=POLLOUT}], 1, 0)    = 1 ([{fd=7, revents=POLLOUT}])
sendmmsg(7, {{{msg_name(0)=NULL, msg_iov(1)=[{"A\5\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"YH\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2
poll([{fd=7, events=POLLIN}], 1, 5000)  = 1 ([{fd=7, revents=POLLIN}])
ioctl(7, FIONREAD, [45])                = 0
recvfrom(7, "A\5\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45
poll([{fd=7, events=POLLIN}], 1, 4999)  = 1 ([{fd=7, revents=POLLIN}])
ioctl(7, FIONREAD, [45])                = 0
recvfrom(7, "YH\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8
connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
poll([{fd=8, events=POLLOUT}], 1, 0)    = 1 ([{fd=8, revents=POLLOUT}])
sendmmsg(8, {{{msg_name(0)=NULL, msg_iov(1)=[{"A\5\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"YH\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2
poll([{fd=8, events=POLLIN}], 1, 3000)  = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [61])                = 0
recvfrom(8, "A\5\201\200\0\1\0\1\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 61
poll([{fd=8, events=POLLIN}], 1, 2996)  = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [109])               = 0
recvfrom(8, "YH\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 1987, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109
close(7)                                = 0
close(8)                                = 0
socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 7
fcntl(7, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("63.251.167.126")}, 16) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
close(7)                                = 0
write(1, "7873: 9ms", 9)                = 9
write(1, "\n", 1)                       = 1

中间迭代:3013ms

stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=305, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e70992000
read(7, "127.0.0.1 localhost af0a3ebe-91a"..., 4096) = 305
read(7, "", 4096)                       = 0
close(7)                                = 0
munmap(0x7f4e70992000, 4096)            = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7
connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
poll([{fd=7, events=POLLOUT}], 1, 0)    = 1 ([{fd=7, revents=POLLOUT}])
sendmmsg(7, {{{msg_name(0)=NULL, msg_iov(1)=[{"u>\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"e?\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2
poll([{fd=7, events=POLLIN}], 1, 5000)  = 1 ([{fd=7, revents=POLLIN}])
ioctl(7, FIONREAD, [45])                = 0
recvfrom(7, "u>\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45
poll([{fd=7, events=POLLIN}], 1, 4999)  = 1 ([{fd=7, revents=POLLIN}])
ioctl(7, FIONREAD, [45])                = 0
recvfrom(7, "e?\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8
connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
poll([{fd=8, events=POLLOUT}], 1, 0)    = 1 ([{fd=8, revents=POLLOUT}])
sendmmsg(8, {{{msg_name(0)=NULL, msg_iov(1)=[{"u>\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_OOB|MSG_DONTROUTE}, 45}, {{msg_name(0)=NULL, msg_iov(1)=[{"e?\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45}], msg_controllen=0, msg_flags=MSG_CONFIRM}, 45}}, 2, MSG_NOSIGNAL) = 2
poll([{fd=8, events=POLLIN}], 1, 3000)  = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [109])               = 0
recvfrom(8, "e?\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109
poll([{fd=8, events=POLLIN}], 1, 2997)  = 0 (Timeout)
poll([{fd=8, events=POLLOUT}], 1, 0)    = 1 ([{fd=8, revents=POLLOUT}])
sendto(8, "u>\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=8, events=POLLIN}], 1, 3000)  = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [61])                = 0
recvfrom(8, "u>\201\200\0\1\0\1\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 61
poll([{fd=8, events=POLLOUT}], 1, 2997) = 1 ([{fd=8, revents=POLLOUT}])
sendto(8, "e?\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=8, events=POLLIN}], 1, 2997)  = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [109])               = 0
recvfrom(8, "e?\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 1987, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109
close(7)                                = 0
close(8)                                = 0
socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 7
fcntl(7, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("63.251.167.126")}, 16) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
close(7)                                = 0
write(1, "7874: 3013ms", 12)            = 12
write(1, "\n", 1)                       = 1

第一次慢速迭代:5014ms

stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=305, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4e70992000
read(7, "127.0.0.1 localhost af0a3ebe-91a"..., 4096) = 305
read(7, "", 4096)                       = 0
close(7)                                = 0
munmap(0x7f4e70992000, 4096)            = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=85, ...}) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 7
connect(7, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
poll([{fd=7, events=POLLOUT}], 1, 0)    = 1 ([{fd=7, revents=POLLOUT}])
sendto(7, "\r\337\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=7, events=POLLIN}], 1, 5000)  = 1 ([{fd=7, revents=POLLIN}])
ioctl(7, FIONREAD, [45])                = 0
recvfrom(7, "\r\337\201\2\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16]) = 45
poll([{fd=7, events=POLLIN}], 1, 4995)  = 0 (Timeout)
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 8
connect(8, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
poll([{fd=8, events=POLLOUT}], 1, 0)    = 1 ([{fd=8, revents=POLLOUT}])
sendto(8, "\r\337\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=8, events=POLLIN}], 1, 3000)  = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [61])                = 0
recvfrom(8, "\r\337\201\200\0\1\0\1\0\0\0\0\3uaa\6useast\6appfog\2q"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 61
poll([{fd=8, events=POLLOUT}], 1, 2996) = 1 ([{fd=8, revents=POLLOUT}])
sendto(8, "}\217\1\0\0\1\0\0\0\0\0\0\3uaa\6useast\6appfog\2q"..., 45, MSG_NOSIGNAL, NULL, 0) = 45
poll([{fd=8, events=POLLIN}], 1, 2996)  = 1 ([{fd=8, revents=POLLIN}])
ioctl(8, FIONREAD, [109])               = 0
recvfrom(8, "}\217\201\200\0\1\0\0\0\1\0\0\3uaa\6useast\6appfog\2q"..., 1987, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, [16]) = 109
close(7)                                = 0
close(8)                                = 0
socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 7
fcntl(7, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("63.251.167.126")}, 16) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
fstat(7, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
close(7)                                = 0
write(1, "7875: 5014ms", 12)            = 12
write(1, "\n", 1)                       = 1

其他观察结果:

上面有几个奇怪的事情。请注意,~3000ms 迭代超时轮询文件描述符 8,它用于与8.8.8.8名称服务器通信,而 ~5000ms 迭代(以及所有后续迭代)超时轮询文件描述符 7,它用于名称服务器127.0.0.1。还请注意,所有快速迭代和中间迭代都用于sendmmsg通过套接字进行通信,但所有 ~5000ms 迭代都使用sendto

答案1

也许我理解了这个问题(我希望如此)。我认为您可以通过使用自定义绑定 DNS 配置来解决这个问题,同时使用私有“假”区域和互联网区域,并修复适当的 TTL,以便使用缓存 DNS 引擎提高响应时间。希望它能有所帮助!问候。

答案2

只需在 localhost 上配置绑定即可。您必须为 .consul 区域添加一个转发区域(默认为端口 8600),请参阅:https://www.consul.io/docs/agent/dns.html

确保您的服务器可以访问互联网(udp 和 tcp 端口 53 以及 icmp)。

然后您可以安全地删除 /etc/resolv.conf 中的 8.8.8.8 部分。

相关内容