DNS 循环无法平衡 SSH 负载

DNS 循环无法平衡 SSH 负载

我使用 SSH 测试了 DNS 轮询,并注意到 SSH 客户端在我的测试环境中取得了惊人的效果。我使用 3 个节点和 RHEL 6.2(openssh-5.3p1、bind-9.7.3-8.P3)。主机密钥等内容已得到管理。

我的“问题”:

我希望在多个 SSH 服务器之间实现一种基本的负载平衡,使用多个 DNS 条目。我(几乎)确信这是可能的。但我得到的是一种基本的 HA... 似乎 openssh 客户端并不关心循环,它总是连接到同一个节点,除非该节点关闭,在最后一种情况下,客户端使用 DNS 条目列表中的另一条记录,然后成功连接到它。这是正常/常见的行为吗?或者我的测试有什么问题?

我把我的 strace 和 tcpdump 放在几个情况下会发生什么。如果您有任何想法或解释可以提供帮助,我将不胜感激 :)

登录 => 10.255.254.1 (节点 0),10.255.254.3 (节点 2) ssh 客户端 => 10.255.254.2 (节点 1)

node0 上的 DNS 服务器,RR 尚未禁用。

login IN A 10.255.254.1
login IN A 10.255.254.3

我确认:

  • 通过主机(1)的查找确认了循环;
  • ping(1) 命令看起来不错:

[root@node1 ~]# ping 登录

PING login.node (10.255.254.3) 56(84) bytes of data.
64 bytes from node2.node (10.255.254.3): icmp_seq=1 ttl=64 time=1.73 ms
^C
[root@node1 ~]# ping login
PING login.node (10.255.254.1) 56(84) bytes of data.
64 bytes from node0.node (10.255.254.1): icmp_seq=1 ttl=64 time=0.467 ms
^C
[root@node1 ~]# ping login
PING login.node (10.255.254.3) 56(84) bytes of data.
64 bytes from node2.node (10.255.254.3): icmp_seq=1 ttl=64 time=0.433 ms
^C

测试 1(两个 SSH 服务器均已启动并可访问)

[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
(...)

[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:03:04.875099 IP node1.node.53511 > node0.node.domain: 55904+ A? login.node. (29)
17:03:04.875417 IP node0.node.domain > node1.node.53511: 55904* 2/1/1 A 10.255.254.3, A 10.255.254.1 (102)
17:03:04.875432 IP node1.node.53511 > node0.node.domain: 22271+ AAAA? login.node. (29)
17:03:04.875523 IP node0.node.domain > node1.node.53511: 22271* 0/1/0 (79)

=> 连接上节点2(10.255.254.3)

测试 2(两个 SSH 服务器仍然正常运行且可访问)

[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
(...)

[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
17:04:29.663664 IP node1.node.51950 > node0.node.domain: 4685+ A? login.node. (29)
17:04:29.663685 IP node1.node.51950 > node0.node.domain: 36559+ AAAA? login.node. (29)
17:04:29.664046 IP node0.node.domain > node1.node.51950: 4685* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
17:04:29.664110 IP node0.node.domain > node1.node.51950: 36559* 0/1/0 (79)

=> 连接上节点2

(另一次测试再次确认与 node2 的连接。看来循环仅用于 ssh 客户端的初步测试)

测试 3(节点2上的SSH服务器已停止)

[root@node2 ~]# /etc/init.d/sshd stop
Stopping sshd:                                             [  OK  ]

[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = -1 ECONNREFUSED (Connection refused)
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0

[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
17:09:05.854022 IP node1.node.41233 > node0.node.domain: 63435+ A? login.node. (29)
17:09:05.854055 IP node1.node.41233 > node0.node.domain: 3015+ AAAA? login.node. (29)
17:09:05.854436 IP node0.node.domain > node1.node.41233: 63435* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
17:09:05.854531 IP node0.node.domain > node1.node.41233: 3015* 0/1/0 (79)
17:09:05.856764 IP node1.node.59579 > node0.node.ssh: Flags [S], seq 3025023931, win 14600, options [mss 1460,sackOK,TS val 9854496 ecr 0,nop,wscale 7], length 0
17:09:05.856806 IP node0.node.ssh > node1.node.59579: Flags [S.], seq 1105519762, ack 3025023932, win 14480, options [mss 1460,sackOK,TS val 350907197 ecr 9854496,nop,wscale 7], length 0
17:09:05.857106 IP node1.node.59579 > node0.node.ssh: Flags [.], ack 1, win 115, options [nop,nop,TS val 9854496 ecr 350907197], length 0
17:09:05.865291 IP node0.node.ssh > node1.node.59579: Flags [P.], seq 1:22, ack 1, win 114, options [nop,nop,TS val 350907205 ecr 9854496], length 21
(...)

=> 连接上节点0(故障转移?惊喜!)

测试 4(相同条件)

[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = -1 ECONNREFUSED (Connection refused)
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0


[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
(...)
17:11:44.154595 IP node1.node.56947 > node0.node.domain: 4602+ A? login.node. (29)
17:11:44.154862 IP node0.node.domain > node1.node.56947: 4602* 2/1/1 A 10.255.254.3, A 10.255.254.1 (102)
(...)

=>相同的结果(node0 上的连接)

测试 5(节点2上的SSH服务器重新启动)

[root@node2 ~]# /etc/init.d/sshd restart
Stopping sshd:                                             [FAILED]
Starting sshd:                                             [  OK  ]

[root@node1 ~]# strace -e connect ssh login
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
(...)
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.1")}, 16) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("10.255.254.3")}, 16) = 0

[root@node0 ~]# tcpdump -i eth0 src node1 or dst node1
(...)
17:17:12.893633 IP node1.node.42432 > node0.node.domain: 7264+ A? login.node. (29)
17:17:12.893988 IP node0.node.domain > node1.node.42432: 7264* 2/1/1 A 10.255.254.1, A 10.255.254.3 (102)
(...)

=> 连接上节点2再次(故障回复)

答案1

DNS 不提供负载平衡,所以除非主机宕机,否则它将始终使用返回的 DNS 记录列表中的记录。如果您想动态处理宕机的主机,则必须在 SSH 盒之间对传入连接进行负载平衡。

就负载平衡而言,循环 DNS 请求非常基础。请查看缺点部分:http://en.wikipedia.org/wiki/Round_robin_DNS

答案2

好吧,最终,这种行为正如上面所述,仅在同一个子网内有效。当我在另一个 LAN(带有中间网关)上使用 openssh 客户端时,它有效!我的意思是:我得到了一个基本的负载分配,当其中一个节点关闭时,会进行“故障转移”。

因此我得出结论,RRDNS足以处理SSH用户的基本负载分配。

相关内容