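I am running ISC DHCP version 4.1.1 on Debian GNU/Linux on both servers. I have tried various versions of ISC DHCP to solve the problem described below, but the result is always the same.
My configuration for failover between the two servers, which are on different subnets, is: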
#-----------------------------------------------
# Primary Server
#-----------------------------------------------
authoritative;
default-lease-time 900;
max-lease-time 1800;
option domain-name "foo.com";
option domain-name-servers 10.12.0.254;
failover peer "foo" {
    primary;
    address 10.12.0.254;
    port 647;
    peer address 10.10.10.12;
    peer port 647;
    max-response-delay 30;
    max-unacked-updates 10;
    load balance max seconds 3;
    mclt 1800;
    split 128;
}
subnet 10.12.0.0 netmask 255.255.0.0 {
    pool {
        failover peer "foo";
        range 10.12.10.0 10.12.112.0;
        range 10.12.112.12 10.12.255.254;
        deny dynamic bootp clients;
    }
    option routers 10.12.0.254;
    option subnet-mask 255.255.0.0;
    option broadcast-address 10.12.255.255;
}
#-----------------------------------------------
# Secondary Server
#-----------------------------------------------
authoritative;
default-lease-time 900;
max-lease-time 1800;
option domain-name "foo.com";
option domain-name-servers 10.12.0.254;
failover peer "foo" {
    secondary;
    address 10.10.10.12;
    port 647;
    peer address 10.12.0.254;
    peer port 647;
    max-response-delay 30;
    max-unacked-updates 10;
    load balance max seconds 3;
}
subnet 10.12.0.0 netmask 255.255.0.0 {
    pool {
        failover peer "foo";
        range 10.12.10.0 10.12.112.0;
        range 10.12.112.12 10.12.255.254;
        deny dynamic bootp clients;
    }
    option routers 10.12.0.254;
    option subnet-mask 255.255.0.0;
    option broadcast-address 10.12.255.255;
}
subnet 10.10.10.0 netmask 255.255.255.240 {
}
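IP helper (aka UDP helper) and DHCP relay are enabled on the router that connects the primary server's network to the secondary server's network, and I can ping and ssh from one server to the other and back.
When I start the dhcpd service on both servers, they fail to balance leases.
Here are log samples from both servers.
Primary server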
Sep 19 10:31:11 primary dhcpd: failover peer foo: I move from recover to startup
Sep 19 10:31:11 primary dhcpd: failover peer foo: I move from startup to recover
Sep 19 10:31:11 primary dhcpd: Sent update request all message to foo
Sep 19 10:31:20 primary dhcpd: peer foo: disconnected
Sep 19 10:31:22 primary dhcpd: failover peer foo: peer moves from recover-done to recover-done
Sep 19 10:31:22 primary dhcpd: failover peer foo: peer moves from recover-done to recover-done
Sep 19 10:31:45 primary dhcpd: DHCPINFORM from 10.12.181.177 via eth1
Sep 19 10:31:45 primary dhcpd: DHCPACK to 10.12.181.177 (00:17:42:c0:e3:ce) via eth1
Sep 19 10:32:45 primary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c (PC1) via eth1: not responding (recovering)
Sep 19 10:32:46 primary dhcpd: DHCPINFORM from 10.12.181.177 via eth1
Sep 19 10:32:46 primary dhcpd: DHCPACK to 10.12.181.177 (00:17:42:c0:e3:ce) via eth1
Sep 19 10:32:49 primary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c (PC1) via eth1: not responding (recovering)
Sep 19 10:32:57 primary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c (PC1) via eth1: not responding (recovering)
Sep 19 10:33:13 primary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 (PC2) via eth1: not responding (recovering)
Sep 19 10:33:13 primary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c (PC1) via eth1: not responding (recovering)
Sep 19 10:33:17 primary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 (PC2) via eth1: not responding (recovering)
Sep 19 10:33:25 primary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 (PC2) via eth1: not responding (recovering)
Sep 19 10:33:41 primary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 (PC2) via eth1: not responding (recovering)
Secondary server
Sep 19 10:31:11 secondary dhcpd: Update request all from foo: sending update
Sep 19 10:31:23 secondary dhcpd: Wrote 22 leases to leases file.
Sep 19 10:31:23 secondary dhcpd: failover peer foo: I move from recover-done to startup
Sep 19 10:31:23 secondary dhcpd: failover peer foo: I move from startup to recover-done
Sep 19 10:31:45 secondary dhcpd: DHCPINFORM from 10.12.181.177 via 10.12.0.1
Sep 19 10:31:45 secondary dhcpd: DHCPACK to 10.12.181.177 (00:17:42:c0:e3:ce) via eth0
Sep 19 10:32:45 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
Sep 19 10:32:46 secondary dhcpd: DHCPINFORM from 10.12.181.177 via 10.12.0.1
Sep 19 10:32:46 secondary dhcpd: DHCPACK to 10.12.181.177 (00:17:42:c0:e3:ce) via eth0
Sep 19 10:32:49 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
Sep 19 10:32:57 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
Sep 19 10:33:13 secondary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 via 10.12.0.1: not responding (recover done)
Sep 19 10:33:13 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
Sep 19 10:33:17 secondary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 via 10.12.0.1: not responding (recover done)
Sep 19 10:33:25 secondary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 via 10.12.0.1: not responding (recover done)
Sep 19 10:33:41 secondary dhcpd: DHCPDISCOVER from 00:19:99:95:41:99 via 10.12.0.1: not responding (recover done)
Sep 19 10:34:46 secondary dhcpd: DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases
Sep 19 10:34:51 secondary dhcpd: DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases
Sep 19 10:34:59 secondary dhcpd: DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases
Sep 19 10:35:16 secondary dhcpd: DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases
Sep 19 10:38:28 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
Sep 19 10:38:32 secondary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c via 10.12.0.1: not responding (recover done)
I don't seem to get any load-balancing log lines, so I think lease balancing is not happening...
Sent update request all message to foo
Update request all from foo: sending update
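The balancing process seems to be stuck at the two lines above.
If I shut down the dhcpd daemon on one server, the other does not seem to take over, even though it detects that its peer is down.
How can I solve this problem?
Thanks in advance (and sorry for my bad English) :-)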
Answer 1
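The message "not responding (recovering)" indicates that the server is not answering because it is recovering from a failover (or from initial startup). It is probably still populating the lease database with all the free leases in the pool, which can take a while if you have a large pool.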
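To see which state each side is actually sitting in, it can help to pull the state transitions out of the logs. The following is only a rough sketch: the regular expression is based on the log lines quoted in the question, the function name `last_states` is made up for illustration, and other dhcpd versions may word these messages differently.

```python
import re

# Matches dhcpd failover state-change lines such as
# "failover peer foo: I move from recover to startup".
# The pattern is derived from the log excerpts in the question (assumption).
TRANSITION = re.compile(
    r"failover peer (?P<peer>\S+): (?P<who>I move|peer moves) "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)

def last_states(lines):
    """Return the most recent state logged for the local server and its peer."""
    states = {}
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            side = "local" if m.group("who") == "I move" else "peer"
            states[side] = m.group("new")
    return states

# Three lines copied from the primary server's log in the question.
sample = [
    "Sep 19 10:31:11 primary dhcpd: failover peer foo: I move from recover to startup",
    "Sep 19 10:31:11 primary dhcpd: failover peer foo: I move from startup to recover",
    "Sep 19 10:31:22 primary dhcpd: failover peer foo: peer moves from recover-done to recover-done",
]
print(last_states(sample))  # {'local': 'recover', 'peer': 'recover-done'}
```

On the primary's log this shows the local side in recover while the peer reports recover-done, which matches the "not responding" reasons each server prints.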
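Try a smaller pool to verify that failover itself works, then scale the ranges back up. Your ranges are very large, and that is probably why the servers appear to hang during the update.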
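To put a number on "very large", the two range statements from the question can be counted with Python's standard ipaddress module. This is just a quick arithmetic sketch; `range_size` is a helper name chosen for illustration.

```python
import ipaddress

def range_size(first, last):
    """Number of addresses in an inclusive dhcpd 'range first last;' statement."""
    return int(ipaddress.IPv4Address(last)) - int(ipaddress.IPv4Address(first)) + 1

# The two ranges from the pool declaration in the question.
ranges = [
    ("10.12.10.0", "10.12.112.0"),
    ("10.12.112.12", "10.12.255.254"),
]
sizes = [range_size(first, last) for first, last in ranges]
print(sizes, sum(sizes))  # [26113, 36851] 62964
```

That is roughly 63,000 leases that both peers have to track and exchange, so a test pool of a few hundred addresses should make it much easier to tell a slow update from a stuck one.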
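Answer 2
I have run into this problem before. In my case, it was because the firewall was blocking port 647/tcp on both servers. Running the following commands on each server fixed it.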
firewall-cmd --add-port=647/tcp --permanent
firewall-cmd --reload
Then restart the dhcpd service.
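Since ping and ssh working does not prove that the failover port is reachable, a direct TCP connection test from each server to its peer is a quick check. Below is a minimal sketch using only the Python standard library; the function name `tcp_port_open` is hypothetical, and the peer must have dhcpd running and listening for the check to succeed.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the addresses from the question, calling tcp_port_open("10.10.10.12", 647) on the primary and tcp_port_open("10.12.0.254", 647) on the secondary should both return True once the firewall allows the traffic. A False result caused by a timeout rather than an immediate refusal usually points to packets being dropped somewhere in between.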
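Answer 3
The error message "peer holds all free leases" can also mean that the request was received on the wrong network interface, for example if the computer is only configured to get an IP on eth0 but the DHCP request arrives on eth1. "deny dynamic bootp clients" is typical for this kind of setup. In my case, one interface served the workstation network and another served printers only, and someone had plugged a workstation into the printer network.
See my blog post, where I documented running into this error message without finding an obvious cause, also on Debian.
I don't remember whether I saw the "not responding (recovering)" message at the time, but I did see "peer holds all free leases" on both DHCP servers as well.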
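To check whether requests are arriving on an unexpected interface or relay, the unanswered DHCPDISCOVER lines can be grouped by their "via" field and the reason dhcpd gives. This is a sketch under the assumption that the log format matches the excerpts in the question; `tally_unanswered` is an illustrative name, not part of any dhcpd tooling.

```python
import re
from collections import Counter

# Matches unanswered requests such as
# "DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases",
# with an optional client hostname in parentheses before "via".
UNANSWERED = re.compile(
    r"DHCPDISCOVER from (?P<mac>[0-9a-f:]{17})(?: \([^)]*\))? "
    r"via (?P<via>\S+): (?P<reason>.+)$"
)

def tally_unanswered(lines):
    """Count unanswered DHCPDISCOVERs per (via, reason) pair."""
    counts = Counter()
    for line in lines:
        m = UNANSWERED.search(line)
        if m:
            counts[(m.group("via"), m.group("reason"))] += 1
    return counts

# Lines copied from the logs in the question.
log_lines = [
    "Sep 19 10:32:45 primary dhcpd: DHCPDISCOVER from 00:16:d3:e5:3a:3c (PC1) via eth1: not responding (recovering)",
    "Sep 19 10:34:46 secondary dhcpd: DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases",
    "Sep 19 10:34:51 secondary dhcpd: DHCPDISCOVER from 00:1a:4b:45:3a:2f via 10.12.0.1: peer holds all free leases",
    "Sep 19 10:31:45 secondary dhcpd: DHCPACK to 10.12.181.177 (00:17:42:c0:e3:ce) via eth0",
]
for (via, reason), n in tally_unanswered(log_lines).items():
    print(via, reason, n)
```

If a "via" value shows up that does not belong to the subnet the pool is declared for, that would support the wrong-interface explanation.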