The goal is to have two separate CentOS 7 VMs running keepalived that fail over the VIP 192.168.1.11 and forward http traffic (soon to become https once this works) to their respective http servers:
192.168.1.11 vm1 (MASTER) --> fwd http to 192.168.1.71
192.168.1.11 vm2 (BACKUP) --> fwd http to 192.168.1.72
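I've used the failover part before (with keepalived), but with haproxy on each VM handling the forwarding. Now I'm trying to have keepalived do the forwarding itself (in this case I believe the mode I'm using is direct routing), and I get socket bind errors in the status output and failover doesn't work.
Here is the vm1 keepalived.conf: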
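For context on the DR mode used in the configs below (lb_kind DR): in LVS direct routing the director does not rewrite the destination IP or port of a packet; it only replaces the destination MAC address so the frame is delivered to the real server. The real server therefore has to accept packets addressed to the VIP, on the same port. The Python model below is a simplified sketch of that behaviour, not keepalived or IPVS code; the Frame type and the MAC strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    """Simplified Ethernet frame carrying a TCP packet (illustrative only)."""
    dst_mac: str
    dst_ip: str
    dst_port: int


def forward_dr(frame: Frame, real_server_mac: str) -> Frame:
    """Model of LVS direct routing: only the destination MAC is rewritten.

    The destination IP (the VIP) and destination port pass through unchanged.
    """
    return replace(frame, dst_mac=real_server_mac)


def real_server_accepts(frame: Frame, local_ips: set[str], listen_port: int) -> bool:
    """A real server only answers if it owns the destination IP and listens on the port."""
    return frame.dst_ip in local_ips and frame.dst_port == listen_port
```

In this model a real server that only owns 192.168.1.71 ignores a frame forwarded for 192.168.1.11; it has to hold the VIP as well (typically on a loopback interface with ARP replies for it suppressed) for DR to work.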
global_defs {
    script_user root
}
vrrp_instance VIP01 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass [snip]
    }
    virtual_ipaddress {
        192.168.1.11/24
    }
}
virtual_server 192.168.1.11 8080 {
    delay_loop 10
    protocol TCP
    lb_algo rr
    lb_kind DR
    persistence_timeout 7200
    real_server 192.168.1.71 8080 {
        weight 1
        TCP_CHECK {
            connect_timeout 5
            connect_port 8080
        }
    }
}
And vm2:
global_defs {
    script_user root
}
vrrp_instance VIP01 {
    state BACKUP
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass [snip]
    }
    virtual_ipaddress {
        192.168.1.11/24
    }
}
virtual_server 192.168.1.11 80 {
    delay_loop 10
    protocol TCP
    lb_algo rr
    lb_kind DR
    persistence_timeout 7200
    real_server 192.168.1.72 8080 {
        weight 1
        TCP_CHECK {
            connect_timeout 5
            connect_port 8080
        }
    }
}
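The two configs differ in one detail: vm1 declares virtual_server 192.168.1.11 8080, while vm2 declares virtual_server 192.168.1.11 80 in front of a real server on 8080. Since DR cannot remap ports, a mismatch like that is worth checking for. Below is a rough sketch of such a check; dr_port_mismatches is a hypothetical helper (not part of keepalived), it uses simple line-based regexes rather than a real keepalived parser, and it assumes lb_kind appears before the real_server blocks, as in the configs above.

```python
from __future__ import annotations

import re

VS_RE = re.compile(r"^\s*virtual_server\s+(\S+)\s+(\d+)\s*\{")
RS_RE = re.compile(r"^\s*real_server\s+(\S+)\s+(\d+)\s*\{")
KIND_RE = re.compile(r"^\s*lb_kind\s+(\S+)")


def dr_port_mismatches(conf_text: str) -> list[tuple[str, int, str, int]]:
    """Return (vip, vport, rip, rport) for DR real servers whose port differs from the VIP port."""
    mismatches = []
    vip, vport, kind = None, None, None
    for line in conf_text.splitlines():
        if m := VS_RE.match(line):
            # New virtual_server block: remember its address/port, reset lb_kind.
            vip, vport, kind = m.group(1), int(m.group(2)), None
        elif m := KIND_RE.match(line):
            kind = m.group(1).upper()
        elif (m := RS_RE.match(line)) and vip is not None:
            rip, rport = m.group(1), int(m.group(2))
            if kind == "DR" and rport != vport:
                mismatches.append((vip, vport, rip, rport))
    return mismatches
```

Run against vm2's file, this would report ("192.168.1.11", 80, "192.168.1.72", 8080); against vm1's it returns an empty list.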
Output of systemctl status keepalived (on both VMs):
...
Jul 20 07:52:16 [hostname] Keepalived_healthcheckers[1738]: TCP socket bind failed. Rescheduling.
Jul 20 07:52:26 [hostname] Keepalived_healthcheckers[1738]: TCP socket bind failed. Rescheduling.
Jul 20 07:52:36 [hostname] Keepalived_healthcheckers[1738]: TCP socket bind failed. Rescheduling.
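The bind failure comes from the healthchecker child, i.e. the process running the TCP_CHECK against the real server on port 8080. Conceptually a TCP_CHECK is a TCP connect with a timeout; the Python sketch below reproduces that idea so the same probe can be run by hand from the director. tcp_check is a hypothetical helper, not keepalived's implementation, and it runs outside keepalived's SELinux domain, so it will not reproduce a denial that applies only to the keepalived process.

```python
import socket


def tcp_check(host: str, port: int, connect_timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within connect_timeout seconds.

    Mirrors the idea behind TCP_CHECK (connect_port / connect_timeout); the
    connection is closed immediately after it is established.
    """
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

From vm1, tcp_check("192.168.1.71", 8080) should return True if the web server is reachable.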
I also tried adding the following to /etc/sysctl.conf:
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
and confirmed by querying them after a reboot that they had taken effect.
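The values the kernel is actually using can be read back from /proc/sys, which is where sysctl itself reads them (the dots in a sysctl name become path separators). A minimal Python sketch; read_sysctl and its proc_root parameter are hypothetical names used only for illustration.

```python
from pathlib import Path

# The two settings added to /etc/sysctl.conf above.
SYSCTLS = ("net.ipv4.ip_forward", "net.ipv4.ip_nonlocal_bind")


def read_sysctl(name: str, proc_root: str = "/proc/sys") -> str:
    """Read the live value of a sysctl, equivalent to `sysctl -n name`."""
    path = Path(proc_root, *name.split("."))
    return path.read_text().strip()
```

For example, `for name in SYSCTLS: print(name, read_sysctl(name))` should print 1 for both once the settings are active.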
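I realize that round-robin load balancing over a list containing a single server isn't really load balancing; I'm just using it as a way to forward traffic. If there's a cleaner or better way to do this, I'd be interested.
EDIT:
If I comment out the TCP check, the bind failure messages seem to go away. I've checked the target IP/port by browsing to http://192.168.1.71:8080, and it works as expected, but it doesn't work via the VIP .11. Either way, it looks like this should really be an HTTP_GET check.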
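keepalived's HTTP_GET checker requests a URL from the real server and compares the response (for example its status code or a digest of the page) with what is configured. The sketch below only compares the status code; http_get_check is a hypothetical helper, not keepalived's checker. It deliberately bypasses any http_proxy environment settings so the request goes straight to the real server, and note that urllib follows redirects by default, so a redirect to a healthy page counts as success here.

```python
import urllib.error
import urllib.request

# Opener with no proxies: a health check should talk to the real server directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_get_check(url: str, expected_status: int = 200, timeout: float = 5.0) -> bool:
    """Return True if GET url answers with expected_status within timeout seconds."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status == expected_status
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError; compare its code instead.
        return err.code == expected_status
    except OSError:
        # URLError, refused connections and timeouts all derive from OSError.
        return False
```

From vm1, http_get_check("http://192.168.1.71:8080/") should return True in the situation described above.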
I can fetch the page from the vm1 command line with curl http://192.168.1.71:8080, so I know vm1 can reach the http server on .71.
Browsing to http://192.168.1.11:8080 still times out. The status output shows no sign of a problem; I'll look into more detailed logging options...
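According to this (bottom of page 6), keepalived may be removing the real server from the list. It seems something is preventing the keepalived service from reaching the real server with either the TCP check or the HTTP GET. Maybe an SELinux policy?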
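With a single real_server, one failed health check is enough to leave the virtual service without a usable destination, so new connections to the VIP cannot be forwarded anywhere. A toy Python model of that behaviour, combined with the rr scheduling from the config: VirtualServer, apply_check and pick are hypothetical names, and real keepalived updates the kernel's IPVS table rather than a Python list.

```python
from itertools import cycle


class VirtualServer:
    """Toy model of a virtual service whose real servers are added/removed by health checks."""

    def __init__(self, real_servers):
        self.active = list(real_servers)
        self._rr = None  # round-robin iterator, rebuilt whenever the pool changes

    def apply_check(self, server, healthy: bool) -> None:
        """Remove a failing server from the pool; put it back once it passes again."""
        if healthy and server not in self.active:
            self.active.append(server)
            self._rr = None
        elif not healthy and server in self.active:
            self.active.remove(server)
            self._rr = None

    def pick(self):
        """Round-robin over the active pool; None means there is nowhere to send the connection."""
        if not self.active:
            return None
        if self._rr is None:
            self._rr = cycle(list(self.active))
        return next(self._rr)
```

With ["192.168.1.71:8080"] as the only real server, a single failed check makes pick() return None until the check passes again.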
/var/log/audit/audit.log is full of keepalived entries...
I found this and tried setting the allow-connect-any boolean, but it didn't change my results.
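I also tried using audit2allow to generate rules and then applied them. The audit log seems to have stopped logging denials, but forwarding from .11 to .71 still doesn't work.
Still no sign of any errors: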
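To see which permissions, if any, are still being denied to keepalived after the audit2allow module is loaded, the AVC records can be filtered by command name. A rough Python sketch, assuming the usual single-line record layout (`type=AVC ... avc:  denied  { perm } for ... comm="..."`); denials_for is a hypothetical helper.

```python
from __future__ import annotations

import re

# Captures the permission list inside { ... } and the comm="..." field of an AVC denial.
AVC_RE = re.compile(r'avc:\s+denied\s+\{\s*([^}]*?)\s*\}.*?\bcomm="([^"]+)"')


def denials_for(lines, comm: str = "keepalived") -> dict[str, int]:
    """Count AVC 'denied' records per permission name for the given command."""
    counts: dict[str, int] = {}
    for line in lines:
        m = AVC_RE.search(line)
        if m and m.group(2) == comm:
            for perm in m.group(1).split():
                counts[perm] = counts.get(perm, 0) + 1
    return counts
```

Usage: `with open("/var/log/audit/audit.log") as f: print(denials_for(f))`. An empty result means no keepalived denials are being logged.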
Jul 20 12:46:59 [hostname] Keepalived[1951]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Jul 20 12:46:59 [hostname] Keepalived[1951]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 20 12:46:59 [hostname] Keepalived[1952]: Starting Healthcheck child process, pid=1953
Jul 20 12:46:59 [hostname] Keepalived[1952]: Starting VRRP child process, pid=1954
Jul 20 12:46:59 [hostname] Keepalived_healthcheckers[1953]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 20 12:46:59 [hostname] Keepalived_healthcheckers[1953]: Activating healthchecker for service [192.168.1.11]:8080
Jul 20 12:46:59 [hostname] systemd: Started LVS and VRRP High Availability Monitor.
Jul 20 12:46:59 [hostname] Keepalived_vrrp[1954]: Registering Kernel netlink reflector
Jul 20 12:46:59 [hostname] Keepalived_vrrp[1954]: Registering Kernel netlink command channel
Jul 20 12:46:59 [hostname] Keepalived_vrrp[1954]: Registering gratuitous ARP shared channel
Jul 20 12:46:59 [hostname] Keepalived_vrrp[1954]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 20 12:46:59 [hostname] Keepalived_vrrp[1954]: Truncating auth_pass to 8 characters
Jul 20 12:46:59 [hostname] Keepalived_vrrp[1954]: VRRP_Instance(VIP01) removing protocol VIPs.
Jul 20 12:46:59 [hostname] Keepalived_vrrp[1954]: Using LinkWatch kernel netlink reflector...
Jul 20 12:46:59 [hostname] Keepalived_vrrp[1954]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
Jul 20 12:47:00 [hostname] Keepalived_vrrp[1954]: VRRP_Instance(VIP01) Transition to MASTER STATE
Jul 20 12:47:01 [hostname] Keepalived_vrrp[1954]: VRRP_Instance(VIP01) Entering MASTER STATE
Jul 20 12:47:01 [hostname] Keepalived_vrrp[1954]: VRRP_Instance(VIP01) setting protocol VIPs.
Jul 20 12:47:01 [hostname] Keepalived_vrrp[1954]: Sending gratuitous ARP on eth0 for 192.168.1.11
Jul 20 12:47:01 [hostname] Keepalived_vrrp[1954]: VRRP_Instance(VIP01) Sending/queueing gratuitous ARPs on eth0 for 192.168.1.11
Jul 20 12:47:01 [hostname] Keepalived_vrrp[1954]: Sending gratuitous ARP on eth0 for 192.168.1.11
Jul 20 12:47:01 [hostname] Keepalived_vrrp[1954]: Sending gratuitous ARP on eth0 for 192.168.1.11
Jul 20 12:47:01 [hostname] Keepalived_vrrp[1954]: Sending gratuitous ARP on eth0 for 192.168.1.11
Jul 20 12:47:01 [hostname] Keepalived_vrrp[1954]: Sending gratuitous ARP on eth0 for 192.168.1.11
Jul 20 12:47:06 [hostname] Keepalived_vrrp[1954]: Sending gratuitous ARP on eth0 for 192.168.1.11
Jul 20 12:47:06 [hostname] Keepalived_vrrp[1954]: VRRP_Instance(VIP01) Sending/queueing gratuitous ARPs on eth0 for 192.168.1.11
Jul 20 12:47:06 [hostname] Keepalived_vrrp[1954]: Sending gratuitous ARP on eth0 for 192.168.1.11
Jul 20 12:47:06 [hostname] Keepalived_vrrp[1954]: Sending gratuitous ARP on eth0 for 192.168.1.11
Jul 20 12:47:06 [hostname] Keepalived_vrrp[1954]: Sending gratuitous ARP on eth0 for 192.168.1.11
Jul 20 12:47:06 [hostname] Keepalived_vrrp[1954]: Sending gratuitous ARP on eth0 for 192.168.1.11
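Also worth mentioning: I had already disabled the firewalls to rule them out...
Pinging 192.168.1.11 and pulling vm1's network connection results in the expected failover, so the problem really is in my virtual/real server setup...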
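The VRRP half that does work can be pictured as a priority election: among the routers still advertising, the one with the highest priority holds the VIP. Below is a toy Python model using the priorities from the two configs (101 on vm1, 100 on vm2); Router and elect_master are hypothetical names, and the model ignores VRRP's tie-break on IP address and timer details.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    priority: int
    link_up: bool = True


def elect_master(routers):
    """Simplified VRRP election: the reachable router with the highest priority holds the VIP."""
    alive = [r for r in routers if r.link_up]
    return max(alive, key=lambda r: r.priority, default=None)
```

Pulling vm1's link corresponds to setting vm1.link_up = False, after which vm2 is elected, matching the failover observed above.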