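I am trying to set up HA bastion servers. I only need failover, not load balancing. There are two servers running Debian: bastion01 (192.168.0.10) and bastion02 (192.168.0.11). The floating IP is 192.168.0.12.
I started with these configurations:
bastion01: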
global_defs {
notification_email {
[email protected]
}
notification_email_from [email protected]
smtp_server localhost
smtp_connect_timeout 30
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 101
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.0.12
}
}
bastion02:
global_defs {
notification_email {
[email protected]
}
notification_email_from [email protected]
smtp_server localhost
smtp_connect_timeout 30
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 101
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.0.12
}
}
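This works fine. I confirmed that the floating IP fails over when either server goes down.
However, it does not handle the case where ssh stops but the server itself is still running.
For that, I need to add a TCP check.
The keepalived documentation appears to provide an example: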
http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html
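However, their example involves load balancing, which just adds another layer of complexity that I am not interested in.
The block in question appears to be:
TCP_CHECK { connect_timeout 3 connect_port 22 }
I tried to configure it using my best guess:
bastion01: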
global_defs {
notification_email {
[email protected]
}
notification_email_from [email protected]
smtp_server localhost
smtp_connect_timeout 30
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 101
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.0.12
}
}
real_server 192.168.0.10 22 {
weight 1
TCP_CHECK {
connect_timeout 3
connect_port 22
}
}
real_server 192.168.0.11 22 {
weight 1
TCP_CHECK {
connect_timeout 3
connect_port 22
}
}
bastion02:
global_defs {
notification_email {
[email protected]
}
notification_email_from [email protected]
smtp_server localhost
smtp_connect_timeout 30
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 101
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.0.12
}
}
real_server 192.168.0.10 22 {
weight 1
TCP_CHECK {
connect_timeout 3
connect_port 22
}
}
real_server 192.168.0.11 22 {
weight 1
TCP_CHECK {
connect_timeout 3
connect_port 22
}
}
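But this doesn't work; keepalived does not understand the real_server block. OK, maybe I can't use failover on its own; maybe the TCP check is part of keepalived's load-balancing component, so I have to use load balancing here. That's fine, it won't hurt. So the configuration now becomes (taken directly from http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html):
bastion01: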
global_defs {
notification_email {
[email protected]
}
notification_email_from [email protected]
smtp_server localhost
smtp_connect_timeout 30
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 101
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.0.12
}
}
virtual_server 192.168.1.11 22 {
delay_loop 6
lb_algo rr
lb_kind NAT
nat_mask 255.255.255.0
protocol TCP
real_server 192.168.0.10 22 {
weight 1
TCP_CHECK {
connect_timeout 3
connect_port 22
}
}
real_server 192.168.0.11 22 {
weight 1
TCP_CHECK {
connect_timeout 3
connect_port 22
}
}
}
bastion02:
global_defs {
notification_email {
[email protected]
}
notification_email_from [email protected]
smtp_server localhost
smtp_connect_timeout 30
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 101
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.0.12
}
}
virtual_server 192.168.1.11 22 {
delay_loop 6
lb_algo rr
lb_kind NAT
nat_mask 255.255.255.0
protocol TCP
real_server 192.168.0.10 22 {
weight 1
TCP_CHECK {
connect_timeout 3
connect_port 22
}
}
real_server 192.168.0.11 22 {
weight 1
TCP_CHECK {
connect_timeout 3
connect_port 22
}
}
}
This doesn't work at all.
When I stop ssh on bastion01 and try to ssh to the floating IP, the connection is refused and the IP does not fail over to bastion02.
In the logs on bastion01:
bastion01 Keepalived_healthcheckers[11613]: Check on service [192.168.0.10]:22 failed after 1 retry.
bastion01 Keepalived_healthcheckers[11613]: Removing service [192.168.0.10]:22 from VS [192.168.1.11]:22
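When the TCP health check fails, how do I convince keepalived to actually fail over the floating IP?
Answer 1
If you don't need load balancing, a track script provides failover based on a check run against your service.
First, add a vrrp_script block before your vrrp_instance: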
global_defs {
enable_script_security
}
vrrp_script chk_sshd {
script "/usr/bin/pgrep sshd" # or "nc -zv localhost 22"
interval 5 # default: 1s
}
Next, add a track_script block to your vrrp_instance that references the vrrp_script:
vrrp_instance VI_1 {
... other stuff ...
track_script {
chk_sshd
}
}
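Although not strictly required, enable_script_security and the full path to the executable provide some assurance against malicious activity and suppress warnings in the logs. See the Keepalived man page for more information.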
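Putting the two pieces together, a complete configuration for bastion01 might look like the sketch below. It reuses the values from the question (eth0, virtual_router_id 101, the 192.168.0.12 floating IP) and leaves out the notification settings for brevity. The fall and rise values are assumptions added for illustration, not part of the original answer. With the default script weight of 0, a failing check should put the instance into the FAULT state so that the peer takes over the floating IP.

```
global_defs {
    enable_script_security
}

# Health check: succeeds while an sshd process exists on this host.
vrrp_script chk_sshd {
    script "/usr/bin/pgrep sshd"
    interval 5      # run the check every 5 seconds
    fall 2          # assumption: 2 consecutive failures mark the check as failed
    rise 2          # assumption: 2 consecutive successes mark it as healthy again
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
    # Tie the floating IP to the health of sshd.
    track_script {
        chk_sshd
    }
}
```

bastion02 would use the same file with priority 100. No virtual_server or real_server blocks are needed, since those belong to the load-balancing side of keepalived.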