How to set up a TCP check with keepalived?

2024-5-31

Trying to set up HA bastion servers. I don't need load balancing, just failover. There are two servers running Debian, bastion01 and bastion02, at 192.168.0.10 and 192.168.0.11. The floating IP is 192.168.0.12.

I started with these configs:

bastion01:

global_defs {
   notification_email {
    [email protected]
   }   
   notification_email_from [email protected]
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 101 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   
}

bastion02:

global_defs {
   notification_email {
     [email protected] 
   }   
   notification_email_from [email protected]
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 100 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   
}

This definitely works. I confirmed that the floating IP fails over when either server goes down.

However, it doesn't handle the case where ssh is stopped but the server itself is still up.

For that, I need to add a TCP check.

The keepalived documentation seems to provide an example:

http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html

However, their example involves load balancing, which just adds another layer of complexity I'm not interested in.

It looks like the relevant block is:

TCP_CHECK {
    connect_timeout 3
    connect_port 22
}

I tried my best guess at how to configure it:

bastion01:

global_defs {
   notification_email {
     [email protected] 
   }   
   notification_email_from [email protected]
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 101 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   
}

real_server 192.168.0.10 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }   
} 

real_server 192.168.0.11 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}

bastion02:

global_defs {
   notification_email {
     [email protected] 
   }   
   notification_email_from [email protected]
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 100 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   
}

real_server 192.168.0.10 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }   
} 

real_server 192.168.0.11 22 {
    weight 1
    TCP_CHECK {
        connect_timeout 3
        connect_port 22
    }
}

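But this doesn't work: keepalived doesn't understand the real_server blocks. OK, so maybe I can't do failover on its own. Maybe the TCP check is part of keepalived's load-balancing component, so I have to use load balancing here. That's fine, it won't hurt. So... the config now becomes (straight from http://www.keepalived.org/LVS-NAT-Keepalived-HOWTO.html):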

bastion01:

global_defs {
   notification_email {
    [email protected]
   }   
   notification_email_from [email protected]
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 101 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   

}

virtual_server 192.168.1.11 22 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT 
    nat_mask 255.255.255.0

    protocol TCP 

    real_server 192.168.0.10 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }   

    real_server 192.168.0.11 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }   
}

bastion02:

global_defs {
   notification_email {
    [email protected]
   }   
   notification_email_from [email protected]
   smtp_server localhost
   smtp_connect_timeout 30
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101 
    priority 100 
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }   
    virtual_ipaddress {
        192.168.0.12
    }   

}

virtual_server 192.168.1.11 22 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT 
    nat_mask 255.255.255.0

    protocol TCP 

    real_server 192.168.0.10 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }   

    real_server 192.168.0.11 22 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 22
        }
    }   
}

This doesn't work at all.

When I stop ssh on bastion01 and try to ssh to the floating IP, the connection is refused, and the IP does not fail over to bastion02.

In bastion01's logs:

bastion01 Keepalived_healthcheckers[11613]: Check on service [192.168.0.10]:22 failed after 1 retry.
bastion01 Keepalived_healthcheckers[11613]: Removing service [192.168.0.10]:22 from VS [192.168.1.11]:22

How do I convince keepalived to actually fail the floating IP over when the TCP health check fails?

Answer 1

If you don't need load balancing, track scripts provide failover based on a check run against your service.

First, add a vrrp_script block before your vrrp_instance:

global_defs {
    enable_script_security
}

vrrp_script chk_sshd {
    script "/usr/bin/pgrep sshd" # or "nc -zv localhost 22"
    interval 5                   # default: 1s
}
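Note that pgrep sshd only proves that some sshd process exists. On Debian, stopping the ssh service may leave the per-session sshd processes of logged-in users running. In that case the check can still pass while nothing is listening on port 22. The block below is a drop-in replacement for the one above that tests the port itself, which is closer to the question's TCP_CHECK (connect_timeout 3, port 22). It is a sketch under two assumptions: the netcat-openbsd package provides /bin/nc (confirm with "command -v nc"), and the timeout, fall and rise values are illustrative only.

```
vrrp_script chk_sshd {
    # Assumption: netcat-openbsd is installed at /bin/nc.
    # -z   : only test whether the port accepts a connection, send no data
    # -w 3 : give up after 3 seconds, like connect_timeout 3 in TCP_CHECK
    script "/bin/nc -z -w 3 127.0.0.1 22"
    interval 5   # run the check every 5 seconds
    timeout 5    # treat a check that runs longer than 5 s as failed
    fall 2       # illustrative: 2 consecutive failures mark the check down
    rise 2       # illustrative: 2 consecutive successes mark it up again
}
```

The fall and rise settings keep a single slow connect from bouncing the floating IP between the two bastions.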

Next, add a track_script to your vrrp_instance that references the vrrp_script:

vrrp_instance VI_1 {
    # ... other stuff ...

    track_script {
        chk_sshd
    }
}
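The chk_sshd script in the answer sets no weight. With no weight, a failed check puts VI_1 into FAULT state, which releases 192.168.0.12 so that bastion02 takes over.

An alternative is to give the script a negative weight, which lowers the node's priority instead. With weight -2, a failing bastion01 advertises 101 - 2 = 99. That is below bastion02's 100, so bastion02 becomes MASTER. The value -2 is an illustrative assumption: its magnitude only has to exceed the priority gap of 1.

Here is a sketch for bastion01 that reuses the question's settings. bastion02 is identical apart from priority 100.

```
vrrp_script chk_sshd {
    script "/usr/bin/pgrep sshd"
    interval 5
    weight -2        # on failure, subtract 2 from this node's priority
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 101     # bastion01; use 100 on bastion02
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.0.12
    }
    track_script {
        chk_sshd
    }
}
```

One design difference: with a weight, the instance never enters FAULT. If sshd fails on both nodes, bastion01 (99) still outranks bastion02 (98) and keeps the IP, rather than both nodes dropping it.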

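While not strictly required, enable_script_security together with the fully qualified path to the executable gives some assurance against malicious activity and suppresses warnings in the logs. See the keepalived man page for more information.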
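You may also want the checks to run as an unprivileged user. The keepalived.conf man page documents a script_user option in global_defs. When it is unset, keepalived uses a user named keepalived_script if one exists, and otherwise root (verify this against your keepalived version).

The sketch below assumes you have already created a system user called keepalived_script. That name is an assumption: any unprivileged user that can run the check command will do.

```
global_defs {
    enable_script_security
    # Assumption: this system user already exists,
    # e.g. created with: useradd --system keepalived_script
    script_user keepalived_script
}
```

Merge these lines into your existing global_defs block rather than adding a second one.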
