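I am running keepalived v2.0.19 on CentOS 7 with a VRRP instance that tracks the presence of the haproxy process. Unfortunately, after the haproxy process is restarted, the VRRP instance never leaves the FAULT state.
Here is my configuration: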
vrrp_track_process chk_service {
    process haproxy
    weight 0
}
vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.100 dev eth0 label eth0:shared
    }
    track_process {
        chk_service
    }
}
The syslog shows that the quorum is lost when the haproxy process stops, but it is never regained when haproxy comes back up a few seconds later:
systemd: Stopping HAProxy Load Balancer...
haproxy: [WARNING] 330/081104 (72258) : Exiting Master process...
haproxy: [ALERT] 330/081104 (72258) : Current program 'dataplane-api' (72260) exited with code 0 (Exit)
haproxy: [ALERT] 330/081104 (72258) : Current worker #1 (72261) exited with code 143 (Terminated)
haproxy: [WARNING] 330/081104 (72258) : All workers exited. Exiting... (0)
systemd: Stopped HAProxy Load Balancer.
Keepalived_vrrp[72335]: Quorum lost for tracked process chk_service
Keepalived_vrrp[72335]: (VI_1) Entering FAULT STATE
Keepalived_vrrp[72335]: (VI_1) sent 0 priority
Keepalived_vrrp[72335]: (VI_1) removing VIPs.
systemd: Starting HAProxy Load Balancer...
haproxy[113178]: Proxy stats started.
haproxy[113178]: Proxy main started.
haproxy[113178]: Proxy app started.
haproxy: [NOTICE] 330/081112 (113178) : New program 'dataplane-api' (113179) forked
haproxy: [NOTICE] 330/081112 (113178) : New worker #1 (113180) forked
systemd: Started HAProxy Load Balancer.
Note that when I start the keepalived process itself, the presence of the haproxy process is detected correctly.
Here is the output of keepalived -v:
Keepalived v2.0.19 (unknown)
Copyright(C) 2001-2019 Alexandre Cassen, <[email protected]>
Built with kernel headers for Linux 3.10.0
Running on Linux 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019
configure options: --prefix=/opt/keepalived
Config options: LIBIPTC LIBIPSET_DYNAMIC LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING
System options: PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK FRA_OIFNAME IFA_FLAGS IP_MULTICAST_ALL LIBIPTC NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK VRRP_VMAC IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE SO_MARK SCHED_RT SCHED_RESET_ON_FORK
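I tried setting the minimum and maximum quorum values, without success.
Has anyone run into the same problem?
Answer 1
We ran into the same problem with keepalived 2.0.19.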
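In our case, the problem was that for processes with a PID greater than 32767, keepalived tried to open the file /proc/xxxxx/comm, where xxxxx was a negative number. So if the machine has been up for a long time and PIDs have grown large, you can run into this behavior.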
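A negative number in that path is what you would see if the PID passed through a signed 16-bit integer somewhere along the way. That is an assumption about the cause, not something confirmed here from the keepalived source. The sketch below only illustrates the arithmetic, using the haproxy PID from the log in the question (113178); the helper names `as_int16` and `comm_path` are hypothetical.

```python
import ctypes


def as_int16(pid: int) -> int:
    """Return pid as it reads after truncation to a signed 16-bit integer.

    ctypes integer types do no overflow checking, so the value wraps
    the same way a C int16_t assignment would.
    """
    return ctypes.c_int16(pid).value


def comm_path(pid: int) -> str:
    """Build the /proc path that holds a process's command name."""
    return f"/proc/{pid}/comm"


if __name__ == "__main__":
    # 32767 is the largest PID that survives the truncation unchanged.
    for pid in (32767, 32768, 113178):
        print(pid, "->", comm_path(as_int16(pid)))
```

Under that assumption, PID 113178 would turn into /proc/-17894/comm, which does not exist, so the restarted process would never be seen.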
Fortunately, keepalived 2.0.20 fixes this bug, as noted in the changelog:
- Fix track_process with PIDs over 32767
https://www.keepalived.org/changelog.html (release 2.0.20)
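To check whether a given host already has the fix, you can compare the version in the first line of the keepalived -v output against 2.0.20. This is a minimal sketch; the function names and the `FIXED_IN` constant are hypothetical, and the banner format is assumed to match the output shown in the question.

```python
import re

# Release that carries the fix, per the changelog entry quoted above.
FIXED_IN = (2, 0, 20)


def parse_version(banner: str) -> tuple:
    """Extract (major, minor, patch) from a banner like 'Keepalived v2.0.19 (unknown)'."""
    match = re.search(r"Keepalived v(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"unrecognized version banner: {banner!r}")
    return tuple(int(part) for part in match.groups())


def has_pid_fix(banner: str) -> bool:
    """True if the reported version is 2.0.20 or later."""
    return parse_version(banner) >= FIXED_IN


if __name__ == "__main__":
    print(has_pid_fix("Keepalived v2.0.19 (unknown)"))
```

For the banner in the question this reports that the fix is missing, so upgrading to 2.0.20 or later is the way out.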