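We are using Keepalived (running on RHEL 7) to manage shared IP addresses across 3 HAProxy servers. Each server has 2 interfaces, one with a public IP and one with a private IP. We are migrating from a pair of Kemp LoadMaster LM-3000 appliances.
We've noticed that two of the three HAProxy systems log many lines of bogus VRRP packet received on em2 !!! every second.
Here is 1 second of logs. I've removed the time and process number to save space.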
haproxy01 Keepalived_vrrp: VRRP_Instance(haproxy::fqdn) IPSEC-AH : invalid IPSEC HMAC-MD5 value. Due to fields mutation or bad password !
haproxy01 Keepalived_vrrp: bogus VRRP packet received on em2 !!!
haproxy01 Keepalived_vrrp: VRRP_Instance(haproxy::fqdn) ignoring received advertisment...
haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::device) IPSEC-AH : invalid IPSEC HMAC-MD5 value. Due to fields mutation or bad password !
haproxy00 Keepalived_vrrp: bogus VRRP packet received on em2 !!!
haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::device) ignoring received advertisment...
haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::support) IPSEC-AH : invalid IPSEC HMAC-MD5 value. Due to fields mutation or bad password !
haproxy00 Keepalived_vrrp: bogus VRRP packet received on em2 !!!
haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::support) ignoring received advertisment...
haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::whiffle) IPSEC-AH : invalid IPSEC HMAC-MD5 value. Due to fields mutation or bad password !
haproxy00 Keepalived_vrrp: bogus VRRP packet received on em2 !!!
haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::whiffle) ignoring received advertisment...
haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::www) IPSEC-AH : invalid IPSEC HMAC-MD5 value. Due to fields mutation or bad password !
haproxy00 Keepalived_vrrp: bogus VRRP packet received on em2 !!!
haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::www) ignoring received advertisment...
haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::wwwdev) IPSEC-AH : invalid IPSEC HMAC-MD5 value. Due to fields mutation or bad password !
haproxy00 Keepalived_vrrp: bogus VRRP packet received on em2 !!!
haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::wwwdev) ignoring received advertisment...
haproxy02 is not logging any strange traffic.
- haproxy00.example.com: em1 -> 172.24.0.200, em2 -> 192.0.2.32
- haproxy01.example.com: em2 -> 172.24.0.201, em1 -> 192.0.2.29
- haproxy02.example.com: em1 -> 172.24.0.202, em2 -> 192.0.2.24
- kemp00.example.com: em1 -> 172.24.0.48, em2 -> 192.0.2.59
- kemp01.example.com: em1 -> 172.24.0.49, em2 -> 192.0.2.60
- "floating" kemp.example.com: 172.24.0.50
- "floating" kemp-public.example.com: 192.0.2.63
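Note that haproxy01 has em1 and em2 reversed compared to the other two.
The haproxy* systems are set up to use unicast VRRP instead of multicast (the example config is from haproxy00; the others are identical except that the interface names are swapped on haproxy01 and the priorities differ; haproxy02 is the MASTER):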
vrrp_instance haproxy::fqdn {
    interface em1
    state BACKUP
    virtual_router_id 199
    priority 100
    advert_int 1
    garp_master_delay 5
    authentication {
        auth_type AH
        auth_pass csvrp199
    }
    virtual_ipaddress {
        172.24.0.199/24 dev em1
    }
    virtual_routes {
        metric 5 to default via 172.24.0.1
    }
    unicast_src_ip 172.24.0.200
    unicast_peer {
        172.24.0.201
        172.24.0.202
    }
}
vrrp_instance haproxy::device {
    interface em2
    state BACKUP
    virtual_router_id 15
    priority 100
    advert_int 1
    garp_master_delay 5
    authentication {
        auth_type AH
        auth_pass csvrrp15
    }
    virtual_ipaddress {
        192.0.2.7/26 dev em2
    }
    virtual_routes {
        metric 5 to default via 192.0.2.1
    }
    unicast_src_ip 192.0.2.32
    unicast_peer {
        192.0.2.24
        192.0.2.29
    }
}
vrrp_instance haproxy-csweb:haproxy::support { ... }
vrrp_instance haproxy-csweb:haproxy::whiffle { ... }
vrrp_instance haproxy-csweb:haproxy::www { ... }
vrrp_instance haproxy-csweb:haproxy::wwwdev { ... }
We know that VRRP is working between the haproxy systems because we can shut down haproxy02 and traffic moves to the system with the next-highest priority.
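tcpdump shows the traffic that we believe is causing this. The addrs in the advertisements from one LM to the other seem to be obfuscated somehow, since none of them are our real addresses.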
[root@haproxy00 ~]# tcpdump -vvvvvni em2 vrrp
tcpdump: listening on em2, link-type EN10MB (Ethernet), capture size 262144 bytes
13:54:09.672726 IP (tos 0x10, ttl 255, id 51772, offset 0, flags [DF], proto VRRP (112), length 56)
192.0.2.59 > 224.0.0.18: vrrp 192.0.2.59 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 1, authtype none, intvl 1s, length 36, addrs(7): 127.73.197.82,124.192.126.111,231.25.226.215,113.220.143.181,197.101.63.203,152.246.226.65,46.55.62.80
[root@haproxy01 ~]# tcpdump -vvvvvni em2 vrrp
tcpdump: listening on em2, link-type EN10MB (Ethernet), capture size 262144 bytes
13:54:36.739262 IP (tos 0x10, ttl 255, id 50547, offset 0, flags [DF], proto VRRP (112), length 56)
172.24.0.48 > 224.0.0.18: vrrp 172.24.0.48 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 1, authtype none, intvl 1s, length 36, addrs(7): 120.18.52.8,8.96.198.173,44.237.204.205,99.139.163.15,10.76.116.67,163.0.175.114,121.54.183.104
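One way to sanity-check the captures above is to compare the reported payload length with what a standard VRRPv2 advertisement would need. The sketch below assumes the RFC 3768 layout (an 8-byte fixed header, 4 bytes per address, and 8 bytes of trailing authentication data). The function name is our own, and the values 7 and 36 are taken from the tcpdump output.

```python
# Sketch: compare tcpdump's reported length with the RFC 3768 VRRPv2 layout.
# Assumption: the packet is supposed to be a plain VRRPv2 advertisement.

VRRP2_HEADER_LEN = 8     # version/type, vrid, priority, count, auth type, adver int, checksum
VRRP2_ADDR_LEN = 4       # one IPv4 address
VRRP2_AUTH_DATA_LEN = 8  # trailing authentication data field


def vrrpv2_expected_length(addr_count: int) -> int:
    """Return the payload size a VRRPv2 advertisement with addr_count addresses needs."""
    return VRRP2_HEADER_LEN + addr_count * VRRP2_ADDR_LEN + VRRP2_AUTH_DATA_LEN


observed_count = 7    # "addrs(7)" in the capture
observed_length = 36  # "length 36" in the capture

expected = vrrpv2_expected_length(observed_count)
print(f"expected {expected} bytes, observed {observed_length} bytes")
if expected != observed_length:
    print("payload does not fit a plain VRRPv2 advertisement with that many addresses")
```

With these numbers, the fixed header plus seven addresses already account for all 36 bytes, leaving no room for the authentication data field. That suggests the packets may not be ordinary VRRPv2 advertisements, which would be consistent with the addresses looking random. This is only an inference from the lengths, not something the capture confirms.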
So errors are logged for every vrrp_instance on em2 on each server, but not for the em1 instance, and haproxy02 logs no errors at all.
We're trying to stop these errors from being logged, because they bury more important errors and make tailing the log files impossible.
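Until the source of the messages is found, one stopgap for reading the logs is to filter out the three noisy lines. The following is a minimal sketch, assuming the messages keep the exact wording shown in the log excerpt. The function names are our own, and the last sample line is a hypothetical example rather than something taken from our logs. This only hides the lines from view; it does not stop Keepalived from logging them.

```python
# Sketch: drop the three repeating Keepalived messages while reading a log.

NOISE_MARKERS = (
    "invalid IPSEC HMAC-MD5 value",
    "bogus VRRP packet received on",
    "ignoring received advertisment",  # spelling as it appears in the log excerpt
)


def is_vrrp_noise(line: str) -> bool:
    """True if the line is one of the repeating Keepalived_vrrp messages."""
    return "Keepalived_vrrp" in line and any(marker in line for marker in NOISE_MARKERS)


def filter_lines(lines):
    """Yield only the lines that are not VRRP noise."""
    for line in lines:
        if not is_vrrp_noise(line):
            yield line


sample = [
    "haproxy00 Keepalived_vrrp: bogus VRRP packet received on em2 !!!",
    "haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::www) ignoring received advertisment...",
    # Hypothetical non-noise line, included only to show that it passes through.
    "haproxy00 Keepalived_vrrp: VRRP_Instance(haproxy::www) Entering MASTER STATE",
]

for kept in filter_lines(sample):
    print(kept)
```

The same filter_lines generator could be fed from sys.stdin to filter a live tail of the log file instead of the sample list.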
Also, we have iptables rules in place that I thought should block the multicast traffic, but they don't appear to:
[root@haproxy00 ~]# iptables -L -n
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0 /* 000 accept all icmp */
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* 001 accept all to lo interface */
REJECT all -- 0.0.0.0/0 127.0.0.0/8 /* 002 reject local traffic not on loopback interface */ reject-with icmp-port-unreachable
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* 003 accept related established rules */ state RELATED,ESTABLISHED
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 multiport dports 80,443
ACCEPT tcp -- 172.16.0.0/12 0.0.0.0/0 multiport dports 22 /* 203 allow internal sshd:22 */ state NEW
ACCEPT 112 -- 192.0.2.24 0.0.0.0/0 /* 226 Allow vrrp from 192.0.2.24 */
ACCEPT 112 -- 192.0.2.29 0.0.0.0/0 /* 226 Allow vrrp from 192.0.2.29 */
ACCEPT 112 -- 172.24.0.201 0.0.0.0/0 /* 226 Allow vrrp from 172.24.0.201 */
ACCEPT 112 -- 172.24.0.202 0.0.0.0/0 /* 226 Allow vrrp from 172.24.0.202 */
DROP all -- 0.0.0.0/0 0.0.0.0/0 /* 999 drop all */
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
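The expectation behind this rule set can be written out explicitly. The sketch below models only the protocol 112 ACCEPT rules and the final DROP, ignoring every other rule and any interface matches that iptables -L -n does not display. The function name is hypothetical. It shows that, by source address alone, the LoadMaster advertisements seen in the tcpdump output would not match any of the VRRP ACCEPT rules.

```python
import ipaddress

# Sketch: model the "226 Allow vrrp from ..." rules followed by "999 drop all".
# Assumption: no other rule in the INPUT chain matches protocol 112 traffic.

VRRP_PROTO = 112

ALLOWED_VRRP_SOURCES = {
    ipaddress.ip_address(addr)
    for addr in ("192.0.2.24", "192.0.2.29", "172.24.0.201", "172.24.0.202")
}


def vrrp_verdict(src: str, proto: int = VRRP_PROTO) -> str:
    """Return the verdict the simplified rule set gives a packet from src."""
    if proto == VRRP_PROTO and ipaddress.ip_address(src) in ALLOWED_VRRP_SOURCES:
        return "ACCEPT"
    return "DROP"


# A configured peer, then the two LoadMaster sources from the captures.
for src in ("192.0.2.24", "192.0.2.59", "172.24.0.48"):
    print(src, vrrp_verdict(src))
```

Under this simplified model the two LoadMaster sources fall through to the final DROP, which is why we expected Keepalived never to see them.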