当 Firewalld 运行时,Keepalived 会出现脑裂现象

当 Firewalld 运行时,Keepalived 会出现脑裂现象

我使用 keepalived 来提供两个 Alma 8 Nginx 服务器(如果有任何相关性,则托管在 VMWare 上)之间的可用性。启用防火墙后,尽管为 VRRP 设置了丰富的规则,但当我启动防火墙时,两个主机都会开始在虚拟 IP 上响应:

root@dca-nfs01:~# arping 172.31.5.233
60 bytes from 00:50:56:84:ac:d0 (172.31.5.233): index=39 time=1.960 usec
60 bytes from 00:50:56:84:ac:d0 (172.31.5.233): index=40 time=20.660 usec
60 bytes from 00:50:56:84:52:ed (172.31.5.233): index=41 time=24.930 usec
60 bytes from 00:50:56:84:ac:d0 (172.31.5.233): index=42 time=534.616 msec
60 bytes from 00:50:56:84:52:ed (172.31.5.233): index=43 time=534.646 msec

我的 keepalived 配置取自标准教程模板,如下所示:

[root@dca-ngx01-al ~]# cat /etc/keepalived/keepalived.conf
global_defs {
  # Keepalived process identifier
  router_id nginx
}

# Script to check whether Nginx is running or not
vrrp_script check_nginx {
  script "/sbin/pidof nginx"
  interval 2
  weight 50
}

# Virtual interface - The priority specifies the order in which the assigned interface to take over in a failover
vrrp_instance VI_01 {
  state MASTER
  interface ens192
  virtual_router_id 151
  priority 110

  # The virtual ip address shared between the two NGINX Web Server which will float
  virtual_ipaddress {
    172.31.5.233
  }
  track_script {
    check_nginx
  }
  authentication {
    auth_type AH
    auth_pass secret
  }
}

两个盒子都有一个简单的单区域防火墙,并且我添加了一条丰富的规则以允许两个主机之间的 VRRP 通信:

[root@dca-ngx01-al ~]# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens192
  sources:
  services: dhcpv6-client http https ssh
  ports: 10050/tcp
  protocols:
  forward: no
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
        rule protocol value="vrrp" accept

我也入坑net.ipv4.ip_forward = 1/etc/sysctl.conf

当两台机器上的firewalld都停止时,keepalived可以正常运行,但是当启用时,双方似乎都失去了联系,并且只是重复发送免费的ARP数据包:

● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2022-03-25 12:48:25 GMT; 2h 35min ago
  Process: 7140 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
  Process: 12966 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 12967 (keepalived)
    Tasks: 2 (limit: 11406)
   Memory: 1.8M
   CGroup: /system.slice/keepalived.service
           ├─12967 /usr/sbin/keepalived -D
           └─12968 /usr/sbin/keepalived -D

Mar 25 15:08:15 dca-ngx01-al.REDACTED.local Keepalived_vrrp[12968]: Sending gratuitous ARP on ens192 for 172.31.5.233
Mar 25 15:08:15 dca-ngx01-al.REDACTED.local Keepalived_vrrp[12968]: Sending gratuitous ARP on ens192 for 172.31.5.233
Mar 25 15:08:15 dca-ngx01-al.REDACTED.local Keepalived_vrrp[12968]: Sending gratuitous ARP on ens192 for 172.31.5.233
Mar 25 15:08:15 dca-ngx01-al.REDACTED.local Keepalived_vrrp[12968]: Sending gratuitous ARP on ens192 for 172.31.5.233
Mar 25 15:08:18 dca-ngx01-al.REDACTED.local Keepalived_vrrp[12968]: (VI_01) Sending/queueing gratuitous ARPs on ens192 for 1>
Mar 25 15:08:18 dca-ngx01-al.REDACTED.local Keepalived_vrrp[12968]: Sending gratuitous ARP on ens192 for 172.31.5.233
Mar 25 15:08:18 dca-ngx01-al.REDACTED.local Keepalived_vrrp[12968]: Sending gratuitous ARP on ens192 for 172.31.5.233
Mar 25 15:08:18 dca-ngx01-al.REDACTED.local Keepalived_vrrp[12968]: Sending gratuitous ARP on ens192 for 172.31.5.233
Mar 25 15:08:18 dca-ngx01-al.REDACTED.local Keepalived_vrrp[12968]: Sending gratuitous ARP on ens192 for 172.31.5.233
Mar 25 15:08:18 dca-ngx01-al.REDACTED.local Keepalived_vrrp[12968]: Sending gratuitous ARP on ens192 for 172.31.5.233

然而,通过使用 TCPDump 我可以看到,当防火墙处于活动状态时,来自另一台主机的常规 VRRP 数据包至少会到达网络接口:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
15:25:21.532300 IP dca-ngx02-al.REDACTED.local > vrrp.mcast.net: AH(spi=0xac1f05e5,seq=0x3160): VRRPv2, Advertisement, vrid 151, prio 150, authtype ah, intvl 1s, length 20
15:25:22.532419 IP dca-ngx02-al.REDACTED.local > vrrp.mcast.net: AH(spi=0xac1f05e5,seq=0x3161): VRRPv2, Advertisement, vrid 151, prio 150, authtype ah, intvl 1s, length 20
15:25:23.532476 IP dca-ngx02-al.REDACTED.local > vrrp.mcast.net: AH(spi=0xac1f05e5,seq=0x3162): VRRPv2, Advertisement, vrid 151, prio 150, authtype ah, intvl 1s, length 20
15:25:24.532544 IP dca-ngx02-al.REDACTED.local > vrrp.mcast.net: AH(spi=0xac1f05e5,seq=0x3163): VRRPv2, Advertisement, vrid 151, prio 150, authtype ah, intvl 1s, length 20

有人对我如何进一步解决此问题有任何想法吗?

提前致谢。

答案1

今天早上我弄清楚了问题的原因,希望这能对以后的某人有所帮助。我LogDenied=all在 中启用了/etc/firewalld/firewalld.conf,然后能够使用交换机识别哪些数据包仍被防火墙丢弃--get-log-denied

[root@dca-ngx02-al keepalived]# firewall-cmd --get-log-denied
Mar 28 08:40:04 dca-ngx01-al.REDACTED.local kernel: FINAL_REJECT: IN=ens192 OUT= MAC=01:00:5e:00:00:12:00:50:56:84:ac:d0:08:00 SRC=172.31.5.229 DST=224.0.0.18 LEN=64 TOS=0x00 PREC=0xC0 TTL=255 ID=79 PROTO=AH SPI=0xac1f05e5
Mar 28 08:40:05 dca-ngx01-al.REDACTED.local kernel: FINAL_REJECT: IN=ens192 OUT= MAC=01:00:5e:00:00:12:00:50:56:84:ac:d0:08:00 SRC=172.31.5.229 DST=224.0.0.18 LEN=64 TOS=0x00 PREC=0xC0 TTL=255 ID=80 PROTO=AH SPI=0xac1f05e5

我通过为 AH 多播数据包添加后续防火墙规则解决了该问题。

firewall-cmd --add-rich-rule='rule protocol value="ah" accept' --permanent

相关内容