Pacemaker failover cluster: monitoring packet loss

2024-5-30

This question continues an earlier one.

So, I have two test servers in one VLAN.

srv1
  eth1 10.10.10.11
  eth2 10.20.10.11

srv2
  eth1 10.10.10.12
  eth2 10.20.10.12

Cluster VIP - 10.10.10.100

Corosync configuration with two interfaces:

  rrp_mode: passive

  interface {
    ringnumber: 0
    bindnetaddr: 10.10.10.0
    mcastaddr: 226.94.1.1
    mcastport: 5405
  }

  interface {
    ringnumber: 1
    bindnetaddr: 10.20.10.0
    mcastaddr: 226.94.1.1
    mcastport: 5407
  }

Pacemaker configuration:

# crm configure show
node srv1
node srv2
primitive P_INTRANET ocf:pacemaker:ping \
  params host_list="10.10.10.11 10.10.10.12" multiplier="100" name="ping_intranet" \
  op monitor interval="5s" timeout="5s"
primitive cluster-ip ocf:heartbeat:IPaddr2 \
  params ip="10.10.10.100" cidr_netmask="24" \
  op monitor interval="5s"
primitive ha-nginx lsb:nginx \
  op monitor interval="5s"
clone CL_INTRANET P_INTRANET \
  meta globally-unique="false"
location L_CLUSTER_IP_PING_INTRANET cluster-ip \
  rule $id="L_CLUSTER_IP_PING_INTRANET-rule" ping_intranet: defined ping_intranet
location L_HA_NGINX_PING_INTRANET ha-nginx \
  rule $id="L_HA_NGINX_PING_INTRANET-rule" ping_intranet: defined ping_intranet
location L_INTRANET_01 CL_INTRANET 100: srv1
location L_INTRANET_02 CL_INTRANET 100: srv2
colocation nginx-and-cluster-ip 1000: ha-nginx cluster-ip
property $id="cib-bootstrap-options" \
  dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
  cluster-infrastructure="openais" \
  expected-quorum-votes="2" \
  no-quorum-policy="ignore" \
  stonith-enabled="false"

Now I simulate packet loss on eth1:

# tc qdisc add dev eth1 root netem delay 200ms 500ms 75% loss 42% 75% duplicate 1% corrupt 0.1% reorder 25% 50%

After this, Pacemaker switches the active node several times per minute, and the cluster still does not work properly.

How can I monitor packet loss and switch nodes once the loss reaches about 15%?

Is there an existing solution, or do I need to write my own resource agent to monitor it?
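
To make the question concrete, here is a minimal sketch of the measurement a custom agent would probably need. It is not a working resource agent. It assumes the summary line printed by iputils `ping` ("... 15% packet loss ..."). The function names `loss_percent` and `loss_exceeds` are hypothetical, and the 15% threshold is the figure from the question.

```shell
#!/bin/sh
# Sketch only: helper functions a custom monitor might use.
# Assumption: iputils ping prints a summary line such as
#   "20 packets transmitted, 17 received, 15% packet loss, time 3812ms"

# Read ping output on stdin and print the loss percentage (number only).
loss_percent() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Succeed (exit status 0) when the measured loss is at or above the threshold.
#   $1 = measured loss in percent, $2 = threshold in percent
loss_exceeds() {
    awk -v loss="$1" -v max="$2" 'BEGIN { exit !(loss >= max) }'
}

# Possible use on srv1, probing srv2 over eth1 (not executed here):
#   loss=$(ping -q -c 20 -I eth1 10.10.10.12 | loss_percent)
#   if loss_exceeds "$loss" 15; then echo "eth1 is degraded"; fi
```

How the result would be fed back into Pacemaker (for example as a node attribute used by a location rule, the way `ping_intranet` is used above) is left open here, since that is exactly what the question asks.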
