Corosync-Pacemaker: no split-brain

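I am trying to set up a two-node cluster with CentOS 7, Corosync, Pacemaker and pcsd. I can manually move resources from one node to the other, but if I take the primary node down (by pulling its power cord), the secondary node does not become primary. I have two network interfaces: eno1 (10.211.0.0/24) for the default route and VRRP, and eno2 (10.255.255.0/30) for Corosync and Pacemaker.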

Here is the configuration:

pcs config show
Cluster Name: PBX
Corosync Nodes:
 pbx-1no pbx-2no
Pacemaker Nodes:
 pbx-1no pbx-2no

Resources:
 Master: PBX_DRBD_master
  Meta Attrs: clone-max=2 clone-node-max=1 master-max=1 master-node-max=1 notify=true
  Resource: PBX_DRBD (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=asterisk_DRBD
   Operations: demote interval=0s timeout=90 (PBX_DRBD-demote-interval-0s)
               monitor interval=10s on-fail=restart role=Master timeout=20s (PBX_DRBD-monitor-interval-10s)
               monitor interval=20s on-fail=restart role=Slave timeout=20s (PBX_DRBD-monitor-interval-20s)
               notify interval=0s timeout=90 (PBX_DRBD-notify-interval-0s)
               promote interval=0s timeout=90 (PBX_DRBD-promote-interval-0s)
               reload interval=0s timeout=30 (PBX_DRBD-reload-interval-0s)
               start interval=0s on-fail=restart timeout=240s (PBX_DRBD-start-interval-0s)
               stop interval=0s on-fail=block timeout=100s (PBX_DRBD-stop-interval-0s)
 Resource: PBX_FS (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd0 directory=/mnt/drbd0 fstype=ext4
  Operations: monitor interval=20s on-fail=restart timeout=40s (PBX_FS-monitor-interval-20s)
              notify interval=0s timeout=60s (PBX_FS-notify-interval-0s)
              start interval=0s on-fail=restart timeout=60s (PBX_FS-start-interval-0s)
              stop interval=0s on-fail=block timeout=60s (PBX_FS-stop-interval-0s)
 Resource: PBX_IP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=24 iflabel=0 ip=10.211.0.10 nic=eno1
  Operations: monitor interval=10s on-fail=restart timeout=20s (PBX_IP-monitor-interval-10s)
              start interval=0s on-fail=restart timeout=20s (PBX_IP-start-interval-0s)
              stop interval=0s on-fail=block timeout=20s (PBX_IP-stop-interval-0s)
 Resource: PBX_ROUTE_default (class=ocf provider=heartbeat type=Route)
  Attributes: destination=0.0.0.0/0 family=ip4 gateway=10.211.0.1 source=10.211.0.10
  Operations: monitor interval=10s on-fail=restart timeout=20s (PBX_ROUTE_default-monitor-interval-10s)
              reload interval=0s timeout=20s (PBX_ROUTE_default-reload-interval-0s)
              start interval=0s on-fail=restart timeout=20s (PBX_ROUTE_default-start-interval-0s)
              stop interval=0s on-fail=ignore timeout=20s (PBX_ROUTE_default-stop-interval-0s)
 Resource: PBX_mariadb (class=systemd type=mariadb.service)
  Operations: monitor interval=100s on-fail=ignore timeout=60s (PBX_mariadb-monitor-interval-100s)
              start interval=0s on-fail=ignore timeout=100s (PBX_mariadb-start-interval-0s)
              stop interval=0s on-fail=ignore timeout=100s (PBX_mariadb-stop-interval-0s)
 Resource: PBX_httpd (class=systemd type=httpd.service)
  Operations: monitor interval=100s on-fail=ignore timeout=60s (PBX_httpd-monitor-interval-100s)
              start interval=0s on-fail=ignore timeout=100s (PBX_httpd-start-interval-0s)
              stop interval=0s on-fail=ignore timeout=100s (PBX_httpd-stop-interval-0s)
 Resource: PBX_asterisk (class=systemd type=asterisk.service)
  Operations: monitor interval=100s on-fail=ignore timeout=60s (PBX_asterisk-monitor-interval-100s)
              start interval=0s on-fail=ignore timeout=100s (PBX_asterisk-start-interval-0s)
              stop interval=0s on-fail=ignore timeout=100s (PBX_asterisk-stop-interval-0s)
 Clone: ping_internal-clone
  Resource: ping_internal (class=ocf provider=pacemaker type=ping)
   Attributes: dampen=5s host_list="10.255.255.1 10.255.255.2" multiplier=1000
   Operations: monitor interval=10 timeout=60 (ping_internal-monitor-interval-10)
               start interval=0s timeout=60 (ping_internal-start-interval-0s)
               stop interval=0s timeout=20 (ping_internal-stop-interval-0s)

Stonith Devices:
 Resource: hpilo1 (class=stonith type=fence_ilo5)
  Attributes: ipaddr=ilo1.emergency login=admin passwd=11111 pcmk_host_list=pbx-1no
  Operations: monitor interval=60s (hpilo1-monitor-interval-60s)
 Resource: hpilo2 (class=stonith type=fence_ilo5)
  Attributes: ipaddr=ilo2.emergency login=admin passwd=11111 pcmk_host_list=pbx-2no
  Operations: monitor interval=60s (hpilo2-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: PBX_FS
    Enabled on: pbx-1no (score:INFINITY) (role: Started) (id:cli-prefer-PBX_FS)
  Resource: hpilo1
    Disabled on: pbx-1no (score:-INFINITY) (id:location-hpilo1-pbx-1no--INFINITY)
  Resource: hpilo2
    Disabled on: pbx-2no (score:-INFINITY) (id:location-hpilo2-pbx-2no--INFINITY)
Ordering Constraints:
  promote PBX_DRBD_master then start PBX_FS (kind:Mandatory) (id:order-PBX_DRBD_master-PBX_FS-mandatory)
  start PBX_FS then start PBX_IP (kind:Mandatory) (id:order-PBX_FS-PBX_IP-mandatory)
  start PBX_IP then start PBX_ROUTE_default (kind:Mandatory) (id:order-PBX_IP-PBX_ROUTE_default-mandatory)
  start PBX_FS then start PBX_asterisk (kind:Mandatory) (id:order-PBX_FS-PBX_asterisk-mandatory)
  start PBX_FS then start PBX_mariadb (kind:Mandatory) (id:order-PBX_FS-PBX_mariadb-mandatory)
  start PBX_mariadb then start PBX_httpd (kind:Mandatory) (id:order-PBX_mariadb-PBX_httpd-mandatory)
Colocation Constraints:
  PBX_ROUTE_default with PBX_IP (score:INFINITY) (id:colocation-PBX_ROUTE_default-PBX_IP-INFINITY)
  PBX_FS with PBX_DRBD_master (score:INFINITY) (with-rsc-role:Master) (id:colocation-PBX_FS-PBX_DRBD_master-INFINITY)
  PBX_IP with PBX_FS (score:INFINITY) (id:colocation-PBX_IP-PBX_FS-INFINITY)
  PBX_asterisk with PBX_FS (score:INFINITY) (id:colocation-PBX_asterisk-PBX_FS-INFINITY)
  PBX_mariadb with PBX_FS (score:INFINITY) (id:colocation-PBX_mariadb-PBX_FS-INFINITY)
  PBX_httpd with PBX_FS (score:INFINITY) (id:colocation-PBX_httpd-PBX_FS-INFINITY)
Ticket Constraints:

Alerts:
 Alert: smtp_alert (path=/var/lib/pacemaker/alert_smtp.sh)
  Recipients:
   Recipient: smtp_alert-recipient (value=hidden)

Resources Defaults:
 resource-stickiness=100
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: PBX
 dc-version: 1.1.23-1.el7_9.1-9acf116022
 have-watchdog: false
 last-lrm-refresh: 1613632161
 no-quorum-policy: ignore
 stonith-enabled: true

Quorum:
  Options:

Corosync configuration file:

    totem {
        version: 2
        cluster_name: PBX
        secauth: on
        transport: udpu
        token: 5000
    }

    nodelist {
        node {
            ring0_addr: pbx-1no
            nodeid: 1
        }

        node {
            ring0_addr: pbx-2no
            nodeid: 2
        }
    }

    quorum {
        provider: corosync_votequorum
        two_node: 1
    }

    logging {
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: yes
    }

asterisk.DRBD

    resource asterisk_DRBD {
        handlers {
            split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        }

        disk {
            on-io-error detach;
        }

        net {
            protocol C;
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri call-pri-lost-after-sb;
            cram-hmac-alg "sha1";
            shared-secret "something";
        }

        on pbx-1 {
            device /dev/drbd0;
            disk /dev/md3;
            address 10.255.255.1:7789;
            meta-disk internal;
        }

        on pbx-2 {
            device /dev/drbd0;
            disk /dev/md3;
            address 10.255.255.2:7789;
            meta-disk internal;
        }
    }

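At first I suspected routing: when eno2 is down, there is no route for 10.255.255.0/30, so the traffic goes out through the default gateway instead. But I added a rule on the router that drops these packets, and it made no difference. What could the problem be?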
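One way to test the routing theory directly is to ask the kernel which route it would pick for the peer's interconnect address with `ip route get`. The script below is a minimal sketch, assuming iproute2 is installed; the function names, the script itself and the expected interface `eno2` are hypothetical. It prints the outgoing interface and warns when traffic to the peer would leave through another interface, such as eno1 via the default gateway.

```shell
#!/bin/sh
# Hypothetical helper: check which interface the kernel would use to reach
# the peer's interconnect address (for example 10.255.255.2).

# Print the word that follows "dev" in `ip route get` output.
route_dev() {
    printf '%s\n' "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

main() {
    peer=$1
    expected=${2:-eno2}   # assumed interconnect interface
    dev=$(route_dev "$(ip route get "$peer" 2>/dev/null)")
    if [ "$dev" = "$expected" ]; then
        echo "$peer is reached via $dev"
    else
        echo "warning: $peer is reached via ${dev:-<none>}, not $expected" >&2
        return 1
    fi
}

# Act only when a peer address is given, so the functions can be sourced.
if [ "$#" -ge 1 ]; then
    main "$@"
fi
```

For example, running it on pbx-1no with `10.255.255.2` as the argument, once with eno2 up and once with the cable unplugged, would show whether the peer address falls back to the default route.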

Answer 1

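The problem was the IP address. When the primary node goes down, the Ethernet link on the secondary node goes down too, and the interface is left without an IP address. So I wrote a script that runs ifdown/ifup when the interface has no IP address.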
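The answer does not include the script itself. A minimal sketch of the same idea might look like the following, assuming iproute2's `ip` command and the CentOS 7 initscripts `ifdown`/`ifup`; the function names and the `ip-watchdog` log tag are hypothetical, and how often it runs (for example from cron or a systemd timer) is also an assumption.

```shell
#!/bin/sh
# Minimal sketch (not the answerer's actual script): bounce an interface
# with ifdown/ifup when it has no IPv4 address.

# Succeed if `ip -4 -o addr show` output contains an IPv4 address.
has_ipv4() {
    case "$1" in
        *" inet "*) return 0 ;;
        *) return 1 ;;
    esac
}

main() {
    iface=$1
    out=$(ip -4 -o addr show dev "$iface" 2>/dev/null)
    if ! has_ipv4 "$out"; then
        logger -t ip-watchdog "no IPv4 address on $iface, running ifdown/ifup"
        ifdown "$iface"
        ifup "$iface"
    fi
}

# Act only when an interface name is given, so the functions can be sourced.
if [ "$#" -ge 1 ]; then
    main "$1"
fi
```

Passing the interface as an argument (for example `eno2`) keeps the check generic; with no argument the script only defines its functions and does nothing.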
