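I am trying to set up a two-node cluster with CentOS 7, Corosync, Pacemaker, and pcsd. I can manually move resources from one node to the other, but if I power off the primary node (by pulling the power cord), the secondary node does not become primary. I have two network interfaces: eno1 (10.211.0.0/24) carries the default route and VRRP, and eno2 (10.255.255.0/30) is used for Corosync and Pacemaker.
Here is the configuration: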
pcs config show
Cluster Name: PBX
Corosync Nodes:
 pbx-1no pbx-2no
Pacemaker Nodes:
 pbx-1no pbx-2no
Resources:
 Master: PBX_DRBD_master
  Meta Attrs: clone-max=2 clone-node-max=1 master-max=1 master-node-max=1 notify=true
  Resource: PBX_DRBD (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=asterisk_DRBD
   Operations: demote interval=0s timeout=90 (PBX_DRBD-demote-interval-0s)
               monitor interval=10s on-fail=restart role=Master timeout=20s (PBX_DRBD-monitor-interval-10s)
               monitor interval=20s on-fail=restart role=Slave timeout=20s (PBX_DRBD-monitor-interval-20s)
               notify interval=0s timeout=90 (PBX_DRBD-notify-interval-0s)
               promote interval=0s timeout=90 (PBX_DRBD-promote-interval-0s)
               reload interval=0s timeout=30 (PBX_DRBD-reload-interval-0s)
               start interval=0s on-fail=restart timeout=240s (PBX_DRBD-start-interval-0s)
               stop interval=0s on-fail=block timeout=100s (PBX_DRBD-stop-interval-0s)
 Resource: PBX_FS (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd0 directory=/mnt/drbd0 fstype=ext4
  Operations: monitor interval=20s on-fail=restart timeout=40s (PBX_FS-monitor-interval-20s)
              notify interval=0s timeout=60s (PBX_FS-notify-interval-0s)
              start interval=0s on-fail=restart timeout=60s (PBX_FS-start-interval-0s)
              stop interval=0s on-fail=block timeout=60s (PBX_FS-stop-interval-0s)
 Resource: PBX_IP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=24 iflabel=0 ip=10.211.0.10 nic=eno1
  Operations: monitor interval=10s on-fail=restart timeout=20s (PBX_IP-monitor-interval-10s)
              start interval=0s on-fail=restart timeout=20s (PBX_IP-start-interval-0s)
              stop interval=0s on-fail=block timeout=20s (PBX_IP-stop-interval-0s)
 Resource: PBX_ROUTE_default (class=ocf provider=heartbeat type=Route)
  Attributes: destination=0.0.0.0/0 family=ip4 gateway=10.211.0.1 source=10.211.0.10
  Operations: monitor interval=10s on-fail=restart timeout=20s (PBX_ROUTE_default-monitor-interval-10s)
              reload interval=0s timeout=20s (PBX_ROUTE_default-reload-interval-0s)
              start interval=0s on-fail=restart timeout=20s (PBX_ROUTE_default-start-interval-0s)
              stop interval=0s on-fail=ignore timeout=20s (PBX_ROUTE_default-stop-interval-0s)
 Resource: PBX_mariadb (class=systemd type=mariadb.service)
  Operations: monitor interval=100s on-fail=ignore timeout=60s (PBX_mariadb-monitor-interval-100s)
              start interval=0s on-fail=ignore timeout=100s (PBX_mariadb-start-interval-0s)
              stop interval=0s on-fail=ignore timeout=100s (PBX_mariadb-stop-interval-0s)
 Resource: PBX_httpd (class=systemd type=httpd.service)
  Operations: monitor interval=100s on-fail=ignore timeout=60s (PBX_httpd-monitor-interval-100s)
              start interval=0s on-fail=ignore timeout=100s (PBX_httpd-start-interval-0s)
              stop interval=0s on-fail=ignore timeout=100s (PBX_httpd-stop-interval-0s)
 Resource: PBX_asterisk (class=systemd type=asterisk.service)
  Operations: monitor interval=100s on-fail=ignore timeout=60s (PBX_asterisk-monitor-interval-100s)
              start interval=0s on-fail=ignore timeout=100s (PBX_asterisk-start-interval-0s)
              stop interval=0s on-fail=ignore timeout=100s (PBX_asterisk-stop-interval-0s)
 Clone: ping_internal-clone
  Resource: ping_internal (class=ocf provider=pacemaker type=ping)
   Attributes: dampen=5s host_list="10.255.255.1 10.255.255.2" multiplier=1000
   Operations: monitor interval=10 timeout=60 (ping_internal-monitor-interval-10)
               start interval=0s timeout=60 (ping_internal-start-interval-0s)
               stop interval=0s timeout=20 (ping_internal-stop-interval-0s)
Stonith Devices:
 Resource: hpilo1 (class=stonith type=fence_ilo5)
  Attributes: ipaddr=ilo1.emergency login=admin passwd=11111 pcmk_host_list=pbx-1no
  Operations: monitor interval=60s (hpilo1-monitor-interval-60s)
 Resource: hpilo2 (class=stonith type=fence_ilo5)
  Attributes: ipaddr=ilo2.emergency login=admin passwd=11111 pcmk_host_list=pbx-2no
  Operations: monitor interval=60s (hpilo2-monitor-interval-60s)
Fencing Levels:
Location Constraints:
  Resource: PBX_FS
    Enabled on: pbx-1no (score:INFINITY) (role: Started) (id:cli-prefer-PBX_FS)
  Resource: hpilo1
    Disabled on: pbx-1no (score:-INFINITY) (id:location-hpilo1-pbx-1no--INFINITY)
  Resource: hpilo2
    Disabled on: pbx-2no (score:-INFINITY) (id:location-hpilo2-pbx-2no--INFINITY)
Ordering Constraints:
  promote PBX_DRBD_master then start PBX_FS (kind:Mandatory) (id:order-PBX_DRBD_master-PBX_FS-mandatory)
  start PBX_FS then start PBX_IP (kind:Mandatory) (id:order-PBX_FS-PBX_IP-mandatory)
  start PBX_IP then start PBX_ROUTE_default (kind:Mandatory) (id:order-PBX_IP-PBX_ROUTE_default-mandatory)
  start PBX_FS then start PBX_asterisk (kind:Mandatory) (id:order-PBX_FS-PBX_asterisk-mandatory)
  start PBX_FS then start PBX_mariadb (kind:Mandatory) (id:order-PBX_FS-PBX_mariadb-mandatory)
  start PBX_mariadb then start PBX_httpd (kind:Mandatory) (id:order-PBX_mariadb-PBX_httpd-mandatory)
Colocation Constraints:
  PBX_ROUTE_default with PBX_IP (score:INFINITY) (id:colocation-PBX_ROUTE_default-PBX_IP-INFINITY)
  PBX_FS with PBX_DRBD_master (score:INFINITY) (with-rsc-role:Master) (id:colocation-PBX_FS-PBX_DRBD_master-INFINITY)
  PBX_IP with PBX_FS (score:INFINITY) (id:colocation-PBX_IP-PBX_FS-INFINITY)
  PBX_asterisk with PBX_FS (score:INFINITY) (id:colocation-PBX_asterisk-PBX_FS-INFINITY)
  PBX_mariadb with PBX_FS (score:INFINITY) (id:colocation-PBX_mariadb-PBX_FS-INFINITY)
  PBX_httpd with PBX_FS (score:INFINITY) (id:colocation-PBX_httpd-PBX_FS-INFINITY)
Ticket Constraints:
Alerts:
 Alert: smtp_alert (path=/var/lib/pacemaker/alert_smtp.sh)
  Recipients:
   Recipient: smtp_alert-recipient (value=hidden)
Resources Defaults:
 resource-stickiness=100
Operations Defaults:
 No defaults set
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: PBX
 dc-version: 1.1.23-1.el7_9.1-9acf116022
 have-watchdog: false
 last-lrm-refresh: 1613632161
 no-quorum-policy: ignore
 stonith-enabled: true
Quorum:
  Options:
Corosync configuration file
totem {
    version: 2
    cluster_name: PBX
    secauth: on
    transport: udpu
    token: 5000
}
nodelist {
    node {
        ring0_addr: pbx-1no
        nodeid: 1
    }
    node {
        ring0_addr: pbx-2no
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
DRBD resource (asterisk_DRBD)
resource asterisk_DRBD {
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    }
    disk {
        on-io-error detach;
    }
    net {
        protocol C;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri call-pri-lost-after-sb;
        cram-hmac-alg "sha1";
        shared-secret "something";
    }
    on pbx-1 {
        device /dev/drbd0;
        disk /dev/md3;
        address 10.255.255.1:7789;
        meta-disk internal;
    }
    on pbx-2 {
        device /dev/drbd0;
        disk /dev/md3;
        address 10.255.255.2:7789;
        meta-disk internal;
    }
}
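At first I suspected routing, because when eno2 goes down there is no route for 10.255.255.0/30 and that traffic goes out via the default gateway instead. But I added a rule on the router that drops those packets, and it made no difference. What could be the problem?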
Answer 1
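The problem was the IP address. When the primary node is powered off, the Ethernet link on the secondary node goes down as well, and the interface is left without an IP address. So I wrote a script that runs ifdown/ifup if the interface has no IP address.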
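The script itself was not posted, so the following is only a minimal sketch of what such a check might look like. It assumes the affected interface is eno2 (the direct link used by Corosync and DRBD) and that the CentOS 7 `ifdown`/`ifup` scripts are available; the function names, the `--run` argument, and the `ifwatch` log tag are made up for illustration.

```shell
#!/bin/sh
# Hypothetical watchdog: re-initialise an interface that has lost its IPv4 address.
# Assumption: eno2 is the interface that loses its address when the peer is powered off.
IFACE="${IFACE:-eno2}"

# Succeeds (exit 0) if the given `ip -4 -o addr show` output contains an IPv4 address.
has_ipv4() {
    printf '%s\n' "$1" | grep -q ' inet '
}

# Cycle the interface with ifdown/ifup when it has no IPv4 address.
check_and_recover() {
    out=$(ip -4 -o addr show dev "$IFACE" 2>/dev/null)
    if ! has_ipv4 "$out"; then
        logger -t ifwatch "$IFACE has no IPv4 address, running ifdown/ifup"
        ifdown "$IFACE"
        ifup "$IFACE"
    fi
}

# Only act when explicitly asked to, e.g. from cron: /path/to/script --run
if [ "${1:-}" = "--run" ]; then
    check_and_recover
fi
```

Run periodically (for example from cron or a systemd timer), this would bring the address back after the link drops, so that Corosync on the surviving node can still bind to its ring0 address.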