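I am running a 2-node cluster with DRBD and PCS. With the virtual IP and the DRBD disk configured, everything works fine on the first node. I then tested failover by running "pcs cluster stop" on the primary node, and the disk and virtual IP migrated correctly to the second node.
However, on the first node the disk becomes unavailable.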
drbdadm status
Error: cluster is not currently running on this node
opt_disk: No such resource
Command 'drbdsetup-84 status opt_disk' terminated with exit code 10
Configuration:
Cluster Name: cluster_zmbx1
Corosync Nodes:
 host_1 host_2
Pacemaker Nodes:
 host_1 host_2
Resources:
 Master: Z_Root
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: zroot (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=opt_disk
   Operations: demote interval=0s timeout=90 (zroot-demote-interval-0s)
               monitor interval=30s (zroot-monitor-interval-30s)
               notify interval=0s timeout=90 (zroot-notify-interval-0s)
               promote interval=0s timeout=90 (zroot-promote-interval-0s)
               reload interval=0s timeout=30 (zroot-reload-interval-0s)
               start interval=0s timeout=240 (zroot-start-interval-0s)
               stop interval=0s timeout=100 (zroot-stop-interval-0s)
 Resource: z_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd0 directory=/opt/ fstype=ext4 options=noatime
  Operations: monitor interval=20s timeout=40s (z_fs-monitor-interval-20s)
              notify interval=0s timeout=60s (z_fs-notify-interval-0s)
              start interval=0s timeout=60s (z_fs-start-interval-0s)
              stop interval=0s timeout=60s (z_fs-stop-interval-0s)
 Resource: MailIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=20 ip=10.64.200.21 nic=eth0
  Operations: monitor interval=10s (MailIP-monitor-interval-10s)
              start interval=0s timeout=20s (MailIP-start-interval-0s)
              stop interval=0s timeout=20s (MailIP-stop-interval-0s)
Stonith Devices:
Fencing Levels:
Location Constraints:
Ordering Constraints:
  promote Z_Root then start z_fs (kind:Mandatory)
  start z_fs then start MailIP (kind:Mandatory)
Colocation Constraints:
  z_fs with Z_Root (score:INFINITY) (with-rsc-role:Master)
  MailIP with z_fs (score:INFINITY)
Ticket Constraints:
Alerts:
 No alerts defined
Resources Defaults:
 resource-stickiness: 200
Operations Defaults:
 No defaults set
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: cluster_zmbx1
 dc-version: 1.1.19-8.el7_6.4-c3c624ea3d
 have-watchdog: false
 no-quorum-policy: ignore
 stonith-enabled: false
Quorum:
  Options:
    auto_tie_breaker: 0
    last_man_standing: 1
    wait_for_all: 1
Log from the source host at the time of the failover:
Jul 25 20:31:56 host_1 systemd: Stopping Pacemaker High Availability Cluster Manager...
Jul 25 20:31:56 host_1 pacemakerd[140991]: notice: Caught 'Terminated' signal
Jul 25 20:31:56 host_1 pacemakerd[140991]: notice: Shutting down Pacemaker
Jul 25 20:31:56 host_1 pacemakerd[140991]: notice: Stopping crmd
Jul 25 20:31:56 host_1 crmd[140997]: notice: Caught 'Terminated' signal
Jul 25 20:31:56 host_1 crmd[140997]: notice: Shutting down cluster resource manager
Jul 25 20:31:56 host_1 crmd[140997]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 25 20:31:56 host_1 pengine[140996]: notice: On loss of CCM Quorum: Ignore
Jul 25 20:31:56 host_1 pengine[140996]: notice: Scheduling Node host_1 for shutdown
Jul 25 20:31:56 host_1 pengine[140996]: notice: * Shutdown host_1
Jul 25 20:31:56 host_1 pengine[140996]: notice: * Promote zroot:0 ( Slave -> Master host_2 )
Jul 25 20:31:56 host_1 pengine[140996]: notice: * Stop zroot:1 ( Master host_1 ) due to node availability
Jul 25 20:31:56 host_1 pengine[140996]: notice: * Move z_fs ( host_1 -> host_2 )
Jul 25 20:31:56 host_1 pengine[140996]: notice: * Move MailIP ( host_1 -> host_2 )
Jul 25 20:31:56 host_1 pengine[140996]: notice: Calculated transition 4, saving inputs in /var/lib/pacemaker/pengine/pe-input-3930.bz2
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating cancel operation zroot_monitor_30000 on host_2
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating stop operation MailIP_stop_0 locally on host_1
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating notify operation zroot_pre_notify_demote_0 on host_2
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating notify operation zroot_pre_notify_demote_0 locally on host_1
Jul 25 20:31:56 host_1 crmd[140997]: notice: Result of notify operation for zroot on host_1: 0 (ok)
Jul 25 20:31:56 host_1 IPaddr2(MailIP)[142036]: INFO: IP status = ok, IP_CIP=
Jul 25 20:31:56 host_1 crmd[140997]: notice: Result of stop operation for MailIP on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating stop operation z_fs_stop_0 locally on host_1
Jul 25 20:31:56 host_1 Filesystem(z_fs)[142110]: INFO: Running stop for /dev/drbd0 on /opt
Jul 25 20:31:56 host_1 Filesystem(z_fs)[142110]: INFO: Trying to unmount /opt
Jul 25 20:31:56 host_1 Filesystem(z_fs)[142110]: INFO: unmounted /opt successfully
Jul 25 20:31:56 host_1 crmd[140997]: notice: Result of stop operation for z_fs on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating demote operation zroot_demote_0 locally on host_1
Jul 25 20:31:56 host_1 kernel: block drbd0: role( Primary -> Secondary )
Jul 25 20:31:56 host_1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jul 25 20:31:56 host_1 crmd[140997]: notice: Result of demote operation for zroot on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating notify operation zroot_post_notify_demote_0 on host_2
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating notify operation zroot_post_notify_demote_0 locally on host_1
Jul 25 20:31:56 host_1 crmd[140997]: notice: Result of notify operation for zroot on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating notify operation zroot_pre_notify_stop_0 on host_2
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating notify operation zroot_pre_notify_stop_0 locally on host_1
Jul 25 20:31:56 host_1 crmd[140997]: notice: Result of notify operation for zroot on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating stop operation zroot_stop_0 locally on host_1
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: ack_receiver terminated
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Terminating drbd_a_opt_disk
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Connection closed
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: conn( Disconnecting -> StandAlone )
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: receiver terminated
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Terminating drbd_r_opt_disk
Jul 25 20:31:56 host_1 kernel: block drbd0: disk( UpToDate -> Failed )
Jul 25 20:31:56 host_1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jul 25 20:31:56 host_1 kernel: block drbd0: disk( Failed -> Diskless )
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Terminating drbd_w_opt_disk
Jul 25 20:31:56 host_1 crmd[140997]: notice: Transition aborted by deletion of nvpair[@id='status-1-master-zroot']: Transient attribute change
Jul 25 20:31:56 host_1 crmd[140997]: notice: Result of stop operation for zroot on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating notify operation zroot_post_notify_stop_0 on host_2
Jul 25 20:31:56 host_1 crmd[140997]: notice: Transition 4 (Complete=25, Pending=0, Fired=0, Skipped=2, Incomplete=13, Source=/var/lib/pacemaker/pengine/pe-input-3930.bz2): Stopped
Jul 25 20:31:56 host_1 pengine[140996]: notice: On loss of CCM Quorum: Ignore
Jul 25 20:31:56 host_1 pengine[140996]: notice: Scheduling Node host_1 for shutdown
Jul 25 20:31:56 host_1 pengine[140996]: notice: * Shutdown host_1
Jul 25 20:31:56 host_1 pengine[140996]: notice: * Promote zroot:0 ( Slave -> Master host_2 )
Jul 25 20:31:56 host_1 pengine[140996]: notice: * Start z_fs ( host_2 )
Jul 25 20:31:56 host_1 pengine[140996]: notice: * Start MailIP ( host_2 )
Jul 25 20:31:56 host_1 pengine[140996]: notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-3931.bz2
Jul 25 20:31:56 host_1 crmd[140997]: notice: Initiating notify operation zroot_pre_notify_promote_0 on host_2
Jul 25 20:31:57 host_1 crmd[140997]: notice: Initiating promote operation zroot_promote_0 on host_2
Jul 25 20:31:57 host_1 crmd[140997]: notice: Initiating notify operation zroot_post_notify_promote_0 on host_2
Jul 25 20:31:57 host_1 crmd[140997]: notice: Transition aborted by status-2-master-zroot doing modify master-zroot=10000: Transient attribute change
Jul 25 20:31:57 host_1 crmd[140997]: notice: Transition 5 (Complete=10, Pending=0, Fired=0, Skipped=1, Incomplete=4, Source=/var/lib/pacemaker/pengine/pe-input-3931.bz2): Stopped
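
The log above is long, so to make the DRBD state changes easier to follow, here is a minimal sketch of a script that pulls out only the kernel lines where DRBD reports a state change (for example `role( Primary -> Secondary )`). It is not part of my setup: the function name `drbd_transitions`, the regular expression, and the assumption of a 15-character syslog timestamp prefix are all illustrative choices. The sample lines are copied from the log above.

```python
import re

# Assumed pattern for kernel DRBD state changes such as "role( Primary -> Secondary )".
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def drbd_transitions(lines):
    """Yield (timestamp, field, old, new) for each DRBD state change in kernel log lines."""
    for line in lines:
        # Only kernel messages mentioning drbd are of interest; skip crmd/pengine lines.
        if " kernel: " not in line or "drbd" not in line:
            continue
        timestamp = line[:15]  # assumes a syslog prefix like "Jul 25 20:31:56"
        for field, old, new in TRANSITION.findall(line):
            yield timestamp, field, old, new


# Sample lines copied from the log in this post.
sample = [
    "Jul 25 20:31:56 host_1 kernel: block drbd0: role( Primary -> Secondary )",
    "Jul 25 20:31:56 host_1 kernel: drbd opt_disk: peer( Secondary -> Unknown ) "
    "conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )",
    "Jul 25 20:31:56 host_1 kernel: block drbd0: disk( Failed -> Diskless )",
    "Jul 25 20:31:56 host_1 crmd[140997]: notice: Result of stop operation for zroot on host_1: 0 (ok)",
]

for row in drbd_transitions(sample):
    print(*row)
```

To run it against a real file, pass an open file object instead of `sample`, for example `drbd_transitions(open("/var/log/messages"))`; the path is an assumption and depends on where syslog writes on your system.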