Failover with pcs leaves the DRBD disk unavailable on the source server: "No such resource"

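I am running a 2-node cluster with DRBD and PCS. With a virtual IP and a DRBD disk configured, everything works fine on the first node. I then tested failover by running "pcs cluster stop" on the primary node, and the disk and the virtual IP migrated correctly to the second node.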

However, on the first node the disk becomes unavailable:

drbdadm status
Error: cluster is not currently running on this node
opt_disk: No such resource
Command 'drbdsetup-84 status opt_disk' terminated with exit code 10

Configuration:

Cluster Name: cluster_zmbx1
Corosync Nodes:
 host_1 host_2
Pacemaker Nodes:
 host_1 host_2

Resources:
 Master: Z_Root
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: zroot (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=opt_disk
   Operations: demote interval=0s timeout=90 (zroot-demote-interval-0s)
               monitor interval=30s (zroot-monitor-interval-30s)
               notify interval=0s timeout=90 (zroot-notify-interval-0s)
               promote interval=0s timeout=90 (zroot-promote-interval-0s)
               reload interval=0s timeout=30 (zroot-reload-interval-0s)
               start interval=0s timeout=240 (zroot-start-interval-0s)
               stop interval=0s timeout=100 (zroot-stop-interval-0s)
 Resource: z_fs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd0 directory=/opt/ fstype=ext4 options=noatime
  Operations: monitor interval=20s timeout=40s (z_fs-monitor-interval-20s)
              notify interval=0s timeout=60s (z_fs-notify-interval-0s)
              start interval=0s timeout=60s (z_fs-start-interval-0s)
              stop interval=0s timeout=60s (z_fs-stop-interval-0s)
 Resource: MailIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=20 ip=10.64.200.21 nic=eth0
  Operations: monitor interval=10s (MailIP-monitor-interval-10s)
              start interval=0s timeout=20s (MailIP-start-interval-0s)
              stop interval=0s timeout=20s (MailIP-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
  promote Z_Root then start z_fs (kind:Mandatory)
  start z_fs then start MailIP (kind:Mandatory)
Colocation Constraints:
  z_fs with Z_Root (score:INFINITY) (with-rsc-role:Master)
  MailIP with z_fs (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 resource-stickiness: 200
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: cluster_zmbx1
 dc-version: 1.1.19-8.el7_6.4-c3c624ea3d
 have-watchdog: false
 no-quorum-policy: ignore
 stonith-enabled: false

Quorum:
  Options:
    auto_tie_breaker: 0
    last_man_standing: 1
    wait_for_all: 1

Log on the source host at the time of the failover:

Jul 25 20:31:56 host_1 systemd: Stopping Pacemaker High Availability Cluster Manager...
Jul 25 20:31:56 host_1 pacemakerd[140991]:  notice: Caught 'Terminated' signal
Jul 25 20:31:56 host_1 pacemakerd[140991]:  notice: Shutting down Pacemaker
Jul 25 20:31:56 host_1 pacemakerd[140991]:  notice: Stopping crmd
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Caught 'Terminated' signal
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Shutting down cluster resource manager
Jul 25 20:31:56 host_1 crmd[140997]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Jul 25 20:31:56 host_1 pengine[140996]:  notice: On loss of CCM Quorum: Ignore
Jul 25 20:31:56 host_1 pengine[140996]:  notice: Scheduling Node host_1 for shutdown
Jul 25 20:31:56 host_1 pengine[140996]:  notice:  * Shutdown host_1
Jul 25 20:31:56 host_1 pengine[140996]:  notice:  * Promote    zroot:0     ( Slave -> Master host_2 )
Jul 25 20:31:56 host_1 pengine[140996]:  notice:  * Stop       zroot:1     (          Master host_1 )   due to node availability
Jul 25 20:31:56 host_1 pengine[140996]:  notice:  * Move       z_fs        ( host_1 -> host_2 )
Jul 25 20:31:56 host_1 pengine[140996]:  notice:  * Move       MailIP      ( host_1 -> host_2 )
Jul 25 20:31:56 host_1 pengine[140996]:  notice: Calculated transition 4, saving inputs in /var/lib/pacemaker/pengine/pe-input-3930.bz2
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating cancel operation zroot_monitor_30000 on host_2
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating stop operation MailIP_stop_0 locally on host_1
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating notify operation zroot_pre_notify_demote_0 on host_2
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating notify operation zroot_pre_notify_demote_0 locally on host_1
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Result of notify operation for zroot on host_1: 0 (ok)
Jul 25 20:31:56 host_1 IPaddr2(MailIP)[142036]: INFO: IP status = ok, IP_CIP=
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Result of stop operation for MailIP on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating stop operation z_fs_stop_0 locally on host_1
Jul 25 20:31:56 host_1 Filesystem(z_fs)[142110]: INFO: Running stop for /dev/drbd0 on /opt
Jul 25 20:31:56 host_1 Filesystem(z_fs)[142110]: INFO: Trying to unmount /opt
Jul 25 20:31:56 host_1 Filesystem(z_fs)[142110]: INFO: unmounted /opt successfully
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Result of stop operation for z_fs on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating demote operation zroot_demote_0 locally on host_1
Jul 25 20:31:56 host_1 kernel: block drbd0: role( Primary -> Secondary )
Jul 25 20:31:56 host_1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Result of demote operation for zroot on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating notify operation zroot_post_notify_demote_0 on host_2
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating notify operation zroot_post_notify_demote_0 locally on host_1
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Result of notify operation for zroot on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating notify operation zroot_pre_notify_stop_0 on host_2
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating notify operation zroot_pre_notify_stop_0 locally on host_1
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Result of notify operation for zroot on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating stop operation zroot_stop_0 locally on host_1
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: ack_receiver terminated
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Terminating drbd_a_opt_disk
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Connection closed
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: conn( Disconnecting -> StandAlone )
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: receiver terminated
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Terminating drbd_r_opt_disk
Jul 25 20:31:56 host_1 kernel: block drbd0: disk( UpToDate -> Failed )
Jul 25 20:31:56 host_1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jul 25 20:31:56 host_1 kernel: block drbd0: disk( Failed -> Diskless )
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: Terminating drbd_w_opt_disk
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Transition aborted by deletion of nvpair[@id='status-1-master-zroot']: Transient attribute change
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Result of stop operation for zroot on host_1: 0 (ok)
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating notify operation zroot_post_notify_stop_0 on host_2
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Transition 4 (Complete=25, Pending=0, Fired=0, Skipped=2, Incomplete=13, Source=/var/lib/pacemaker/pengine/pe-input-3930.bz2): Stopped
Jul 25 20:31:56 host_1 pengine[140996]:  notice: On loss of CCM Quorum: Ignore
Jul 25 20:31:56 host_1 pengine[140996]:  notice: Scheduling Node host_1 for shutdown
Jul 25 20:31:56 host_1 pengine[140996]:  notice:  * Shutdown host_1
Jul 25 20:31:56 host_1 pengine[140996]:  notice:  * Promote    zroot:0     (    Slave -> Master host_2 )
Jul 25 20:31:56 host_1 pengine[140996]:  notice:  * Start      z_fs        (                    host_2 )
Jul 25 20:31:56 host_1 pengine[140996]:  notice:  * Start      MailIP      (                    host_2 )
Jul 25 20:31:56 host_1 pengine[140996]:  notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-3931.bz2
Jul 25 20:31:56 host_1 crmd[140997]:  notice: Initiating notify operation zroot_pre_notify_promote_0 on host_2
Jul 25 20:31:57 host_1 crmd[140997]:  notice: Initiating promote operation zroot_promote_0 on host_2
Jul 25 20:31:57 host_1 crmd[140997]:  notice: Initiating notify operation zroot_post_notify_promote_0 on host_2
Jul 25 20:31:57 host_1 crmd[140997]:  notice: Transition aborted by status-2-master-zroot doing modify master-zroot=10000: Transient attribute change
Jul 25 20:31:57 host_1 crmd[140997]:  notice: Transition 5 (Complete=10, Pending=0, Fired=0, Skipped=1, Incomplete=4, Source=/var/lib/pacemaker/pengine/pe-input-3931.bz2): Stopped
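To make the DRBD part of the log easier to read, the kernel state transitions can be pulled out and reduced to the last value seen for each field. The following is only a minimal Python sketch: the function names are hypothetical, and `LOG` holds just a few kernel lines copied from the log above rather than the whole file.

```python
import re

# A few kernel lines copied from the log above (not the full log).
LOG = """\
Jul 25 20:31:56 host_1 kernel: block drbd0: role( Primary -> Secondary )
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Jul 25 20:31:56 host_1 kernel: drbd opt_disk: conn( Disconnecting -> StandAlone )
Jul 25 20:31:56 host_1 kernel: block drbd0: disk( UpToDate -> Failed )
Jul 25 20:31:56 host_1 kernel: block drbd0: disk( Failed -> Diskless )
"""

# Matches fragments such as "conn( Connected -> Disconnecting )".
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def drbd_transitions(log_text):
    """Return (field, old, new) tuples for each state change on kernel lines."""
    found = []
    for line in log_text.splitlines():
        # Skip pengine/crmd lines; only kernel lines carry DRBD state changes.
        if " kernel: " not in line:
            continue
        found.extend(TRANSITION.findall(line))
    return found


def final_states(transitions):
    """Keep only the last state seen for each field (role, conn, disk, ...)."""
    final = {}
    for field, _old, new in transitions:
        final[field] = new
    return final


if __name__ == "__main__":
    for field, state in final_states(drbd_transitions(LOG)).items():
        print(f"{field}: {state}")
```

On these sample lines the last states are `conn` StandAlone and `disk` Diskless, which is the condition host_1 is left in once `zroot_stop_0` has finished.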
