pcs cluster does not move resources when instructed to

For whatever reason, I am no longer able to move resources with pcs.

pacemaker-1.1.16-12.el7_4.8.x86_64
corosync-2.4.0-9.el7_4.2.x86_64
pcs-0.9.158-6.el7.centos.1.x86_64
Linux server_a.test.local 3.10.0-693.el7.x86_64

I have 4 resources configured as part of a resource group. This is the log from when I try to move the ClusterIP resource from server_d to server_a with pcs resource move ClusterIP server_a.test.local:

Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Forwarding cib_delete operation for section constraints to all (origin=local/crm_resource/3)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: --- 0.24.0 2
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: +++ 0.25.0 (null)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       -- /cib/configuration/constraints/rsc_location[@id='cli-prefer-ClusterIP']
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       +  /cib:  @epoch=25
Apr 06 12:16:26 [17292] server_d.test.local       crmd:     info: abort_transition_graph:       Transition aborted by deletion of rsc_location[@id='cli-prefer-ClusterIP']: Configuration change | cib=0.25.0 source=te_update_diff:456 path=/cib/configuration/constraints/rsc_location[@id='cli-prefer-ClusterIP'] complete=true
Apr 06 12:16:26 [17292] server_d.test.local       crmd:   notice: do_state_transition:  State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Completed cib_delete operation for section constraints: OK (rc=0, origin=server_d.test.local/crm_resource/3, version=0.25.0)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_a.test.local is online
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_d.test.local is online
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: group_print:   Resource Group: my_app
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      Apache     (systemd:httpd):        Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      stunnel    (systemd:stunnel-my_app): Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      my_app-daemon        (systemd:my_app): Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   ClusterIP       (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   Apache  (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   stunnel (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   my_app-daemon     (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:   notice: process_pe_message:   Calculated transition 8, saving inputs in /var/lib/pacemaker/pengine/pe-input-18.bz2
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Forwarding cib_modify operation for section constraints to all (origin=local/crm_resource/4)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: --- 0.25.0 2
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: +++ 0.26.0 (null)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       +  /cib:  @epoch=26
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       ++ /cib/configuration/constraints:  <rsc_location id="cli-prefer-ClusterIP" rsc="ClusterIP" role="Started" node="server_a.test.local" score="INFINITY"/>
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Completed cib_modify operation for section constraints: OK (rc=0, origin=server_d.test.local/crm_resource/4, version=0.26.0)
Apr 06 12:16:26 [17292] server_d.test.local       crmd:     info: abort_transition_graph:       Transition aborted by rsc_location.cli-prefer-ClusterIP 'create': Configuration change | cib=0.26.0 source=te_update_diff:456 path=/cib/configuration/constraints complete=true
Apr 06 12:16:26 [17292] server_d.test.local       crmd:     info: handle_response:      pe_calc calculation pe_calc-dc-1523016986-67 is obsolete
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_a.test.local is online
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_d.test.local is online
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: group_print:   Resource Group: my_app
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      Apache     (systemd:httpd):        Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      stunnel    (systemd:stunnel-my_app): Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      my_app-daemon        (systemd:my_app): Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   ClusterIP       (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   Apache  (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   stunnel (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   my_app-daemon     (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:   notice: process_pe_message:   Calculated transition 9, saving inputs in /var/lib/pacemaker/pengine/pe-input-19.bz2
Apr 06 12:16:27 [17292] server_d.test.local       crmd:     info: do_state_transition:  State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Apr 06 12:16:27 [17292] server_d.test.local       crmd:     info: do_te_invoke: Processing graph 9 (ref=pe_calc-dc-1523016987-68) derived from /var/lib/pacemaker/pengine/pe-input-19.bz2
Apr 06 12:16:27 [17292] server_d.test.local       crmd:   notice: run_graph:    Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-19.bz2): Complete
Apr 06 12:16:27 [17292] server_d.test.local       crmd:     info: do_log:       Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Apr 06 12:16:27 [17292] server_d.test.local       crmd:   notice: do_state_transition:  State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_backup:      Archived previous version as /var/lib/pacemaker/cib/cib-34.raw
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Wrote version 0.25.0 of the CIB to disk (digest: 7511cba55b6c2f2f481a51d5585b8d36)
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Reading cluster configuration file /var/lib/pacemaker/cib/cib.tPIv7m (digest: /var/lib/pacemaker/cib/cib.OwHiKz)
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_backup:      Archived previous version as /var/lib/pacemaker/cib/cib-35.raw
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Wrote version 0.26.0 of the CIB to disk (digest: 7f962ed676a49e84410eee2ee04bae8c)
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Reading cluster configuration file /var/lib/pacemaker/cib/cib.MnRP4u (digest: /var/lib/pacemaker/cib/cib.B5sWNH)
Apr 06 12:16:31 [17287] server_d.test.local        cib:     info: cib_process_ping:     Reporting our current digest to server_d.test.local: 8182592cb4922cbf007158ab0a277190 for 0.26.0 (0x5575234afde0 0)

It is worth noting that if I run pcs cluster stop server_b.test.local, all resources configured in the group do move to the other node.

What is going on here? Like I said, this used to work, and no changes have been made since then.

Thanks in advance!

Edit:

pcs config

[root@server_a ~]# pcs config
Cluster Name: my_app_cluster
Corosync Nodes:
 server_a.test.local server_d.test.local
Pacemaker Nodes:
 server_a.test.local server_d.test.local

Resources:
 Group: my_app
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=24 ip=10.116.63.49
   Operations: monitor interval=10s timeout=20s (ClusterIP-monitor-interval-10s)
               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
  Resource: Apache (class=systemd type=httpd)
   Operations: monitor interval=60 timeout=100 (Apache-monitor-interval-60)
               start interval=0s timeout=100 (Apache-start-interval-0s)
               stop interval=0s timeout=100 (Apache-stop-interval-0s)
  Resource: stunnel (class=systemd type=stunnel-my_app)
   Operations: monitor interval=60 timeout=100 (stunnel-monitor-interval-60)
               start interval=0s timeout=100 (stunnel-start-interval-0s)
               stop interval=0s timeout=100 (stunnel-stop-interval-0s)
  Resource: my_app-daemon (class=systemd type=my_app)
   Operations: monitor interval=60 timeout=100 (my_app-daemon-monitor-interval-60)
               start interval=0s timeout=100 (my_app-daemon-start-interval-0s)
               stop interval=0s timeout=100 (my_app-daemon-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: Apache
    Enabled on: server_d.test.local (score:INFINITY) (role: Started) (id:cli-prefer-Apache)
  Resource: ClusterIP
    Enabled on: server_a.test.local (score:INFINITY) (role: Started) (id:cli-prefer-ClusterIP)
  Resource: my_app-daemon
    Enabled on: server_a.test.local (score:INFINITY) (role: Started) (id:cli-prefer-my_app-daemon)
  Resource: stunnel
    Enabled on: server_a.test.local (score:INFINITY) (role: Started) (id:cli-prefer-stunnel)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: my_app_cluster
 dc-version: 1.1.16-12.el7_4.8-94ff4df
 have-watchdog: false
 stonith-enabled: false

Quorum:
  Options:

Edit 2:

When I run crm_simulate -sL, I get the following output:

[root@server_a ~]# crm_simulate -sL

    Current cluster status:
    Online: [ server_a.test.local server_d.test.local ]

     Resource Group: my_app
         ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_a.test.local
         Apache     (systemd:httpd):        Started server_a.test.local
         stunnel    (systemd:stunnel-my_app): Started server_a.test.local
         my_app-daemon        (systemd:my_app): Started server_a.test.local

    Allocation scores:
    group_color: my_app allocation score on server_a.test.local: 0
    group_color: my_app allocation score on server_d.test.local: 0
    group_color: ClusterIP allocation score on server_a.test.local: 0
    group_color: ClusterIP allocation score on server_d.test.local: INFINITY
    group_color: Apache allocation score on server_a.test.local: 0
    group_color: Apache allocation score on server_d.test.local: INFINITY
    group_color: stunnel allocation score on server_a.test.local: INFINITY
    group_color: stunnel allocation score on server_d.test.local: 0
    group_color: my_app-daemon allocation score on server_a.test.local: INFINITY
    group_color: my_app-daemon allocation score on server_d.test.local: 0
    native_color: ClusterIP allocation score on server_a.test.local: INFINITY
    native_color: ClusterIP allocation score on server_d.test.local: INFINITY
    native_color: Apache allocation score on server_a.test.local: INFINITY
    native_color: Apache allocation score on server_d.test.local: -INFINITY
    native_color: stunnel allocation score on server_a.test.local: INFINITY
    native_color: stunnel allocation score on server_d.test.local: -INFINITY
    native_color: my_app-daemon allocation score on server_a.test.local: INFINITY
    native_color: my_app-daemon allocation score on server_d.test.local: -INFINITY

    Transition Summary:

Next, I deleted all the resources and added them back (exactly as before; I have it documented), and when I run crm_simulate -sL I now get a different result:

[root@server_a ~]# crm_simulate -sL

Current cluster status:
Online: [ server_a.test.local server_d.test.local ]

 Resource Group: my_app
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_a.test.local
     Apache     (systemd:httpd):        Started server_a.test.local
     stunnel    (systemd:stunnel-my_app.service): Started server_a.test.local
     my_app-daemon        (systemd:my_app.service): Started server_a.test.local

Allocation scores:
group_color: my_app allocation score on server_a.test.local: 0
group_color: my_app allocation score on server_d.test.local: 0
group_color: ClusterIP allocation score on server_a.test.local: 0
group_color: ClusterIP allocation score on server_d.test.local: 0
group_color: Apache allocation score on server_a.test.local: 0
group_color: Apache allocation score on server_d.test.local: 0
group_color: stunnel allocation score on server_a.test.local: 0
group_color: stunnel allocation score on server_d.test.local: 0
group_color: my_app-daemon allocation score on server_a.test.local: 0
group_color: my_app-daemon allocation score on server_d.test.local: 0
native_color: ClusterIP allocation score on server_a.test.local: 0
native_color: ClusterIP allocation score on server_d.test.local: 0
native_color: Apache allocation score on server_a.test.local: 0
native_color: Apache allocation score on server_d.test.local: -INFINITY
native_color: stunnel allocation score on server_a.test.local: 0
native_color: stunnel allocation score on server_d.test.local: -INFINITY
native_color: my_app-daemon allocation score on server_a.test.local: 0
native_color: my_app-daemon allocation score on server_d.test.local: -INFINITY

I can move the resource now, but when I do so and run crm_simulate -sL again, the output differs from before!

[root@server_a ~]# crm_simulate -sL

Current cluster status:
Online: [ server_a.test.local server_d.test.local ]

 Resource Group: my_app
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_d.test.local
     Apache     (systemd:httpd):        Started server_d.test.local
     stunnel    (systemd:stunnel-my_app.service): Started server_d.test.local
     my_app-daemon        (systemd:my_app.service): Started server_d.test.local

Allocation scores:
group_color: my_app allocation score on server_a.test.local: 0
group_color: my_app allocation score on server_d.test.local: 0
group_color: ClusterIP allocation score on server_a.test.local: 0
group_color: ClusterIP allocation score on server_d.test.local: INFINITY
group_color: Apache allocation score on server_a.test.local: 0
group_color: Apache allocation score on server_d.test.local: 0
group_color: stunnel allocation score on server_a.test.local: 0
group_color: stunnel allocation score on server_d.test.local: 0
group_color: my_app-daemon allocation score on server_a.test.local: 0
group_color: my_app-daemon allocation score on server_d.test.local: 0
native_color: ClusterIP allocation score on server_a.test.local: 0
native_color: ClusterIP allocation score on server_d.test.local: INFINITY
native_color: Apache allocation score on server_a.test.local: -INFINITY
native_color: Apache allocation score on server_d.test.local: 0
native_color: stunnel allocation score on server_a.test.local: -INFINITY
native_color: stunnel allocation score on server_d.test.local: 0
native_color: my_app-daemon allocation score on server_a.test.local: -INFINITY
native_color: my_app-daemon allocation score on server_d.test.local: 0

Transition Summary:

I am a bit confused :/ Is this expected behavior?

Answer 1

Not sure whether my conclusion is correct, but after a closer look at man pcs I found:

    move <resource id> [destination node] [--master] [lifetime=<lifetime>] [--wait[=n]]
        Move the resource off the node it is currently running on by creating
        a -INFINITY location constraint to ban the node. If destination node
        is specified the resource will be moved to that node by creating an
        INFINITY location constraint to prefer the destination node. If
        --master is used the scope of the command is limited to the master
        role and you must use the master id (instead of the resource id). If
        lifetime is specified then the constraint will expire after that
        time, otherwise it defaults to infinity and the constraint can be
        cleared manually with 'pcs resource clear' or 'pcs constraint
        delete'. If --wait is specified, pcs will wait up to 'n' seconds for
        the resource to move and then return 0 on success or 1 on error. If
        'n' is not specified it defaults to 60 minutes. If you want the
        resource to preferably avoid running on some nodes but be able to
        failover to them use 'pcs constraint location avoids'.

After clearing the constraints with pcs resource clear, I was able to move the resource.
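For reference, this is a minimal sketch of the cleanup, assuming the resource and node names used in the question:

```shell
# Show all constraints with their ids, including the auto-generated
# cli-prefer-* / cli-ban-* entries left behind by earlier
# 'pcs resource move' / 'pcs resource ban' calls:
pcs constraint --full

# Drop the stale move/ban constraints for the resource...
pcs resource clear ClusterIP

# ...after which the move creates a fresh cli-prefer constraint:
pcs resource move ClusterIP server_a.test.local
```

These commands need a live Pacemaker cluster, so the output here is the cluster's, not reproducible offline.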

Answer 2

The score:INFINITY preference constraints on all of the grouped resources are probably the problem. INFINITY is actually equal to 1,000,000 in Pacemaker, which is the highest value that can be assigned to a score.

The following holds when using INFINITY (from the ClusterLabs documentation):

6.1.1. Infinity Math 
  Pacemaker implements INFINITY (or equivalently, +INFINITY) 
  internally as a score of 1,000,000. Addition and subtraction 
  with it follow these three basic rules: 

  Any value + INFINITY =  INFINITY 
  Any value - INFINITY = -INFINITY 
  INFINITY  - INFINITY = -INFINITY
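Note these rules are not plain saturated addition: -INFINITY wins even against +INFINITY. A minimal shell sketch of the arithmetic (the score_add helper is mine, not part of Pacemaker):

```shell
#!/bin/bash
# Sketch of Pacemaker-style score addition: -INFINITY dominates,
# then +INFINITY, otherwise the sum is clamped to +/-1000000.
INF=1000000

score_add() {
  if [ "$1" -le "-$INF" ] || [ "$2" -le "-$INF" ]; then
    echo "-$INF"                 # any value - INFINITY = -INFINITY
  elif [ "$1" -ge "$INF" ] || [ "$2" -ge "$INF" ]; then
    echo "$INF"                  # any value + INFINITY = INFINITY
  else
    local sum=$(( $1 + $2 ))     # finite scores: clamped sum
    (( sum > INF ))  && sum=$INF
    (( sum < -INF )) && sum=-$INF
    echo "$sum"
  fi
}

score_add 500 "$INF"         # prints 1000000
score_add 500 "-$INF"        # prints -1000000
score_add "$INF" "-$INF"     # prints -1000000  (INFINITY - INFINITY)
```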

Try changing your preference scores to a value like 1,000 or 10,000 instead of INFINITY, then run your tests again.
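For example (the constraint id and score below are illustrative, taken from the pcs config shown above):

```shell
# Replace the INFINITY preference with a finite score, so that a
# later 'pcs resource move' (which adds an INFINITY constraint of
# its own) can still override it:
pcs constraint remove cli-prefer-ClusterIP
pcs constraint location ClusterIP prefers server_a.test.local=10000
```

As with any pcs command, this has to run against a live cluster.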
