I have a 2-node corosync cluster managing a virtual IP and an asterisk resource. When I deliberately shut down one of the nodes (server2) as a disaster-recovery test, the first node (server1) immediately takes over asterisk.
However, once server2 comes back up, the asterisk instance no longer appears to be running on server1, nor is it running on server2. I would prefer it to simply stay on the server that is already running it. The virtual_ip does not move, which is fine.
I tried setting the stickiness parameter on both nodes (with the same value), but that does not seem to help:
pcs resource meta asterisk resource-stickiness=100
and
pcs resource meta asterisk resource-stickiness=INFINITY
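(Side note: resource-stickiness is a resource meta attribute, or a cluster-wide default, rather than a per-node setting, so running the same command on both nodes just writes the same CIB value twice. A minimal sketch of setting a cluster-wide default instead, assuming pcs 0.9.x syntax; newer pcs releases use "pcs resource defaults update":)
# write a default stickiness for all resources into rsc_defaults
pcs resource defaults resource-stickiness=100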
In addition, the parameter "start-failure-is-fatal" is set to false, so that whenever server2 fails to start asterisk, the start is retried; that had no effect either. Setting the quorum-related cluster properties made no difference either:
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore
pcs property set start-failure-is-fatal=false
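(The effective property values can be double-checked after setting them; a quick sanity check using standard pcs commands:)
# list all cluster properties, including unset defaults
pcs property list --all
# or query a single property
pcs property show start-failure-is-fatal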
Here is my general configuration:
totem {
    version: 2
    cluster_name: astcluster
    secauth: off
    join: 30
    consensus: 300
    vsftype: none
    max_messages: 20
    rrp_mode: none

    interface {
        member {
            memberaddr: 192.168.83.133
        }
        member {
            memberaddr: 192.168.83.135
        }
        ringnumber: 0
        bindnetaddr: 192.168.83.0
        mcastport: 5405
    }

    transport: udpu
}

nodelist {
    node {
        ring0_addr: astp5.internal.uzgent.be
        nodeid: 1
        quorum_votes: 1
    }

    node {
        ring0_addr: astp6.internal.uzgent.be
        nodeid: 2
        quorum_votes: 1
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 0
    expected_votes: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: no
    debug: off
    timestamp: on
}
Can anyone tell me how to deal with this?
Edit: adding the Pacemaker configuration.
<cib crm_feature_set="3.0.10" validate-with="pacemaker-2.3" epoch="43" num_updates="0" admin_epoch="0" cib-last-written="Thu Feb 23 14:56:07 2017" update-origin="server2" update-client="crm_attribute" update-user="root" have-quorum="1" dc-uuid="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.13-10.el7-44eb2dd"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="astcluster"/>
        <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1487858117"/>
        <nvpair id="cib-bootstrap-options-start-failure-is-fatal" name="start-failure-is-fatal" value="false"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="server1"/>
      <node id="2" uname="server2">
        <instance_attributes id="nodes-2"/>
      </node>
    </nodes>
    <resources>
      <primitive class="ocf" id="virtual_ip" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="virtual_ip-instance_attributes">
          <nvpair id="virtual_ip-instance_attributes-ip" name="ip" value="192.168.83.137"/>
          <nvpair id="virtual_ip-instance_attributes-cidr_netmask" name="cidr_netmask" value="32"/>
        </instance_attributes>
        <operations>
          <op id="virtual_ip-start-interval-0s" interval="0s" name="start" timeout="20s"/>
          <op id="virtual_ip-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
          <op id="virtual_ip-monitor-interval-30s" interval="30s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="ocf" id="asterisk" provider="heartbeat" type="asterisk">
        <instance_attributes id="asterisk-instance_attributes">
          <nvpair id="asterisk-instance_attributes-user" name="user" value="asterisk"/>
          <nvpair id="asterisk-instance_attributes-group" name="group" value="asterisk"/>
        </instance_attributes>
        <meta_attributes id="asterisk-meta_attributes">
          <nvpair id="asterisk-meta_attributes-is-managed" name="is-managed" value="true"/>
          <nvpair id="asterisk-meta_attributes-expected-quorum-votes" name="expected-quorum-votes" value="1"/>
          <nvpair id="asterisk-meta_attributes-resource-stickiness" name="resource-stickiness" value="INFINITY"/>
          <nvpair id="asterisk-meta_attributes-default-resource-stickiness" name="default-resource-stickiness" value="1000"/>
        </meta_attributes>
        <operations>
          <op id="asterisk-start-interval-0s" interval="0s" name="start" timeout="20"/>
          <op id="asterisk-stop-interval-0s" interval="0s" name="stop" timeout="20"/>
          <op id="asterisk-monitor-interval-60s" interval="60s" name="monitor" timeout="30"/>
        </operations>
      </primitive>
    </resources>
    <constraints/>
  </configuration>
  <status>
    <node_state id="2" uname="server2" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <transient_attributes id="2">
        <instance_attributes id="status-2">
          <nvpair id="status-2-shutdown" name="shutdown" value="0"/>
          <nvpair id="status-2-probe_complete" name="probe_complete" value="true"/>
          <nvpair id="status-2-last-failure-asterisk" name="last-failure-asterisk" value="1487845098"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="2">
        <lrm_resources>
          <lrm_resource id="virtual_ip" type="IPaddr2" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="virtual_ip_last_0" operation_key="virtual_ip_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="7:59:7:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" transition-magic="0:7;7:59:7:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" on_node="server2" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1487845098" last-rc-change="1487845098" exec-time="68" queue-time="0" op-digest="7ea42b08d9415fb0dbbde15977130035"/>
          </lrm_resource>
          <lrm_resource id="asterisk" type="asterisk" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="asterisk_last_failure_0" operation_key="asterisk_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="6:79:7:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" transition-magic="0:0;6:79:7:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" on_node="server2" call-id="22" rc-code="0" op-status="0" interval="0" last-run="1487858116" last-rc-change="1487858116" exec-time="47" queue-time="0" op-digest="337a6295a6acbbd18616daf0206c3394"/>
            <lrm_rsc_op id="asterisk_last_0" operation_key="asterisk_stop_0" operation="stop" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="9:82:0:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" transition-magic="0:0;9:82:0:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" on_node="server2" call-id="25" rc-code="0" op-status="0" interval="0" last-run="1487858128" last-rc-change="1487858128" exec-time="1036" queue-time="0" op-digest="337a6295a6acbbd18616daf0206c3394" op-secure-params=" user " op-secure-digest="cf2187fe855553314a7a6bc14ff18918"/>
            <lrm_rsc_op id="asterisk_monitor_60000" operation_key="asterisk_monitor_60000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="10:80:0:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" transition-magic="0:0;10:80:0:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" on_node="server2" call-id="23" rc-code="0" op-status="0" interval="60000" last-rc-change="1487858116" exec-time="47" queue-time="0" op-digest="ce41237c2113b12d51aaed8af6b8a09f" op-secure-params=" user " op-secure-digest="cf2187fe855553314a7a6bc14ff18918"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
    <node_state id="1" uname="server1" in_ccm="true" crmd="online" crm-debug-origin="do_update_resource" join="member" expected="member">
      <transient_attributes id="1">
        <instance_attributes id="status-1">
          <nvpair id="status-1-shutdown" name="shutdown" value="0"/>
          <nvpair id="status-1-probe_complete" name="probe_complete" value="true"/>
        </instance_attributes>
      </transient_attributes>
      <lrm id="1">
        <lrm_resources>
          <lrm_resource id="virtual_ip" type="IPaddr2" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="virtual_ip_last_0" operation_key="virtual_ip_start_0" operation="start" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="7:6:0:b7b79be6-bb63-4f56-b425-fc84e90ef38b" transition-magic="0:0;7:6:0:b7b79be6-bb63-4f56-b425-fc84e90ef38b" on_node="server1" call-id="10" rc-code="0" op-status="0" interval="0" last-run="1487838677" last-rc-change="1487838677" exec-time="47" queue-time="0" op-digest="7ea42b08d9415fb0dbbde15977130035"/>
            <lrm_rsc_op id="virtual_ip_monitor_30000" operation_key="virtual_ip_monitor_30000" operation="monitor" crm-debug-origin="build_active_RAs" crm_feature_set="3.0.10" transition-key="7:7:0:b7b79be6-bb63-4f56-b425-fc84e90ef38b" transition-magic="0:0;7:7:0:b7b79be6-bb63-4f56-b425-fc84e90ef38b" on_node="server1" call-id="12" rc-code="0" op-status="0" interval="30000" last-rc-change="1487838679" exec-time="34" queue-time="0" op-digest="e81e10104a53c2ccab94a6935229ae08"/>
          </lrm_resource>
          <lrm_resource id="asterisk" type="asterisk" class="ocf" provider="heartbeat">
            <lrm_rsc_op id="asterisk_last_0" operation_key="asterisk_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="10:82:0:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" transition-magic="0:0;10:82:0:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" on_node="server1" call-id="77" rc-code="0" op-status="0" interval="0" last-run="1487858129" last-rc-change="1487858129" exec-time="2517" queue-time="0" op-digest="337a6295a6acbbd18616daf0206c3394" op-secure-params=" user " op-secure-digest="cf2187fe855553314a7a6bc14ff18918"/>
            <lrm_rsc_op id="asterisk_monitor_60000" operation_key="asterisk_monitor_60000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="11:82:0:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" transition-magic="0:0;11:82:0:8e6dd4d3-49ed-4e78-92b9-ec440e36f949" on_node="server1" call-id="78" rc-code="0" op-status="0" interval="60000" last-rc-change="1487858132" exec-time="46" queue-time="0" op-digest="ce41237c2113b12d51aaed8af6b8a09f" op-secure-params=" user " op-secure-digest="cf2187fe855553314a7a6bc14ff18918"/>
          </lrm_resource>
        </lrm_resources>
      </lrm>
    </node_state>
  </status>
</cib>
Edit: also tried adding some colocation constraints:
[root@server1]# pcs constraint show
Location Constraints:
Ordering Constraints:
  Resource Sets:
    set virtual_ip asterisk
Colocation Constraints:
  virtual_ip with asterisk (score:INFINITY)
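(Note the direction of that colocation: "virtual_ip with asterisk" places virtual_ip wherever asterisk runs, not the other way around. For reference, a sketch of how such a constraint is created with standard pcs syntax:)
# keep virtual_ip on whichever node runs asterisk (INFINITY makes it mandatory)
pcs constraint colocation add virtual_ip with asterisk INFINITY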
Edit: found the solution! The following parameter had to be added to the asterisk resource: on-fail=fence
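(on-fail is an operation option, so it attaches to a specific operation, typically the monitor. A hedged sketch of one way to set it with pcs, keeping the interval and timeout of the existing monitor op; note that on-fail=fence is only meaningful when STONITH is enabled and configured:)
# add on-fail=fence to the existing 60-second monitor operation
pcs resource update asterisk op monitor interval=60s timeout=30 on-fail=fence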
Answer 1
Did you move the resource to the given node before stopping it? When you move a resource manually, the cluster resource manager creates a location constraint for that resource on the given node behind the scenes. I do not see such a constraint in your configuration XML, but you may have captured the configuration before moving the resource.
I ran into the same problem. After I handed control back to the cluster resource manager with "crm resource unmove", bringing a node back no longer caused the resource to move back to the original node.
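(The pcs equivalents may be useful here; a sketch, assuming a pcs version that provides "pcs resource clear", with a constraint-id fallback for older releases:)
# show location constraints with their ids, including any cli-prefer-* left by a manual move
pcs constraint location show --full
# hand control back to the cluster, the pcs counterpart of "crm resource unmove"
pcs resource clear asterisk
# on older pcs, remove the auto-created constraint by id instead (typically cli-prefer-<resource>)
pcs constraint remove cli-prefer-asterisk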
See the following documentation for more details.