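I am setting up a PostgreSQL 9.2.8 cluster consisting of 2 nodes (one master and one slave). I am using streaming replication and repmgr 2.0. My platform is RHEL 6.3.
The master and the slave can both be started manually, but I cannot start them through the Pacemaker cluster. The logs show that the master starts successfully, then the monitor returns 8 (running as master), and almost immediately afterwards Pacemaker demotes it. On my second node, the slave does not start at all: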
Apr 28 12:41:07 clustera lrmd: [1891]: info: RA output: (POSTGRESQL:0:start:stdout) POSTGRESQL : action = start
Apr 28 12:41:07 clustera lrmd: [1891]: info: RA output: (POSTGRESQL:0:start:stdout) START: status = 7
start() NOT RUNNING
Apr 28 12:41:07 clustera lrmd: [1891]: info: RA output: (POSTGRESQL:0:start:stdout) calling pg_start
Apr 28 12:41:07 clustera lrmd: [1891]: info: RA output: (POSTGRESQL:0:start:stdout) starting /usr/pgsql-9.2/bin/pg_ctl start -w -D /opt/pgdata -l /var//ha/postgresql/postgres_ha.log -o '-c config_file=/opt/pgdata/postgresql.conf' -o '-p 5432'
Apr 28 12:41:08 clustera lrmd: [1891]: info: RA output: (POSTGRESQL:0:start:stdout) pg_start(): asked for start , waiting 10s
Apr 28 12:41:18 clustera lrmd: [1891]: info: RA output: (POSTGRESQL:0:start:stdout) pg_start(): asked for start, waited long enough
checing state
Apr 28 12:41:18 clustera lrmd: [1891]: info: RA output: (POSTGRESQL:0:start:stdout) in pg_state_check: 100
started as MASTER
Apr 28 12:41:18 [1892] clustera attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: master-pg92 (50)
Apr 28 12:41:18 [1892] clustera attrd: notice: attrd_perform_update: Sent update 194: master-pg92=50
Apr 28 12:41:18 [1894] clustera crmd: info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, id=status-clustera-master-pg92, name=master-pg92, value=50, magic=NA, cib=0.220.185) : Transient attribute: update
Apr 28 12:41:18 [1894] clustera crmd: info: process_lrm_event: LRM operation POSTGRESQL:0_start_0 (call=105, rc=0, cib-update=192, confirmed=true) ok
Apr 28 12:41:18 [1894] clustera crmd: info: te_rsc_command: Initiating action 41: notify POSTGRESQL:0_post_notify_start_0 on clustera (local)
Apr 28 12:41:18 clustera lrmd: [1891]: info: rsc:POSTGRESQL:0:106: notify
Apr 28 12:41:18 [1894] clustera crmd: info: te_rsc_command: Initiating action 43: notify POSTGRESQL:1_post_notify_start_0 on clusterb
Apr 28 12:41:18 clustera lrmd: [1891]: info: RA output: (POSTGRESQL:0:notify:stdout) UNAME === clustera
Apr 28 12:41:18 clustera lrmd: [1891]: info: RA output: (POSTGRESQL:0:notify:stdout) SUCCESS
Apr 28 12:41:18 clustera lrmd: [1891]: info: RA output: (POSTGRESQL:0:notify:stdout) POSTGRESQL : action = notify
Apr 28 12:41:18 [1894] clustera crmd: info: process_lrm_event: LRM operation POSTGRESQL:0_notify_0 (call=106, rc=0, cib-update=0, confirmed=true) ok
Apr 28 12:41:18 [1894] clustera crmd: notice: run_graph: ==== Transition 64 (Complete=21, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-67.bz2): Stopped
Apr 28 12:41:18 [1894] clustera crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr 28 12:41:18 [1893] clustera pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Apr 28 12:41:18 [1893] clustera pengine: warning: unpack_rsc_op: Processing failed op POSTGRESQL:0_last_failure_0 on clustera: master (8)
Apr 28 12:41:18 [1893] clustera pengine: notice: common_apply_stickiness: MS_POSTGRESQL can fail 999987 more times on clustera before being forced off
Apr 28 12:41:18 [1893] clustera pengine: notice: common_apply_stickiness: MS_POSTGRESQL can fail 999987 more times on clustera before being forced off
Apr 28 12:41:18 [1894] clustera crmd: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Apr 28 12:41:18 [1894] clustera crmd: info: do_te_invoke: Processing graph 65 (ref=pe_calc-dc-1398703278-318) derived from /var/lib/pengine/pe-input-68.bz2
Apr 28 12:41:18 [1894] clustera crmd: info: te_rsc_command: Initiating action 9: monitor POSTGRESQL:0_monitor_60000 on clustera (local)
Apr 28 12:41:18 clustera lrmd: [1891]: info: rsc:POSTGRESQL:0:107: monitor
Apr 28 12:41:18 [1893] clustera pengine: notice: process_pe_message: Transition 65: PEngine Input stored in: /var/lib/pengine/pe-input-68.bz2
Apr 28 12:41:18 [1892] clustera attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: master-pg92 (100)
Apr 28 12:41:18 [1892] clustera attrd: notice: attrd_perform_update: Sent update 196: master-pg92=100
Apr 28 12:41:18 [1894] clustera crmd: info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, id=status-clustera-master-pg92, name=master-pg92, value=100, magic=NA, cib=0.220.188) : Transient attribute: update
Apr 28 12:41:18 [1894] clustera crmd: info: process_lrm_event: LRM operation POSTGRESQL:0_monitor_60000 (call=107, rc=8, cib-update=194, confirmed=false) master
Apr 28 12:41:18 [1894] clustera crmd: warning: status_from_rc: Action 9 (POSTGRESQL:0_monitor_60000) on clustera failed (target: 0 vs. rc: 8): Error
Apr 28 12:41:18 [1894] clustera crmd: warning: update_failcount: Updating failcount for POSTGRESQL:0 on clustera after failed monitor: rc=8 (update=value++, time=1398703278)
Apr 28 12:41:18 [1894] clustera crmd: info: abort_transition_graph: match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=POSTGRESQL:0_last_failure_0, magic=0:8;9:65:0:70e313b8-b64a-4340-8d96-e64054ac9439, cib=0.220.190) : Event failed
Apr 28 12:41:18 [1894] clustera crmd: notice: run_graph: ==== Transition 65 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-68.bz2): Complete
Apr 28 12:41:18 [1894] clustera crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Apr 28 12:41:18 [1892] clustera attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-POSTGRESQL:0 (14)
My Pacemaker configuration is as follows:
node clustera attributes standby="off"
node clusterb attributes standby="off"
primitive POSTGRESQL ocf:xxx:postgresql \
params repmgr_conf="/var/lib/pgsql/repmgr/repmgr.conf" pgctl="/usr/pgsql-9.2/bin/pg_ctl" pgdata="/opt/pgdata" \
op start interval="0" timeout="90s" \
op stop interval="0" timeout="60s" \
op promote interval="0" timeout="120s" \
op monitor interval="53s" role="Master" \
op monitor interval="60s" role="Slave"
ms MS_POSTGRESQL POSTGRESQL \
meta clone-max="2" target-role="Started" resource-stickiness="100" notify="true"
property $id="cib-bootstrap-options" \
dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore" \
default-resource-stickiness="10" \
start-failure-is-fatal="false" \
last-lrm-refresh="1398700283"
My postgresql resource agent is as follows (inspired by https://github.com/xmm/repmgr):
OCF_ROOT=/usr/lib/ocf
OCF_RESOURCE_INSTANCE="pg92"
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
: ${OCF_RESKEY_CRM_meta_clone_node_max=1}
: ${OCF_RESKEY_CRM_meta_master_max=1}
: ${OCF_RESKEY_CRM_meta_master_node_max=1}
: ${OCF_RESKEY_PG_ROOT=/usr/pgsql-9.2/bin}
: ${OCF_RESKEY_repmgr=${OCF_RESKEY_PG_ROOT}/repmgr}
: ${OCF_RESKEY_repmgr_conf=/var/lib/pgsql/repmgr/repmgr.conf}
: ${OCF_RESKEY_repmgr_clone_opt="-d postgres -U repmgr -R postgres"}
: ${OCF_RESKEY_pgctl=${OCF_RESKEY_PG_ROOT}/pg_ctl}
: ${OCF_RESKEY_psql=${OCF_RESKEY_PG_ROOT}/psql}
: ${OCF_RESKEY_pgdata=/opt/pgdata}
: ${OCF_RESKEY_pgconfig=${OCF_RESKEY_pgdata}/postgresql.conf}
: ${OCF_RESKEY_pgdba=postgres}
: ${OCF_RESKEY_pgport=5432}
: ${OCF_RESKEY_start_opt="-p $OCF_RESKEY_pgport"}
: ${OCF_RESKEY_pgdb=pgbench}
: ${OCF_RESKEY_logfile=/var/ha/postgresql/postgres_ha.log}
: ${OCF_RESKEY_stop_escalate=30}
: ${OCF_RESKEY_master_score=100}
: ${OCF_RESKEY_slave_score=50}
: ${OCF_RESKEY_STATUS_ISREPLICATION_SQL="SELECT pg_is_in_recovery();"}
CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"
PIDFILE=${OCF_RESKEY_pgdata}/postmaster.pid
SOCKETDIR=/var/run/postgresql
meta_data() {
cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="pgsql" version="1.0">
<version>1.0</version>
<longdesc lang="en">
Master/Slave OCF Resource Agent for PostgreSQL with Hot, Warm or Streaming Backup
</longdesc>
<shortdesc lang="en">Manages a PostgreSQL Master/Slave instance</shortdesc>
<parameters>
<parameter name="repmgr" unique="0" required="0">
<longdesc lang="en">
Path to repmgr command.
</longdesc>
<shortdesc lang="en">repmgr</shortdesc>
<content type="string" default="/usr/bin/repmgr" />
</parameter>
<parameter name="repmgr_conf" unique="0" required="0">
<longdesc lang="en">
Path to repmgr config file.
</longdesc>
<shortdesc lang="en">repmgr_conf</shortdesc>
<content type="string" default="/var/lib/pgsql/data" />
</parameter>
<parameter name="repmgr_clone_opt" unique="0" required="0">
<longdesc lang="en">
standby clone params for repmgr command.
</longdesc>
<shortdesc lang="en">repmgr_clone_opt</shortdesc>
<content type="string" default="-d postgres -U repmgr -R postgres" />
</parameter>
<parameter name="pgctl" unique="0" required="0">
<longdesc lang="en">
Path to pg_ctl command.
</longdesc>
<shortdesc lang="en">pgctl</shortdesc>
<content type="string" default="/usr/bin/pg_ctl" />
</parameter>
<parameter name="start_opt" unique="0" required="0">
<longdesc lang="en">
Start options (-o start_opt in pg_ctl). "-i -p 5432" for example.
</longdesc>
<shortdesc lang="en">start_opt</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="ctl_opt" unique="0" required="0">
<longdesc lang="en">
Additional pg_ctl options. Default is ""
</longdesc>
<shortdesc lang="en">ctl_opt</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="psql" unique="0" required="0">
<longdesc lang="en">
Path to psql command.
</longdesc>
<shortdesc lang="en">psql</shortdesc>
<content type="string" default="/usr/bin/psql" />
</parameter>
<parameter name="pgdata" unique="0" required="0">
<longdesc lang="en">
Path to PostgreSQL data directory.
</longdesc>
<shortdesc lang="en">pgdata</shortdesc>
<content type="string" default="/var/lib/pgsql/data" />
</parameter>
<parameter name="pgdba" unique="0" required="0">
<longdesc lang="en">
User that owns PostgreSQL.
</longdesc>
<shortdesc lang="en">pgdba</shortdesc>
<content type="string" default="postgres" />
</parameter>
<parameter name="pghost" unique="0" required="0">
<longdesc lang="en">
Hostname/IP address where PostgreSQL is listening
</longdesc>
<shortdesc lang="en">pghost</shortdesc>
<content type="string" default="" />
</parameter>
<parameter name="pgport" unique="0" required="0">
<longdesc lang="en">
Port where PostgreSQL is listening
</longdesc>
<shortdesc lang="en">pgport</shortdesc>
<content type="string" default="5432" />
</parameter>
<parameter name="pgdb" unique="0" required="0">
<longdesc lang="en">
Database that will be used for monitoring.
</longdesc>
<shortdesc lang="en">pgdb</shortdesc>
<content type="string" default="template1" />
</parameter>
<parameter name="logfile" unique="0" required="0">
<longdesc lang="en">
Path to PostgreSQL server log output file.
</longdesc>
<shortdesc lang="en">logfile</shortdesc>
<content type="string" default="/dev/null" />
</parameter>
<parameter name="stop_escalate" unique="0" required="0">
<longdesc lang="en">
Number of retries (using -m fast) before resorting to -m immediate
</longdesc>
<shortdesc lang="en">stop escalation</shortdesc>
<content type="string" default="30" />
</parameter>
<parameter name="master_score" unique="0" required="0">
<longdesc lang="en">
Score for adding to node with master instance
</longdesc>
<shortdesc lang="en">master_score</shortdesc>
<content type="string" default="100" />
</parameter>
<parameter name="slave_score" unique="0" required="0">
<longdesc lang="en">
Score for adding to node with slave instance
</longdesc>
<shortdesc lang="en">slave_score</shortdesc>
<content type="string" default="50" />
</parameter>
</parameters>
<actions>
<action name="start" timeout="90" />
<action name="promote" timeout="90" />
<action name="demote" timeout="90" />
<action name="stop" timeout="60" />
<action name="notify" timeout="20" />
<action name="monitor" depth="0" timeout="20" interval="20" role="Slave"/>
<action name="monitor" depth="0" timeout="20" interval="10" role="Master"/>
<action name="meta-data" timeout="5" />
<action name="validate-all" timeout="20" />
</actions>
</resource-agent>
EOF
exit $OCF_SUCCESS
}
log_params()
{
ocf_log info "ACTION=$__OCF_ACTION"
for param in `env | grep OCF | sort`
do
ocf_log info "$param"
done
}
meta_expect()
{
local what=$1 whatvar=OCF_RESKEY_CRM_meta_${1//-/_} op=$2 expect=$3
local val=${!whatvar}
if [[ -n $val ]]; then
# [, not [[, or it won't work ;)
[ $val $op $expect ] && return
fi
ocf_log err "meta parameter misconfigured, expected $what $op $expect, but found ${val:-unset}."
exit $OCF_ERR_CONFIGURED
}
check_config() {
if [ ! -r "$1" ] ; then
ocf_log err "Setup problem: Couldn't find config file $1"
exit $OCF_ERR_INSTALLED
fi
}
get_node_status () {
ocf_log info "Getting node status: "
output_health=`${OCF_RESKEY_psql} -U $OCF_RESKEY_pgdba -d $OCF_RESKEY_pgdb -Atc "SELECT 1=1;" |grep "t"`
rc=$?
if [ $rc -eq 0 ]; then
ocf_log info "[Healthcheck] Node is running fine"
else
ocf_log info "[Healthcheck] Node is down"
return 6
fi
output_slave=`${OCF_RESKEY_psql} -U $OCF_RESKEY_pgdba -d $OCF_RESKEY_pgdb -Atc "SELECT pg_is_in_recovery();" |grep "t"`
rc=$?
if [ $rc -eq 0 ]; then
ocf_log info "[Node status] SLAVE"
return 0
else
ocf_log info "[Node status] MASTER"
return 100
fi
}
run_as_pg() {
ocf_log info "Run as $OCF_RESKEY_pgdba: $@"
output=`su - $OCF_RESKEY_pgdba -c "$*" 2>&1`
rc=$?
output=`echo $output`
if [ $rc -eq 0 ]; then
if [ ! -z "$output" ]; then
ocf_log info "$output"
fi
return $OCF_SUCCESS
else
if [ ! -z "$output" ]; then
ocf_log err "$output"
else
ocf_log err "command failed: $*"
fi
return $OCF_ERR_GENERIC
fi
}
pg_check_pid() {
if [ -f $PIDFILE ]
then
PID=`head -n 1 $PIDFILE`
kill -0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
return $?
fi
false
}
run_repmgr() {
# pgctl should be in PATH (set it in /etc/login.defs or ~postgres/.profile)
ocf_log info "Run as $OCF_RESKEY_pgdba: ${OCF_RESKEY_repmgr} $@"
su --login $OCF_RESKEY_pgdba -c "${OCF_RESKEY_repmgr} $*" 2>&1 >> /dev/r.out
rc=$?
# 100 = master state
if [ $rc -ne 0 -a $rc -ne 100 ]; then
ocf_log err "command failed: ${OCF_RESKEY_repmgr} $*"
fi
return $rc
}
pg_state_check() {
local status
if ! pg_check_pid
then
return $OCF_NOT_RUNNING
else
ocf_log info "PostgreSQL process exists..."
fi
get_node_status
status=$?
echo "in pg_state_check: $status"
case $status in
0) return $OCF_SUCCESS ;;
100) return $OCF_RUNNING_MASTER ;;
6|7) rc=$OCF_NOT_RUNNING ;; # ERR_DB_CON, ERR_DB_QUERY
1|9) rc=$OCF_ERR_INSTALLED ;; # ERR_BAD_CONFIG, ERR_BAD_PASSWORD
*) # TODO: Is it need to return OCF_ERR_CONFIGURED for full shutdown of the resource?
ocf_log err "${OCF_RESOURCE_INSTANCE}: UNEXPECTED repmgr error ($status)!!!"
rc=$status ;;
esac
return $rc
}
pg_start() {
mkdir -p $SOCKETDIR && \
chown $OCF_RESKEY_pgdba. $SOCKETDIR && \
chmod 2775 $SOCKETDIR
echo starting ${OCF_RESKEY_pgctl} start -w -D ${OCF_RESKEY_pgdata} $OCF_RESKEY_ctl_opt -l ${OCF_RESKEY_logfile} -o "'-c config_file=${OCF_RESKEY_pgconfig}'" -o "'$OCF_RESKEY_start_opt'"
run_as_pg ${OCF_RESKEY_pgctl} start -w -D ${OCF_RESKEY_pgdata} $OCF_RESKEY_ctl_opt -l ${OCF_RESKEY_logfile} -o "'-c config_file=${OCF_RESKEY_pgconfig}'" -o "'$OCF_RESKEY_start_opt'"
echo "pg_start(): asked for start , waiting 10s"
sleep 10
echo "pg_start(): asked for start, waited long enough"
}
pg_stop() {
local status
run_as_pg ${OCF_RESKEY_pgctl} stop -m fast -D ${OCF_RESKEY_pgdata} -l ${OCF_RESKEY_logfile} -o "'-c config_file=${OCF_RESKEY_pgconfig}'"
# stop waiting
count=0
while [ $count -lt $OCF_RESKEY_stop_escalate ]
do
pg_state_check
status=$?
if [ "$status" -eq $OCF_NOT_RUNNING ]; then
#PostgreSQL stopped
break;
fi
count=`expr $count + 1`
sleep 1
done
if pg_check_pid
then
#PostgreSQL is still up. Use another shutdown mode.
ocf_log info "PostgreSQL failed to stop after ${OCF_RESKEY_stop_escalate}s using -m fast. Trying -m immediate..."
run_as_pg ${OCF_RESKEY_pgctl} stop -w -m immediate -D ${OCF_RESKEY_pgdata} -l ${OCF_RESKEY_logfile} -o "'-c config_file=${OCF_RESKEY_pgconfig}'"
while :
do
pg_check_pid || break
sleep 1
ocf_log debug "PostgreSQL still hasn't stopped yet. Waiting..."
done
fi
rm -f $PIDFILE
ocf_log info "PostgreSQL is stopped"
return $OCF_SUCCESS
}
pg_promote() {
local status
run_repmgr -f ${OCF_RESKEY_repmgr_conf} --verbose --force standby promote
status=$?
case $status in
0) return $OCF_SUCCESS ;;
6|7) rc=$OCF_NOT_RUNNING ;; # ERR_DB_CON, ERR_DB_QUERY
1|9) rc=$OCF_ERR_INSTALLED ;; # ERR_BAD_CONFIG, ERR_BAD_PASSWORD
4) rc=$OCF_ERR_GENERIC ;; # ERR_NO_RESTART
*) rc=$OCF_ERR_GENERIC ;;
esac
ocf_log err "${OCF_RESOURCE_INSTANCE}: Promoting failed ($status)"
return $rc
}
pg_follow_master() {
run_repmgr -f ${OCF_RESKEY_repmgr_conf} standby follow
}
pg_clone_master() {
run_repmgr -D ${OCF_RESKEY_pgdata} ${OCF_RESKEY_repmgr_clone_opt} --force standby clone $1 || return $OCF_ERR_INSTALLED
}
pg_demote() {
new_master=$1
run_repmgr --verbose standby clone $new_master
}
rename_data() {
if [ -d ${OCF_RESKEY_pgdata} ]; then
new_name="`dirname ${OCF_RESKEY_pgdata}`/`basename ${OCF_RESKEY_pgdata}`-`date +%Y%m%d-%H:%M:%S`"
if [ -d $new_name ] ; then
new_name="`dirname ${OCF_RESKEY_pgdata}`/`basename ${OCF_RESKEY_pgdata}`-`date +%Y%m%d-%H:%M:%S-%N`"
fi
if mv ${OCF_RESKEY_pgdata} $new_name ; then
ocf_log info "${OCF_RESOURCE_INSTANCE} Data dir ${OCF_RESKEY_pgdata} saved as $new_name"
else
ocf_log err "${OCF_RESOURCE_INSTANCE} Cannot rename data dir ${OCF_RESKEY_pgdata} to $new_name"
return $OCF_ERR_INSTALLED
fi
fi
}
pgsql_start() {
local status
ocf_log info "${OCF_RESOURCE_INSTANCE}: Starting"
pg_state_check
status=$?
echo "START: status = $status"
case "$status" in
$OCF_RUNNING_MASTER)
echo "start() RUNNING MASTER"
ocf_log warn "${OCF_RESOURCE_INSTANCE} already started as Primary."
;;
$OCF_SUCCESS)
echo "start() SUCCESS"
ocf_log warn "${OCF_RESOURCE_INSTANCE} already started as Standby."
;;
$OCF_NOT_RUNNING)
echo "start() NOT RUNNING"
log_params
# $OCF_RESKEY_CRM_meta_notify_master_uname can be ' '
if [ "$OCF_RESKEY_CRM_meta_notify_master_uname" != ' ' -a "$OCF_RESKEY_CRM_meta_notify_master_uname" != '`$HOSTNAME`' ] ; then
echo "XXX"
ocf_log info "${OCF_RESOURCE_INSTANCE} Master instance exist on host ${OCF_RESKEY_CRM_meta_notify_master_uname}"
if [ ! -f ${OCF_RESKEY_pgdata}/recovery.conf ] ; then
echo "no recovery.conf"
ocf_log warn "${OCF_RESOURCE_INSTANCE} recovery.conf file not found. I think this is old Master. Start cloning the current Master..."
rename_data &&
pg_clone_master $OCF_RESKEY_CRM_meta_notify_master_uname ||
return $?
fi
fi
echo "calling pg_start"
pg_start
echo "checing state"
pg_state_check
status=$?
if [ "$status" = $OCF_RUNNING_MASTER ] ; then
echo "started as MASTER"
ocf_log warn "${OCF_RESOURCE_INSTANCE} started as Master"
elif [ "$status" = $OCF_SUCCESS ] ; then
echo "started as STANDBY"
ocf_log info "${OCF_RESOURCE_INSTANCE} started as Standby"
else
echo "unexpected status $status"
ocf_log err "${OCF_RESOURCE_INSTANCE} Unexpected status ($status) of node at start action"
return $status
fi
;;
*)
ocf_log err "${OCF_RESOURCE_INSTANCE} Unexpected status ($status) of node at start action"
#$CRM_MASTER -D
return $status
;;
esac
$CRM_MASTER -v ${OCF_RESKEY_slave_score}
return $OCF_SUCCESS
}
pgsql_promote() {
local status
ocf_log info "${OCF_RESOURCE_INSTANCE}: Promoting"
pg_state_check
status=$?
case "$status" in
$OCF_RUNNING_MASTER)
ocf_log warn "${OCF_RESOURCE_INSTANCE} already started as Primary."
;;
$OCF_SUCCESS)
pg_promote
status=$?
if [ $status = $OCF_RUNNING_MASTER ] ; then
ocf_log warn "${OCF_RESOURCE_INSTANCE} started as Primary."
else
#$CRM_MASTER -D
return $status
fi
;;
$OCF_NOT_RUNNING)
#$CRM_MASTER -D
return $status
;;
*)
ocf_log err "${OCF_RESOURCE_INSTANCE} Unexpected status ($status) of node at promote action."
#$CRM_MASTER -D
return $status
esac
$CRM_MASTER -v ${OCF_RESKEY_master_score}
return $OCF_SUCCESS
}
pgsql_demote() {
# We cannot switch to standby if another master not started yet
local status
ocf_log info "${OCF_RESOURCE_INSTANCE}: Demoting"
$CRM_MASTER -D
pg_state_check
status=$?
case "$status" in
$OCF_RUNNING_MASTER)
log_params
pg_stop
pg_state_check
status=$?
if [ "$status" = $OCF_NOT_RUNNING ] ; then
return $OCF_SUCCESS
elif [ "$status" = $OCF_RUNNING_MASTER -o "$status" = $OCF_SUCCESS ] ; then
ocf_log warn "${OCF_RESOURCE_INSTANCE} Cannot stop resource, still running"
return $OCF_ERR_GENERIC
fi
return $OCF_SUCCESS
;;
$OCF_SUCCESS)
ocf_log warn "${OCF_RESOURCE_INSTANCE} already Standby."
return $OCF_SUCCESS
;;
$OCF_NOT_RUNNING)
ocf_log err "Trying to demote a resource that was not started"
return $OCF_NOT_RUNNING
;;
*)
ocf_log err "${OCF_RESOURCE_INSTANCE} Unexpected status ($status) of node at demote action. Score will be removed"
;;
esac
return $status
}
pgsql_stop() {
local status
ocf_log info "${OCF_RESOURCE_INSTANCE}: Stopping"
$CRM_MASTER -D
pg_state_check
status=$?
case "$status" in
$OCF_RUNNING_MASTER | $OCF_SUCCESS)
pg_stop
pg_state_check
status=$?
if [ "$status" = $OCF_NOT_RUNNING ] ; then
return $OCF_SUCCESS
elif [ "$status" = $OCF_RUNNING_MASTER -o "$status" = $OCF_SUCCESS ] ; then
ocf_log warn "${OCF_RESOURCE_INSTANCE} Cannot stop resource, still running"
return $OCF_ERR_GENERIC
fi
;;
$OCF_NOT_RUNNING)
ocf_log err "${OCF_RESOURCE_INSTANCE} Trying to stop a resource that was not started"
return $OCF_SUCCESS
;;
*)
ocf_log err "${OCF_RESOURCE_INSTANCE} Unexpected status ($status) of node at stop action. Score will be removed"
;;
esac
return $status
}
pgsql_monitor() {
local status
pg_state_check
status=$?
echo "pg_state_check returned $status"
case "$status" in
$OCF_RUNNING_MASTER)
echo ">>>>>> MASTER"
echo $CRM_MASTER
echo $CRM_MASTER -v ${OCF_RESKEY_master_score}
ocf_log info "${OCF_RESOURCE_INSTANCE} PostgreSQL in Master mode"
$CRM_MASTER -v ${OCF_RESKEY_master_score}
;;
$OCF_SUCCESS)
echo ">>>>>> SLAVE"
ocf_log info "${OCF_RESOURCE_INSTANCE} PostgreSQL in Standby mode"
$CRM_MASTER -v ${OCF_RESKEY_slave_score}
;;
$OCF_NOT_RUNNING)
echo ">>>>>> NOT RUNNING"
ocf_log info "${OCF_RESOURCE_INSTANCE} PostgreSQL is not running"
$CRM_MASTER -D
;;
*)
ocf_log err "${OCF_RESOURCE_INSTANCE} Unexpected status ($status)"
$CRM_MASTER -D
;;
esac
return $status
}
pgsql_validate_all() {
meta_expect master-max -le 1
meta_expect clone-node-max = 1
meta_expect master-node-max = 1
echo "UNAME === ${OCF_RESKEY_CRM_meta_notify_start_uname}"
if [ $__OCF_ACTION != "monitor" -a "${OCF_RESKEY_CRM_meta_notify_start_uname- NOT SET }" = " NOT SET " ]; then
ocf_log err "you should enable notify when using this RA"
log_params
echo "ERR CONFIGURED in VALIDATE ALL"
return $OCF_ERR_CONFIGURED
fi
check_binary fuser
check_binary $OCF_RESKEY_pgctl
check_binary $OCF_RESKEY_psql
check_binary $OCF_RESKEY_repmgr
check_config $OCF_RESKEY_pgconfig
check_config $OCF_RESKEY_repmgr_conf
if ! su --login $OCF_RESKEY_pgdba -c "type -p `basename $OCF_RESKEY_pgctl`" > /dev/null ; then
ocf_log err "`basename $OCF_RESKEY_pgctl` should be in PATH for user $OCF_RESKEY_pgdba"
return $OCF_ERR_INSTALLED
fi
echo "SUCCESS"
return $OCF_SUCCESS
}
pgsql_notify() {
local n_type=$OCF_RESKEY_CRM_meta_notify_type
local n_op=$OCF_RESKEY_CRM_meta_notify_operation
# post/promote: slave follow new master
ocf_log info "${OCF_RESOURCE_INSTANCE}: NOTIFY $n_type/$n_op"
#log_params
return $OCF_SUCCESS
}
pgsql_usage() {
cat <<END
usage: $0 {start|stop|promote|demote|monitor|validate-all|meta-data}
Expects to have a fully populated OCF RA-compliant environment set.
END
exit $1
}
#######################################################################
### Main ###
if [ $# -ne 1 ]; then
pgsql_usage $OCF_ERR_ARGS
exit $OCF_ERR_ARGS
fi
case $__OCF_ACTION in
meta-data)
meta_data
exit $OCF_SUCCESS;;
usage|help)
pgsql_usage $OCF_SUCCESS;;
esac
pgsql_validate_all || exit
echo "POSTGRESQL : action = $__OCF_ACTION"
case $__OCF_ACTION in
start) pgsql_start;;
promote) pgsql_promote;;
demote) pgsql_demote;;
stop) pgsql_stop;;
notify) pgsql_notify;;
monitor) pgsql_monitor;;
validate-all) ;;
*) pgsql_usage $OCF_ERR_UNIMPLEMENTED;;
esac
exit
Any clues?
Answer 1
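The resource agent should not contain the following lines:
OCF_ROOT=/usr/lib/ocf
OCF_RESOURCE_INSTANCE="pg92"
They were only meant for debugging the script by hand, launching it manually with the OCF_RESOURCE_INSTANCE variable forced. With OCF_RESOURCE_INSTANCE hardcoded, the agent overrides the instance name that Pacemaker passes in. The logs accordingly show the promotion score being stored as master-pg92 rather than under the name of the actual resource, which is most likely why Pacemaker never promoted the instance and treated the monitor result rc=8 as a failure.
Removing these lines solved the problem.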
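If the debugging defaults are still handy for manual runs, one option is to apply them only when the variables are missing, so that whatever Pacemaker exports always wins. This is a minimal sketch, not taken from the original agent: the function name set_debug_defaults is hypothetical, and the fallback values are simply the ones from the question.

```shell
#!/bin/sh
# Hypothetical helper: apply debugging fallbacks without overriding
# anything the cluster manager has already exported.
set_debug_defaults() {
    # ':=' assigns only when the variable is unset or empty.
    : "${OCF_ROOT:=/usr/lib/ocf}"
    : "${OCF_RESOURCE_INSTANCE:=pg92}"   # fallback for manual runs only
}

set_debug_defaults
echo "OCF_ROOT=$OCF_ROOT"
echo "OCF_RESOURCE_INSTANCE=$OCF_RESOURCE_INSTANCE"
```

When Pacemaker runs the agent, OCF_RESOURCE_INSTANCE is already set (POSTGRESQL:0 in the logs from the question), so the fallback is skipped. For a manual run, the variables can also be supplied in the environment of the command instead of being stored in the script at all.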