I am trying to bring up a Postgres server on Ubuntu 14.04 using DRBD and a Filesystem resource.
The service status is as follows:
Last updated: Mon Mar 14 01:16:45 2016
Last change: Mon Mar 14 01:05:53 2016 via cibadmin on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
5 Resources configured
Online: [ node1 node2 ]
Master/Slave Set: ms_drbd [drbd_postgres]
Masters: [ node1 ]
Stopped: [ node2 ]
Resource Group: database
fs_postgres (ocf::heartbeat:Filesystem): Started node1
ip_postgres (ocf::heartbeat:IPaddr2): Started node1
postgresql (ocf::heartbeat:pgsql): Stopped
Failed actions:
drbd_postgres_start_0 (node=node2, call=367, rc=1, status=complete, last-rc-change=Mon Mar 14 00:55:56 2016
, queued=3798ms, exec=0ms
): unknown error
My cluster configuration is as follows:
node $id="1" node1
node $id="2" node2
primitive drbd_postgres ocf:linbit:drbd \
params drbd_resource="db_disk" \
op monitor interval="29s" role="Master" \
op monitor interval="31s" role="Slave"
primitive fs_postgres ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/var/lib/postgresql/9.5/main" fstype="ext4"
primitive ip_postgres ocf:heartbeat:IPaddr2 \
params ip="192.168.1.103" cidr_netmask="24" \
op monitor interval="30s"
primitive postgresql ocf:heartbeat:pgsql \
params config="/etc/postgresql/9.5/main/postgresql.conf" \
params pgctl="/usr/lib/postgresql/9.5/bin/pg_ctl" \
params pgdata="/var/lib/postgresql/9.5/main" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s" \
meta target-role="Started"
group database fs_postgres ip_postgres postgresql
ms ms_drbd drbd_postgres \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation fs_on_drbd inf: fs_postgres ms_drbd:Master
colocation postgresql_on_drbd inf: database ms_drbd:Master
order postgres_after_fs inf: fs_postgres:promote postgresql:start
order postgresql_after_drbd inf: ms_drbd:promote database:start
property $id="cib-bootstrap-options" \
dc-version="1.1.10-42f2063" \
cluster-infrastructure="corosync" \
stonith-enabled="false" \
no-quorum-policy="ignore"
Corosync configuration:
totem {
version: 2
cluster_name: postgresql
transport: udpu
interface {
ringnumber: 0
bindnetaddr: 192.168.1.0
broadcast: yes
mcastport: 5405
}
}
quorum {
provider: corosync_votequorum
expected_votes: 2
two_node: 1
}
nodelist {
node {
ring0_addr: 192.168.1.101
name: node1
nodeid: 1
}
node {
ring0_addr: 192.168.1.102
name: node2
nodeid: 2
}
}
logging {
to_logfile: yes
logfile: /var/log/corosync/corosync.log
to_syslog: yes
timestamp: on
}
DRBD configuration:
resource db_disk {
device /dev/drbd0;
meta-disk internal;
syncer {
rate 40M;
}
on node1 {
address 172.16.1.101:7789;
disk /dev/sdb1;
}
on node2 {
address 172.16.1.102:7789;
disk /dev/sdb1;
}
}
I don't see any error messages in the logs, only the following warning:
root@node1:/var/log# egrep 'ERR|WARN' syslog
Mar 14 01:06:48 node1 Filesystem(fs_postgres)[13266]: WARNING: Couldn't find device [/dev/drbd0]. Expected /dev/??? to exist
root@node1:/var/log#
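Answer 1
It looks like you need an order constraint so that mounting the filesystem waits until the DRBD resource has been successfully promoted to primary. Without that constraint, Pacemaker may try to mount the filesystem while DRBD is still in the secondary role, which DRBD does not allow.
Try this: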
order fs_after_drbd inf: ms_drbd:promote fs_postgres:start
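
As a rough sketch, this is how the suggested constraint could sit next to the colocation from the question (crm configure syntax, same resource names as above). The comments are my reading of the configuration and have not been verified against this cluster. In particular, the last remark assumes fs_postgres stays a plain primitive, which has no promote action:

```
# Existing: keep the whole group on the node where DRBD is Master
colocation postgresql_on_drbd inf: database ms_drbd:Master
# Suggested: mount the filesystem only after DRBD has been promoted
order fs_after_drbd inf: ms_drbd:promote fs_postgres:start
# Assumption: the group (fs_postgres, ip_postgres, postgresql) already starts
# its members in the listed order, so a separate order on
# fs_postgres:promote should not be needed
```

After loading the change, `crm_mon -1` should show whether fs_postgres and postgresql start on the node where ms_drbd is Master. The failed start of drbd_postgres on node2 is probably a separate issue; checking `cat /proc/drbd` on that node may help narrow it down.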