I'm on CentOS 7 with an existing two-node HA cluster running Pacemaker (1.1.23-1.el7_9.1) and DRBD (kmod-drbd90-9.0.22-3.el7_9). The volume on top of the DRBD device is LUKS-encrypted. We are adding a third server to the stack, but after updating the configuration, the DRBD device on the new server will not connect.
Status
The current status as reported on the new box is:
[root@svr3]# drbdadm status
drbd0 role:Secondary
  disk:Inconsistent
  svr1 connection:Connecting
  svr2 connection:Connecting
Viewed from the primary, the status shows:
[root@svr1]# drbdadm status
drbd0 role:Primary
  disk:UpToDate
  svr2 role:Secondary
    peer-disk:UpToDate
  svr3 connection:StandAlone
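Since it's easy to lose track of which node sees which peer state, here is a small awk filter that reduces `drbdadm status` output to per-peer connection states. It is shown reading the svr1 capture above from a here-doc so it runs anywhere; on a live node you would pipe `drbdadm status` into it instead:

```shell
# Summarize peer connection states from captured `drbdadm status` output.
# The here-doc is the svr1 capture from above; on a real node, replace it
# with:  drbdadm status | awk ...
awk '/connection:/ { split($2, kv, ":"); print $1, "->", kv[2] }' <<'EOF'
drbd0 role:Primary
  disk:UpToDate
  svr2 role:Secondary
    peer-disk:UpToDate
  svr3 connection:StandAlone
EOF
```

This prints one line per peer that is not fully connected, e.g. `svr3 -> StandAlone`.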
DRBD Configuration
The current configuration for the drbd0 resource is:
resource drbd0 {
    protocol C;
    device /dev/drbd0;
    disk /dev/sdb1;
    meta-disk internal;

    on svr1 {
        address 10.10.11.1:7789;
        node-id 1;
    }
    on svr2 {
        address 10.10.11.2:7789;
        node-id 2;
    }
    on svr3 {
        address 10.10.11.3:7789;
        node-id 3;
    }
    connection-mesh {
        hosts svr1 svr2 svr3;
    }
}
Before adding svr3, the configuration on svr1 and svr2 was:
resource drbd0 {
    protocol C;
    device /dev/drbd0;
    disk /dev/sdb1;
    meta-disk internal;

    on svr1 {
        address 10.10.11.1:7789;
        node-id 1;
    }
    on svr2 {
        address 10.10.11.2:7789;
        node-id 2;
    }
    connection-mesh {
        hosts svr1 svr2;
    }
}
The DRBD disks were created on all machines with the following commands:
drbdadm create-md --force drbd0
drbdadm up drbd0
On the primary only, the following was also run to set up the disk:
dd if=/dev/zero of=/dev/sdb1 bs=128M count=10
drbdadm primary --force drbd0
cryptsetup -q --key-file /path/to/keyfile luksFormat /dev/drbd0
cryptsetup --key-file /path/to/keyfile luksOpen /dev/drbd0 luks-drbd
mkfs.ext4 /dev/mapper/luks-drbd
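A quick aside on the dd step above: with bs=128M and count=10 it zeroes only the first 1280 MiB at the head of /dev/sdb1 (clearing any stale signatures there), not the whole partition. The arithmetic:

```shell
# dd if=/dev/zero of=/dev/sdb1 bs=128M count=10 writes
# 10 blocks x 128 MiB = 1280 MiB of zeros at the start of the partition.
echo "$((10 * 128)) MiB zeroed"
```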
Pacemaker Configuration
The DRBD resources in Pacemaker were configured with the following script. The PCS resources have not changed, since they were originally set up to allow for a future third node.
pcs resource create drbd0_data ocf:linbit:drbd drbd_resource=drbd0
pcs resource master drbd0_clone drbd0_data \
    master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 \
    notify=true
pcs resource create drbd0_luks ocf:vendor:luks \
    --group=drbd_resources
pcs resource create drbd0_fs ocf:heartbeat:Filesystem \
    device=/dev/mapper/luks-drbd directory=/mnt/data fstype=ext4 \
    --group=drbd_resources
pcs constraint order promote drbd0_data then start drbd_resources
pcs constraint colocation add drbd_resources \
    with drbd0_clone INFINITY with-rsc-role=Master
pcs constraint order drbd0_luks then drbd0_fs
(The drbd0_luks resource is a custom resource agent we provide; it basically runs cryptsetup luksOpen|luksClose on the LUKS partition as appropriate.)
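For context, a minimal sketch of what such an agent boils down to. This is a hypothetical skeleton, not our actual agent: the variable names and defaults are assumptions, and a real OCF agent also implements meta-data/validate-all actions and OCF exit codes:

```shell
#!/bin/sh
# Hypothetical skeleton of a luksOpen/luksClose resource agent (assumption,
# not the real drbd0_luks implementation).
DEVICE=${DEVICE:-/dev/drbd0}         # block device to unlock (assumed default)
NAME=${NAME:-luks-drbd}              # dm-crypt mapping name (assumed default)
KEYFILE=${KEYFILE:-/path/to/keyfile} # same keyfile path as in the setup above

luks_start()   { cryptsetup --key-file "$KEYFILE" luksOpen "$DEVICE" "$NAME"; }
luks_stop()    { cryptsetup luksClose "$NAME"; }
luks_monitor() { [ -b "/dev/mapper/$NAME" ]; }  # is the mapping present?

case "${1:-}" in
  start)   luks_start ;;
  stop)    luks_stop ;;
  monitor) luks_monitor ;;
esac
```

The relevant point for the symptoms below is that as long as the mapping is open, cryptsetup holds /dev/drbd0 open.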
Pacemaker status shows the following:
Online: [ svr1 svr2 svr3 ]

Active resources:

 Master/Slave Set: drbd0_clone [drbd0_data]
     Masters: [ svr1 ]
     Slaves: [ svr2 svr3 ]
 Resource Group: drbd_resources
     drbd0_luks  (ocf::vendor:luks):     Started svr1
     drbd0_fs    (ocf::heartbeat:Filesystem):    Started svr1
Connection Attempts
I have tried various iterations of the following procedure:
[root@svr1]# drbdadm disconnect drbd0
[root@svr2]# drbdadm disconnect drbd0
[root@svr3]# drbdadm disconnect drbd0
[root@svr3]# drbdadm connect --discard-my-data drbd0
[root@svr1]# drbdadm connect drbd0
drbd0: Failure: (162) Invalid configuration request
Command 'drbdsetup connect drbd0 3' terminated with exit code 10
[root@svr2]# drbdadm connect drbd0
drbd0: Failure: (162) Invalid configuration request
Command 'drbdsetup connect drbd0 3' terminated with exit code 10
After this, the output of drbdadm status is as shown at the top of the post. If I try to run drbdadm adjust drbd0 on svr1 or svr2, I get the same error.
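The userland "(162) Invalid configuration request" carries no detail; in my experience the kernel module logs the actual complaint. A small helper for pulling those lines (a sketch using plain dmesg filtering; nothing DRBD-specific is assumed):

```shell
# Print the most recent kernel-side DRBD messages; the kernel log usually
# names the exact reason behind a rejected connect/attach.
drbd_kernel_log() {
  dmesg | grep -i 'drbd' | tail -n "${1:-20}"
}
drbd_kernel_log 40
```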
If I try to run drbdadm down drbd0 while the drbd0_luks resource is enabled, I get the following:
[root@svr1]# drbdadm down drbd0
drbd0: State change failed: (-12) Device is held open by someone
additional info from kernel:
/dev/drbd0 opened by cryptsetup (pid 11777) at 2021-11-01 16:50:51
Command 'drbdsetup down drbd0' terminated with exit code 11
If I disable the drbd0_luks resource, I can run drbdadm down drbd0, but the adjust command then fails with:
[root@svr1]# drbdadm adjust drbd0
0: Failure: (162) Invalid configuration request
Command 'drbdsetup attach 0 /dev/sdb1 /dev/sdb1 internal' terminated with exit code 10
So I assume I need at least that much working before I can get anywhere. At this point I'm just grasping at straws, but I'm not sure which straw to grab next.