Highly available NFS service built from two shared-nothing storage nodes?

Answer 1

Assuming you will use Pacemaker to fail over the DRBD primary role and the filesystem mount, all you need to add is a few more cluster resources:

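Another (tiny) DRBD resource holding /var/lib/nfs
A floating IP address
Colocation constraints to make sure both DRBD primary roles, both mounts, the IP address and the NFS service all run on the same node
Ordering (serialization) constraints to make sure the NFS service is started only after all the other resources are up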
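The start-order requirement can be checked mechanically. The Python sketch below (Python 3.9+ for graphlib) is only a model, not how Pacemaker actually schedules actions: it encodes the "order" constraints from the configuration snippets as a dependency graph and derives one valid start sequence, which must end with the NFS service. The "drbd0_ms:promote" style node names are a labelling choice for this sketch.

```python
from graphlib import TopologicalSorter

# Start-order dependencies mirroring the "order" constraints in the config:
# each key may only start after every action in its set has completed.
# "drbdX_ms:promote" stands for the DRBD promote action (sketch-only naming).
START_DEPENDENCIES = {
    "nfs_fs_res": {"drbd0_ms:promote"},
    "varlibnfs_fs_res": {"drbd1_ms:promote"},
    "ip_nfs_res": {"nfs_fs_res", "varlibnfs_fs_res"},
    "nfs_res": {"ip_nfs_res"},
}


def start_sequence(deps):
    """Return one start order that satisfies every dependency.

    static_order() raises graphlib.CycleError if the constraints form a loop,
    which would mean the cluster could never start the whole stack.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, action in enumerate(start_sequence(START_DEPENDENCIES), 1):
        print(step, action)
```

Whatever order the two DRBD branches come out in, nfs_res always ends up last, which is exactly the property the serialization constraints are meant to guarantee.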

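Now have all clients mount their NFS shares via the floating IP address. If the primary server fails, NFS operations stall until the failover has completed (which should be fairly quick) and then carry on. I use setups like this to serve a broadcast-industry deployment center, with 2x12x8TB disks per server pair, and it just works. I regularly do rolling upgrades on each pair, and the clients never see an error.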
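To put a number on "fairly quick", one option during failover testing is to poll the NFS port on the floating IP until it answers again. The helper below is hypothetical (wait_for_tcp is not part of any cluster tool); it assumes the NFS server listens on TCP port 2049, and it only measures when that port becomes reachable, not when the NFS server has finished any post-failover recovery.

```python
import socket
import sys
import time


def wait_for_tcp(host, port, timeout=300.0, interval=0.5):
    """Poll host:port until a TCP connection succeeds.

    Returns the seconds elapsed until the first successful connect,
    or None if nothing answered within `timeout` seconds.
    """
    start = time.monotonic()
    deadline = start + timeout
    while time.monotonic() < deadline:
        try:
            # A short per-probe timeout keeps one dead probe from blocking long.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:  # refused, unreachable or timed out: try again
            time.sleep(interval)
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 wait_nfs.py 192.168.10.103  (floating IP from the config)
    elapsed = wait_for_tcp(sys.argv[1], 2049)
    if elapsed is None:
        print("NFS port did not come back within the timeout")
    else:
        print(f"NFS port reachable again after {elapsed:.1f} s")
```

Start it right after cutting power to the active node; the result is a lower bound on what clients experience as a stall.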

Some random tips:

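Make sure not to enable asynchronous writes on the server (the "async" export option), or you will lose data on failover
Make sure all timeout-related mount options on the clients allow for the failover time
Test before putting the system under load, including hard power-off failover tests
Definitely use a multi-ring Corosync setup; you really do not want a split brain
Definitely use a dedicated bonded interface for DRBD; again, you want neither a split brain nor random disconnects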
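On the mount-timeout tip: with the Linux default "hard" mount option the client keeps retrying, so a failover only shows up as a stall. With "soft" mounts a request eventually fails with an error, and that limit has to be comfortably longer than the failover. The sketch below estimates that limit from timeo and retrans under a simplified linear-backoff model paraphrased from nfs(5); the model and the margin factor are assumptions, so check them against your kernel's man page and a real failover test.

```python
MAX_TIMEOUT_S = 600.0  # nfs(5): the TCP retry timeout grows up to 600 seconds


def soft_mount_giveup_seconds(timeo_ds=600, retrans=2):
    """Rough time a soft NFS-over-TCP mount waits before failing a request.

    Assumed model (simplified from nfs(5)): the original request plus
    `retrans` retries, where the k-th attempt waits k * timeo, capped at 600 s.
    `timeo_ds` is in deciseconds, like the timeo= mount option; the defaults
    are the documented TCP defaults.
    """
    timeo_s = timeo_ds / 10.0
    return sum(min(k * timeo_s, MAX_TIMEOUT_S) for k in range(1, retrans + 2))


def survives_failover(failover_s, timeo_ds=600, retrans=2, margin=2.0):
    """True if the estimated give-up time covers the failover with a margin."""
    return soft_mount_giveup_seconds(timeo_ds, retrans) >= margin * failover_s


if __name__ == "__main__":
    # timeo=100 (10 s) and retrans=3: 10 + 20 + 30 + 40 seconds
    print(soft_mount_giveup_seconds(timeo_ds=100, retrans=3))  # prints 100.0
```

Feeding in the failover time measured during a hard power-off test makes it easy to see whether a given soft-mount setting leaves enough headroom.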

Some snippets from "crm configure show":

primitive drbd0_res ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=103s timeout=120s role=Master \
        op monitor interval=105s timeout=120s role=Slave \
        op start timeout=240s interval=0 \
        op stop timeout=180s interval=0 \
        op notify timeout=120 interval=0
primitive drbd1_res ocf:linbit:drbd \
        params drbd_resource=r1 \
        op monitor interval=103s timeout=120s role=Master \
        op monitor interval=105s timeout=120s role=Slave \
        op start timeout=240s interval=0 \
        op stop timeout=180s interval=0 \
        op notify timeout=120 interval=0
primitive ip_nfs_res IPaddr2 \
        params ip=192.168.10.103 cidr_netmask=24 nic=eno1 \
        meta target-role=Started \
        op start timeout=180s interval=0 \
        op stop timeout=180s interval=0
primitive nfs_res service:nfs-kernel-server \
        meta target-role=Started \
        op start timeout=180s interval=0 \
        op stop timeout=180s interval=0
primitive nfs_fs_res Filesystem \
        params device="/dev/drbd0" directory="/srv/nfs" fstype=ext4 \
        meta target-role=Started \
        op start timeout=180s interval=0 \
        op stop timeout=180s interval=0
primitive varlibnfs_fs_res Filesystem \
        params device="/dev/drbd1" directory="/var/lib/nfs" fstype=ext4 \
        meta target-role=Started \
        op start timeout=180s interval=0 \
        op stop timeout=180s interval=0
ms drbd0_ms drbd0_res \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
ms drbd1_ms drbd1_res \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Started
colocation drbd_colo inf: drbd0_ms:Master drbd1_ms:Master
colocation drbd_nfs_fs_colo inf: nfs_fs_res drbd0_ms:Master
colocation drbd_varlibnfs_fs_colo inf: varlibnfs_fs_res drbd1_ms:Master
colocation nfs_ip_fs_colo inf: ip_nfs_res nfs_fs_res
colocation nfs_ip_service_colo inf: ip_nfs_res nfs_res
order drbd_nfs_fs_order Mandatory: drbd0_ms:promote nfs_fs_res:start
order drbd_nfs_fs_serialize Serialize: drbd0_ms:promote nfs_fs_res:start
order drbd_varlibnfs_fs_order Mandatory: drbd1_ms:promote varlibnfs_fs_res:start
order drbd_varlibnfs_fs_serialize Serialize: drbd1_ms:promote varlibnfs_fs_res:start
order nfs_fs_ip_order Mandatory: nfs_fs_res:start ip_nfs_res:start
order nfs_fs_ip_serialize Serialize: nfs_fs_res:start ip_nfs_res:start
order varlibnfs_fs_ip_order Mandatory: varlibnfs_fs_res:start ip_nfs_res:start
order varlibnfs_fs_ip_serialize Serialize: varlibnfs_fs_res:start ip_nfs_res:start
order ip_service_order Mandatory: ip_nfs_res:start nfs_res:start
order ip_service_serialize Serialize: ip_nfs_res:start nfs_res:start
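Pacemaker needs every object ID in the CIB to be unique, and when copying or hand-editing snippets like these it is easy to reuse a constraint ID by accident (for example, giving the Mandatory and the Serialize constraint for the same pair the same name). A quick pre-check, sketched below, scans saved "crm configure show" output for IDs defined twice. The script name check_ids.py and the list of object keywords it recognizes are assumptions of this sketch.

```python
import re
import sys
from collections import Counter

# In "crm configure show" output each object starts at column 0 with its type
# keyword followed by its ID; continuation lines are indented and ignored.
OBJECT_LINE = re.compile(
    r"^(primitive|ms|clone|group|colocation|order|location)\s+(\S+)"
)


def duplicate_ids(config_text):
    """Return the sorted list of object IDs that are defined more than once."""
    ids = []
    for line in config_text.splitlines():
        match = OBJECT_LINE.match(line)
        if match:
            ids.append(match.group(2))
    return sorted(obj_id for obj_id, count in Counter(ids).items() if count > 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: crm configure show > cib.txt && python3 check_ids.py cib.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        dupes = duplicate_ids(handle.read())
    print("duplicate IDs:", ", ".join(dupes) if dupes else "none")
```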
