Cannot write files to highly available NFS storage built with DRBD and Pacemaker (permission denied error)

I'm trying to set up highly available NFS storage with DRBD and Pacemaker on two Fedora 38 VMs (first time doing this).

My main references for this work are these two documents: document 1, document 2.

I've managed to get the Pacemaker cluster running and to mount the NFS share on my host, but when I try to write anything into that folder I get a permission denied error.
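
For reference, the client-side mount and write attempt look roughly like this (a sketch; the virtual IP and export path come from the configuration below, and the client mount point /mnt/share is just an example):

# on the NFS client (example mount point /mnt/share)
sudo mount -t nfs 192.168.1.101:/nfsshare/exports/HA /mnt/share
touch /mnt/share/testfile   # fails with "Permission denied"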

Changing the mount point permissions to 666 or 777 did not help.

Any idea what the problem might be?

My DRBD configuration is as follows:

#> sudo vi /etc/drbd.d/global_common.conf 
global {
 usage-count  yes;
}
common {
 disk {
    no-disk-flushes;
    no-disk-barrier;
    c-fill-target 24M;
    c-max-rate   720M;
    c-plan-ahead    15;
    c-min-rate     4M;
  }
  net {
    protocol C;
    max-buffers            36k;
    sndbuf-size            1024k;
    rcvbuf-size            2048k;
  }
}

#> sudo vi /etc/drbd.d/ha_nfs.res

resource ha_nfs {
  device "/dev/drbd1003";
  disk "/dev/nfs/share";
  meta-disk internal;
  on server1.test {
    address 192.168.1.116:7789;
  }
  on server2.test {
    address 192.168.1.167:7789;
  }
}
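
For context, the DRBD resource was initialized following those guides, roughly along these lines (a sketch; the backing LVM volume matches the disk line above, and the exact commands may differ slightly from the guides):

# on both nodes: create the metadata and bring the resource up
sudo drbdadm create-md ha_nfs
sudo drbdadm up ha_nfs
# on one node only: force it Primary for the initial sync and create the filesystem
sudo drbdadm primary --force ha_nfs
sudo mkfs.ext4 /dev/drbd1003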

The Pacemaker configuration is as follows:

crm> configure edit
node 1: server1.test
node 2: server2.test
primitive p_drbd_attr ocf:linbit:drbd-attr
primitive p_drbd_ha_nfs ocf:linbit:drbd \
        params drbd_resource=ha_nfs \
        op monitor timeout=20s interval=21s role=Slave start-delay=12s \
        op monitor timeout=20s interval=20s role=Master start-delay=8s
primitive p_expfs_nfsshare_exports_HA exportfs \
        params clientspec="192.168.1.0/24" directory="/nfsshare/exports/HA" fsid=1003 unlock_on_stop=1 options="rw,mountpoint" \
        op monitor interval=15s timeout=40s start-delay=15s \
        op_params OCF_CHECK_LEVEL=0 \
        op start interval=0s timeout=40s \
        op stop interval=0s timeout=120s
primitive p_fs_nfsshare_exports_HA Filesystem \
        params device="/dev/drbd1003" directory="/nfsshare/exports/HA" fstype=ext4 run_fsck=no \
        op monitor interval=15s timeout=40s start-delay=15s \
        op_params OCF_CHECK_LEVEL=0 \
        op start interval=0s timeout=60s \
        op stop interval=0s timeout=60s
primitive p_nfsserver nfsserver
primitive p_pb_block portblock \
        params action=block ip=192.168.1.101 portno=2049 protocol=tcp
primitive p_pb_unblock portblock \
        params action=unblock ip=192.168.1.101 portno=2049 tickle_dir="/srv/drbd-nfs/nfstest/.tickle" reset_local_on_unblock_stop=1 protocol=tcp \
        op monitor interval=10s timeout=20s start-delay=15s
primitive p_virtip IPaddr2 \
        params ip=192.168.1.101 cidr_netmask=32 \
        op monitor interval=1s timeout=40s start-delay=0s \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s
ms ms_drbd_ha_nfs p_drbd_ha_nfs \
        meta master-max=1 master-node-max=1 clone-node-max=1 clone-max=2 notify=true
clone c_drbd_attr p_drbd_attr
colocation co_ha_nfs inf: p_pb_block p_virtip ms_drbd_ha_nfs:Master p_fs_nfsshare_exports_HA p_expfs_nfsshare_exports_HA p_nfsserver p_pb_unblock
property cib-bootstrap-options: \
        have-watchdog=false \
        cluster-infrastructure=corosync \
        cluster-name=nfsCluster \
        stonith-enabled=false \
        no-quorum-policy=ignore

PCS status output:

[bebe@server2 share]$ sudo pcs status
[sudo] password for bebe:
Cluster name: nfsCluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: server1.test (version 2.1.6-4.fc38-6fdc9deea29) - partition with quorum
  * Last updated: Thu Jul 13 08:50:34 2023 on server2.test
  * Last change:  Thu Jul 13 08:27:46 2023 by hacluster via crmd on server1.test
  * 2 nodes configured
  * 10 resource instances configured

Node List:
  * Online: [ server1.test server2.test ]

Full List of Resources:
  * p_virtip    (ocf::heartbeat:IPaddr2):        Started server2.test
  * p_expfs_nfsshare_exports_HA (ocf::heartbeat:exportfs):       Started server2.test
  * p_fs_nfsshare_exports_HA    (ocf::heartbeat:Filesystem):     Started server2.test
  * p_nfsserver (ocf::heartbeat:nfsserver):      Started server2.test
  * p_pb_block  (ocf::heartbeat:portblock):      Started server2.test
  * p_pb_unblock        (ocf::heartbeat:portblock):      Started server2.test
  * Clone Set: ms_drbd_ha_nfs [p_drbd_ha_nfs] (promotable):
    * Masters: [ server2.test ]
    * Slaves: [ server1.test ]
  * Clone Set: c_drbd_attr [p_drbd_attr]:
    * Started: [ server1.test server2.test ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

DRBD status output:

[bebe@server2 share]$ sudo drbdadm status ha_nfs
ha_nfs role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:Established peer-disk:UpToDate

Answer 1

It sounds like misconfigured permissions. To troubleshoot, tear down your setup and try recreating the failover NFS mount point from scratch.

P.S. Overall this is a fragile setup: active-passive DRBD replication is prone to failover mount/unmount problems and similar misconfiguration issues. Active-active block-level replication combined with a cluster-aware filesystem would be a better choice.

Answer 2

My guess is that the permissions are still incorrect, or that you set the permissions on the mount point before the server mounted the filesystem.

With the filesystem mounted, I would try a recursive chown and chmod on the mount point on the DRBD Primary. I also typically chown the root of the NFS export to nobody:nobody, which can help if you're trying to write to the share as root from a client system (since root_squash is the default NFS export option). You could also try setting no_root_squash in the options parameter of the exportfs resource to see whether that's the problem you're facing, but for security reasons you generally don't want to leave it enabled.
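
A minimal sketch of what that looks like on the node currently running the Filesystem resource (server2 in your pcs output), using the export directory from your configuration:

# run on the DRBD Primary while the filesystem is mounted
sudo chown -R nobody:nobody /nfsshare/exports/HA
sudo chmod -R 777 /nfsshare/exports/HA   # loosen for testing, tighten again later
sudo exportfs -v                         # confirm the effective export options (root_squash is the default)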

Also, I usually set options=rw on the exportfs resource, but that may be the default.
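
If you want to test the no_root_squash idea, the exportfs resource can be updated in place, for example with pcs (a sketch; remove no_root_squash again once you've finished testing):

# extend the current options="rw,mountpoint" to also disable root squashing
sudo pcs resource update p_expfs_nfsshare_exports_HA options="rw,mountpoint,no_root_squash"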
