Clustered NFS server replies ERR 24: Auth Bogus Credentials (seal broken)

I have four servers running in VirtualBox. Two of them are CentOS 7 machines that form a Pacemaker (corosync) cluster running an NFSv4 server in active/passive mode. The other two are CentOS 6 clients that use this NFS server.

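The problem does not always occur, but sometimes, when I fail over from the active NFS server node (manually or automatically), both clients fail with a permission denied error. tcpdump output from a client shows: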

[17:24:29.271467] IP client.example.net.3423675563 > server.example.net.nfs: 112 getattr [|nfs]
[17:24:29.271619] IP server.example.net.nfs > client.example.net.3423675563: reply ERR 24: Auth Bogus Credentials (seal broken)
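
To see how often these rejected replies occur across several failover runs, saved tcpdump text can be filtered for them. This is only a minimal sketch: the helper name count_bogus_cred and the file name capture.txt are made up for illustration, and it assumes the capture was saved as text in the same format as the two lines above.

```shell
#!/bin/sh
# Count NFS replies rejected with "Auth Bogus Credentials" in saved tcpdump text.
# Reads tcpdump output lines on stdin and prints the number of matching lines.
# count_bogus_cred is a hypothetical helper name, not a standard tool.
count_bogus_cred() {
    # grep -c prints the count; it exits non-zero when the count is 0,
    # so "|| true" keeps the function's exit status at success.
    grep -c 'reply ERR [0-9]*: Auth Bogus Credentials' || true
}
```

Usage would look like `count_bogus_cred < capture.txt`; a count of 0 after a failover suggests that run did not hit the problem.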

Nothing I have tried fixes this: I switched to NFSv3, tried different cluster configurations, and tried NFSv4 grace periods from 10 to 90 seconds, all without success.

Cluster configuration:

node 1: storage1
node 2: storage2
primitive p_drbd_nfs ocf:linbit:drbd \
    params drbd_resource=cgp \
    op monitor interval=31s role=Master \
    op monitor interval=29s role=Slave \
    op start interval=0 timeout=240s \
    op stop interval=0 timeout=120s
primitive p_fs_home Filesystem \
    params device="/dev/drbd0" directory="/mnt" fstype=xfs options="noatime,nobarrier" \
    op monitor interval=10s \
    meta is-managed=true
primitive p_ip_nfs IPaddr2 \
    params ip=192.168.56.100 cidr_netmask=24 \
    op monitor interval=30s \
    meta is-managed=true
primitive p_nfs_exports exportfs \
    params fsid=0 directory="/mnt" options="rw,async,no_wdelay,mountpoint,insecure,no_subtree_check,no_root_squash" clientspec="192.168.56.0/255.255.255.0" wait_for_leasetime_on_stop=true rmtab_backup=none \
    op monitor interval=10s \
    op stop interval=0 timeout=120s \
    meta is-managed=true
primitive p_nfsserver nfsserver \
    params grace_time=90 proc_num=16 \
    op monitor interval=30s \
    meta is-managed=true
primitive p_ping ocf:pacemaker:ping \
    params host_list=192.168.56.1 multiplier=1000 attempts=1 timeout=3 name=p_ping \
    op monitor interval=5 timeout=60
ms ms_drbd_nfs p_drbd_nfs \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true is-managed=true
clone cl_p_ping p_ping \
    meta is-managed=true target-role=Started
location l_0 ms_drbd_nfs \
    rule $role=Master -inf: not_defined p_ping or p_ping lte 0
colocation c_1 inf: p_fs_home ms_drbd_nfs:Master
colocation c_2 inf: p_nfsserver p_fs_home
colocation c_3 inf: p_nfs_exports p_nfsserver
colocation c_4 inf: p_ip_nfs p_nfs_exports
order o_1 inf: ms_drbd_nfs:promote p_fs_home:start
order o_2 inf: p_fs_home p_nfsserver
order o_3 inf: p_nfsserver p_nfs_exports
order o_4 inf: p_nfs_exports p_ip_nfs
property cib-bootstrap-options: \
    dc-version=1.1.10-32.el7_0.1-368c726 \
    cluster-infrastructure=corosync \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    last-lrm-refresh=1428329105
rsc_defaults rsc-options: \
    resource-stickiness=200

Here is the line from the clients' fstab:

192.168.56.100:/        /mnt                    nfs     nfsvers=4,proto=tcp,rsize=32768,wsize=32768,hard,timeo=300,retrans=2,bg,actimeo=3,noatime,nodiratime        0 0

Answer 1

After two weeks of unsuccessful attempts, I solved the problem. It turned out to be a bug in rpc.mountd; just try installing the latest version of nfs-utils:

yum update nfs-utils
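
A slightly more careful variant of the same step, as a sketch only: it prints the installed nfs-utils version before and after the update so the change can be confirmed. The function name update_nfs_utils is hypothetical; `rpm -q` and `yum update` are the standard CentOS commands. Since rpc.mountd runs on the NFS server side, the two storage nodes are presumably the machines to update, although that is an assumption and not something stated above. Whether the NFS resource then has to be restarted or failed over to pick up the new binary should be checked on your own cluster.

```shell
#!/bin/sh
# Sketch: update nfs-utils and print the package version before and after.
# update_nfs_utils is a hypothetical helper name; run it as root.
update_nfs_utils() {
    echo "before: $(rpm -q nfs-utils)"
    # -y answers yes to the confirmation prompt; stop if the update fails.
    yum -y update nfs-utils || return 1
    echo "after: $(rpm -q nfs-utils)"
}
```

Call `update_nfs_utils` on each node in turn; if the "before" and "after" lines are identical, no newer package was available from the configured repositories.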
