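I'm seeing a very strange problem on many of the machines I help manage. It usually shows up a day or more after the last reboot.
When I try to mount an NFS share: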
sudo mount -t nfs 192.168.8.205:/export /mnt/andrew
I get:
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
Once a machine is in this state, nothing else seems to help. I tried:
sudo systemctl restart rpc-statd
sudo systemctl restart rpcbind
Both services appear to be running, and I cannot see any obvious errors:
sudo journalctl -u rpcbind
-- Logs begin at Tue 2019-02-19 18:29:21 UTC, end at Tue 2019-03-26 04:52:48 UTC. --
Feb 25 23:21:11 box1 systemd[1]: Starting RPC bind portmap service...
Feb 25 23:21:11 box1 rpcbind[29172]: rpcbind: xdr_/run/rpcbind/rpcbind.xdr: failed
Feb 25 23:21:11 box1 rpcbind[29172]: rpcbind: xdr_/run/rpcbind/portmap.xdr: failed
Feb 25 23:21:11 box1 systemd[1]: Started RPC bind portmap service.
Mar 24 18:59:57 box1 systemd[1]: Stopping RPC bind portmap service...
Mar 24 18:59:57 box1 systemd[1]: Stopped RPC bind portmap service.
Mar 24 18:59:57 box1 systemd[1]: Starting RPC bind portmap service...
Mar 24 18:59:57 box1 systemd[1]: Started RPC bind portmap service.
sudo journalctl -u rpc-statd
-- Logs begin at Tue 2019-02-19 18:29:21 UTC, end at Tue 2019-03-26 04:56:13 UTC. --
Feb 25 23:21:11 box1 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
Feb 25 23:21:11 box1 systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Feb 25 23:21:11 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:48:49 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:48:50 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:48:51 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:48:53 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:48:57 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:49:05 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:49:22 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:49:54 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:50:58 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:53:00 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:55:02 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:57:04 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:59:07 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:01:09 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:03:11 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:05:13 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:06:32 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:07:07 box1 systemd[1]: Stopping NFS status monitor for NFSv2/3 locking....
Mar 24 19:07:07 box1 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
Mar 24 19:07:07 box1 systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Mar 24 19:07:07 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:07:12 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
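Note that the rpcbind.xdr errors also appear on machines where mounting still works.
However, there is one difference between the boxes that still work and the ones that have stopped working: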
ps aux | grep lockd
root 26 0.0 0.0 0 0 ? S< 04:08 0:00 [kblockd]
root 3148 0.0 0.0 0 0 ? S 04:08 0:00 [lockd]
asaviny+ 31517 0.0 0.0 16576 2096 pts/0 S+ 04:58 0:00 grep lockd
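The difference is the [lockd] line. It is always present on boxes that still work and always absent on boxes that have stopped working.
Does anyone know what this [lockd] is, how I can restart it once it has stopped, or how I can make sure it never stops?
I tried systemctl | grep lockd, but it returned nothing.
I also tried journalctl -xe | grep lockd, cat /var/log/kern.log | grep lockd, and cat /var/log/syslog | grep lockd; none of them returned anything.
A reboot usually clears the condition.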
Answer 1
Read man nfsd, which says in part:
In the /proc filesystem there are 4 files that can be used to enabled extra tracing of
nfsd and related code. They are:
/proc/sys/sunrpc/nfs_debug
/proc/sys/sunrpc/nfsd_debug
/proc/sys/sunrpc/nlm_debug
/proc/sys/sunrpc/rpc_debug
They control tracing for the NFS client, the NFS server, the Network Lock Manager (lockd)
and the underlying RPC layer respectively. Decimal numbers can be read from or written to
these files. Each number represents a bit-pattern where bits that are set cause certain
classes of tracing to be enabled. Consult the kernel header files to find out what number
correspond to what tracing.
See also https://docstore.mik.ua/orelly/networking_2ndEd/nfs/ch11_02.htm
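As a rough sketch of how the nlm_debug file from the man page excerpt could be used to trace lockd while reproducing the failed mount: the helper name set_nlm_debug is made up for illustration, and 32767 is only assumed to mean "all trace classes" (the man page says to consult the kernel headers for the actual bit values). The trace output should end up in the kernel log.

```shell
#!/bin/sh
# Hypothetical helper: write a decimal trace mask to the lockd (NLM) debug file.
# Needs root when used on the real /proc file. The optional second argument
# exists only so the function can be tried against a scratch file.
set_nlm_debug() {
    value="$1"
    file="${2:-/proc/sys/sunrpc/nlm_debug}"
    # The man page says decimal numbers can be written to these files.
    printf '%s\n' "$value" > "$file"
}

# Example session (as root):
#   set_nlm_debug 32767   # assumption: all bits set; verify in the kernel headers
#   mount -t nfs 192.168.8.205:/export /mnt/andrew
#   set_nlm_debug 0       # switch tracing off again
#   dmesg | tail -n 50    # look for lockd messages in the kernel log
```

Switching the mask back to 0 afterwards keeps the kernel log from filling up with trace messages.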
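Answer 2
Sorry for answering a three-year-old question, but reliable information on this topic is really hard to find online, and since I put a lot of effort into working out the low-level details myself, I figured I might as well share my results!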
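First, a general observation: when a process name in the output of ps aux is enclosed in square brackets, as in [lockd], it usually means we are dealing with a kernel thread. That explains why there is no executable named lockd (or anything similar) anywhere in $PATH.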
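A small sketch of how one might check for the lockd kernel thread without also matching kblockd, as the plain grep in the question does. It assumes a procps-style ps and relies on kernel threads being children of kthreadd (PID 2 on Linux); the function name has_lockd_thread is made up for illustration.

```shell
#!/bin/sh
# Reads "PPID COMMAND" lines on stdin and succeeds only if a kernel thread
# named exactly "lockd" is present. Kernel threads have kthreadd (PID 2)
# as their parent, so a user-space process called lockd would not match.
has_lockd_thread() {
    awk '$1 == 2 && $2 == "lockd" { found = 1 } END { exit !found }'
}

if ps -e -o ppid= -o comm= | has_lockd_thread; then
    echo "lockd kernel thread is running"
else
    echo "lockd kernel thread is missing"
fi
```

Matching on the exact command name is what keeps [kblockd] out of the result.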
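Confirming that lockd is a kernel module
You can list the active kernel modules with the lsmod command, and for me this does indeed show a lockd module (all examples in this post are from an Ubuntu 20.04 system):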
$ lsmod | grep nfs
nfsd 409600 11
auth_rpcgss 94208 1 nfsd
nfs_acl 16384 1 nfsd
lockd 102400 1 nfsd
grace 16384 2 nfsd,lockd
sunrpc 393216 30 nfsd,auth_rpcgss,lockd,nfs_acl
The following command finds the location of any related files:
$ locate lockd.ko
/usr/lib/modules/5.4.0-109-generic/kernel/fs/lockd/lockd.ko
/usr/lib/modules/5.4.0-110-generic/kernel/fs/lockd/lockd.ko
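Finding these lockd.ko files clearly confirms that lockd is implemented inside the kernel, as a loadable module, rather than as a user-space daemon.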
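Who is responsible for starting lockd?
Because you mentioned wanting to restart [lockd], I was curious who is responsible for starting it, so I did some digging: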
$ dpkg -L nfs-common | xargs -d '\n' grep -s lockd
Binary file /sbin/rpc.statd matches
Binary file /sbin/sm-notify matches
The actual strings that reference lockd are:
$ strings /sbin/rpc.statd | grep lock
modprobe lockd
$ strings /sbin/sm-notify | grep lock
/proc/fs/lockd/nlm_end_grace
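This answers the questions I had when I found this page:
The lockd daemon is implemented as a kernel thread (and nobody appears to be responsible for supervising it and restarting it if it fails).
The statd daemon appears to be responsible for loading the lockd.ko kernel module (judging by the strings output above). Using apt-get source nfs-common I was able to confirm that utils/statd/statd.c contains a conditional system("modprobe lockd"); call.
The statd daemon is started and supervised by systemd; the rpc.statd executable is invoked and managed by the rpc-statd.service unit.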
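You mentioned that you already tried systemctl restart rpc-statd, and judging by my research that is the service responsible for running modprobe lockd. So at this point it is hard to say whether a manual modprobe lockd would have helped in your case, but I would suggest that anyone who runs into this problem give it a try.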
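For anyone who wants to try that, here is a minimal sketch of the manual attempt. The function names are made up for illustration, it has to run as root, and it is untested against the failure described in the question. It also assumes lockd is built as a module; a lockd compiled into the kernel would not be listed in /proc/modules.

```shell
#!/bin/sh
# Succeeds if a module called exactly "lockd" is listed in the given file
# (default /proc/modules, the same data that lsmod prints).
lockd_module_loaded() {
    awk '$1 == "lockd" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}

# Hypothetical recovery attempt: load the module if it is missing, then
# restart the statd unit so it starts from a clean state.
try_lockd_recovery() {
    if ! lockd_module_loaded; then
        modprobe lockd || return 1
    fi
    systemctl restart rpc-statd
}

# Example (as root), followed by a retry of the failing mount:
#   try_lockd_recovery && mount -t nfs 192.168.8.205:/export /mnt/andrew
```

If the mount still fails afterwards, that would at least rule out a missing lockd module as the cause.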