Confirming that lockd is a kernel module

On many of the machines I help manage, I've run into the strangest problem. It usually only shows up a day or more after the last reboot.

When I try to mount an NFS share:

sudo mount -t nfs  192.168.8.205:/export /mnt/andrew

I get:

mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified

Once a machine gets into this state, nothing seems to help. I've tried:

sudo systemctl restart rpc-statd
sudo systemctl restart rpcbind

Both services appear to be running, and I can't see any obvious errors:

 sudo journalctl -u rpcbind
-- Logs begin at Tue 2019-02-19 18:29:21 UTC, end at Tue 2019-03-26 04:52:48 UTC. --
Feb 25 23:21:11 box1 systemd[1]: Starting RPC bind portmap service...
Feb 25 23:21:11 box1 rpcbind[29172]: rpcbind: xdr_/run/rpcbind/rpcbind.xdr: failed
Feb 25 23:21:11 box1 rpcbind[29172]: rpcbind: xdr_/run/rpcbind/portmap.xdr: failed
Feb 25 23:21:11 box1 systemd[1]: Started RPC bind portmap service.
Mar 24 18:59:57 box1 systemd[1]: Stopping RPC bind portmap service...
Mar 24 18:59:57 box1 systemd[1]: Stopped RPC bind portmap service.
Mar 24 18:59:57 box1 systemd[1]: Starting RPC bind portmap service...
Mar 24 18:59:57 box1 systemd[1]: Started RPC bind portmap service.

asavinykh@box1:~$  sudo journalctl -u rpc-statd
-- Logs begin at Tue 2019-02-19 18:29:21 UTC, end at Tue 2019-03-26 04:56:13 UTC. --
Feb 25 23:21:11 box1 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
Feb 25 23:21:11 box1 systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Feb 25 23:21:11 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:48:49 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:48:50 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:48:51 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:48:53 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:48:57 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:49:05 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:49:22 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:49:54 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:50:58 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:53:00 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:55:02 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:57:04 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 18:59:07 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:01:09 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:03:11 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:05:13 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:06:32 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:07:07 box1 systemd[1]: Stopping NFS status monitor for NFSv2/3 locking....
Mar 24 19:07:07 box1 systemd[1]: Stopped NFS status monitor for NFSv2/3 locking..
Mar 24 19:07:07 box1 systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
Mar 24 19:07:07 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..
Mar 24 19:07:12 box1 systemd[1]: Started NFS status monitor for NFSv2/3 locking..

Note that the rpcbind.xdr errors also appear on machines that are still working.

However, one thing is different between the boxes that still work and the ones that have stopped working:

ps aux | grep lockd
root        26  0.0  0.0      0     0 ?        S<   04:08   0:00 [kblockd]
root      3148  0.0  0.0      0     0 ?        S    04:08   0:00 [lockd]
asaviny+ 31517  0.0  0.0  16576  2096 pts/0    S+   04:58   0:00 grep lockd

The difference is the [lockd] line. It is always present on boxes that still work, and always absent on boxes that have stopped working.

Do you know what this [lockd] is? If it has stopped, how can I restart it, or how can I make sure it doesn't stop in the first place?

I've tried systemctl | grep lockd, but it shows nothing.

I've also tried journalctl -xe | grep lockd, cat /var/log/kern.log | grep lockd, and cat /var/log/syslog | grep lockd; none of them return anything.

A reboot usually clears the condition.

Answer 1

Read man nfsd, part of which reads:

   In the /proc filesystem there are 4 files that can be used to  enabled  extra  tracing  of
   nfsd and related code.  They are:
        /proc/sys/sunrpc/nfs_debug
        /proc/sys/sunrpc/nfsd_debug
        /proc/sys/sunrpc/nlm_debug
        /proc/sys/sunrpc/rpc_debug
   They  control tracing for the NFS client, the NFS server, the Network Lock Manager (lockd)
   and the underlying RPC layer respectively.  Decimal numbers can be read from or written to
   these  files.   Each number represents a bit-pattern where bits that are set cause certain
   classes of tracing to be enabled.  Consult the kernel header files to find out what number
   correspond to what tracing.
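
Based on that excerpt, lockd tracing could be enabled roughly like this; a sketch, assuming the standard /proc/sys/sunrpc paths from the man page, with 65535 used simply as an all-bits-set mask:

```shell
# Sketch based on the man page excerpt above: write an all-ones
# bitmask to nlm_debug to enable every lockd tracing class (needs
# root and the sunrpc module loaded), then watch the kernel log.
NLM_DEBUG=/proc/sys/sunrpc/nlm_debug
if [ -w "$NLM_DEBUG" ]; then
    echo 65535 > "$NLM_DEBUG"   # turn all NLM debug bits on
    dmesg | tail                # lockd messages land in the kernel log
    echo 0 > "$NLM_DEBUG"       # turn tracing back off
else
    echo "nlm_debug not available here (need root / sunrpc loaded)"
fi
```

The traced output goes to the kernel ring buffer, so dmesg or journalctl -k is where to look afterwards.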

See also https://docstore.mik.ua/orelly/networking_2ndEd/nfs/ch11_02.htm

Answer 2

Apologies for answering a three-year-old question, but reliable information on this topic is genuinely hard to find online, and since I went to considerable trouble to figure out the low-level details myself, I figured I might as well share my results!


First, a general observation: when a process name in the ps aux output is enclosed in square brackets, e.g. [lockd], it usually means we are dealing with a kernel thread. That explains why there is no lockd executable (or anything similarly named) anywhere in $PATH.
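
A quick way to check this for any PID: kernel threads have an empty /proc/<pid>/cmdline (the bracketed name ps prints comes from /proc/<pid>/comm instead). A small sketch, using this shell's own PID as the userspace counter-example; on an affected box you would use lockd's PID:

```shell
# Kernel threads have an empty /proc/<pid>/cmdline, while ordinary
# userspace processes do not. The entries in cmdline are separated by
# NUL bytes, hence the tr to strip them before the emptiness test.
pid=$$
if [ -n "$(tr -d '\0' < "/proc/$pid/cmdline")" ]; then
    echo "pid $pid is a userspace process"
else
    echo "pid $pid is a kernel thread"
fi
```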

Confirming that lockd is a kernel module

You can list the active kernel modules with the lsmod command, and for me this indeed shows a lockd module (all examples in this post are from an Ubuntu 20.04 system):

$ lsmod | grep nfs
nfsd                  409600  11
auth_rpcgss            94208  1 nfsd
nfs_acl                16384  1 nfsd
lockd                 102400  1 nfsd
grace                  16384  2 nfsd,lockd
sunrpc                393216  30 nfsd,auth_rpcgss,lockd,nfs_acl

The following command locates the relevant files:

$ locate lockd.ko
/usr/lib/modules/5.4.0-109-generic/kernel/fs/lockd/lockd.ko
/usr/lib/modules/5.4.0-110-generic/kernel/fs/lockd/lockd.ko
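
If locate is not installed (or its database is stale), the same files can be found directly under the running kernel's module tree; a sketch assuming the standard /lib/modules layout:

```shell
# Look for lockd.ko (possibly compressed, hence the glob) under the
# running kernel's module directory. This prints nothing if lockd is
# built into the kernel rather than shipped as a loadable module.
find "/lib/modules/$(uname -r)" -name 'lockd.ko*' 2>/dev/null
```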

Finding these lockd.ko files is definite confirmation that lockd is implemented as a kernel module.

Who is responsible for starting lockd

Since you mentioned wanting to restart [lockd], I was curious who is responsible for starting it, so I did some digging:

$ dpkg -L nfs-common | xargs -d '\n' grep -s lockd
Binary file /sbin/rpc.statd matches
Binary file /sbin/sm-notify matches

The actual strings in those binaries that reference lockd are:

$ strings /sbin/rpc.statd | grep lock
modprobe lockd

$ strings /sbin/sm-notify | grep lock
/proc/fs/lockd/nlm_end_grace

This answers the questions I wanted answered when I found this page:

  1. The lockd daemon is implemented as a kernel thread (and nobody appears to be responsible for supervising it and restarting it if it fails).

  2. The statd daemon appears to be responsible for loading the lockd.ko kernel module (judging by the strings output above). Using apt-get source nfs-common I was able to confirm that utils/statd/statd.c contains a conditional system("modprobe lockd"); call.

  3. The statd daemon is started and supervised by systemd; the rpc.statd executable is invoked and managed by the rpc-statd.service unit.
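
The behaviour described in point 2 can be sketched in shell terms roughly like this; this is an illustration of the logic, not the actual statd source, which makes the equivalent system() call from C:

```shell
# Illustration of what statd's "modprobe lockd" step amounts to:
# check /proc/modules for a loaded lockd and report what would be
# done. (Prints nothing useful if lockd is built into the kernel,
# since built-in code never appears in /proc/modules.)
if grep -q '^lockd ' /proc/modules 2>/dev/null; then
    echo "lockd module already loaded"
else
    echo "lockd not loaded; statd would run: modprobe lockd"
fi
```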

You mentioned that you have already tried systemctl restart rpc-statd, and judging by my research that is the one thing responsible for running modprobe lockd, so it is hard to say whether a manual modprobe lockd command would help in your situation; but to anyone faced with this problem, I would suggest trying it.
