rpc.mountd segfaults immediately on startup

2024-6-9

After a routine apt upgrade on my server and my laptop, and a reboot of everything, I found that

$ mount nfs-server:/mnt /mountpoint

just hung forever. There seemed to be no logical explanation.

I first turned to Wireshark for insight:

140 60.439219227  192.168.0.2  192.168.0.3  NFS  170  V4 Reply (Call In 138) EXCHANGE_ID
141 60.439302740  192.168.0.3  192.168.0.2  NFS  258  V4 Call (Reply In 142) CREATE_SESSION
142 60.439984105  192.168.0.2  192.168.0.3  NFS  194  V4 Reply (Call In 141) CREATE_SESSION
143 60.440070415  192.168.0.3  192.168.0.2  NFS  198  V4 Call (Reply In 147) PUTROOTFH | GETATTR
147 65.511499058  192.168.0.2  192.168.0.3  NFS  158  V4 Reply (Call In 143) PUTROOTFH Status: NFS4ERR_DELAY

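Huh. So it's sort of working, but something isn't entirely happy. What's the holdup?

Next I turned to dmesg, which took me straight past the "going around in circles, hunting uncertainly for clues" stage: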

[  283.998430] rpc.mountd[2238]: segfault at 0 ip 00007f816550f3d6 sp 00007ffd60245820 error 4 in libc-2.28.so[7f81654b7000+148000]
[  283.998523] Code: 1f 44 00 00 85 f6 0f 8e 88 00 00 00 83 fe 01 0f 84 8f 00 00 00 41 54 83 ee 01 49 89 fc 41 b8 01 00 00 00 55 b9 0a 00 00 00 53 <8b> 02 48 89 d3 89 c5 83 e0 df 89 02 48 63 d6 48 89 fe 48 89 df 83

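This happened almost immediately after boot. I wonder what's going on here...

How do I start it myself? Some digging in the unit file showed that it takes the argument --manage-gids, and squinting at --help turned up -F to run it in the foreground. OK.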

# rpc.mountd --manage-gids -F
rpc.mountd: Version 1.3.3 starting
Segmentation fault

Well, that didn't take long to reproduce! So what reason does it have for doing that?

stat("/", {st_mode=S_IFDIR|0755, st_size=37, ...}) = 0
openat(AT_FDCWD, "/etc/mtab", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
futex(0x7faf15dd5168, FUTEX_WAKE_PRIVATE, 2147483647) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV +++
Segmentation fault

...oh no, my system hasn't hosed itself, has it? :S

# ls -l /etc/mtab
lrwxrwxrwx 1 root root 19 Jul 12 15:17 /etc/mtab -> ../proc/self/mounts

...okay, wow.

Hm. I bet that rogue openat() is actually coming from a library call. I wonder if ltrace adds any other interesting context?

__xstat(1, "/", 0x7ffe45d9f620)   = 0
free(0)                           = <void>
strdup("/")                       = 0x559b3adaf640
strlen("/")                       = 1
setmntent("/etc/mtab", "r")       = 0
getmntent(0 <no return ...>
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++

Hmmm, what's a...

DESCRIPTION
       These routines are used  to  access  the  filesystem  description  file
       /etc/fstab and the mounted filesystem description file /etc/mtab.

       The setmntent() function opens the filesystem description file filename
       and returns a file pointer which can be used by getmntent().  The argu-
       ment  type  is the type of access required and can take the same values
       as the mode argument of fopen(3).

       The getmntent() function reads the next line of the filesystem descrip-
       tion  file  from stream and returns a pointer to a structure containing
       the broken out fields from a line in the file.  The pointer points to a
       static area of memory which is overwritten by subsequent calls to getm-
       ntent().

       ...

RETURN VALUE
       The getmntent() and getmntent_r() functions return a pointer to the mn-
       tent structure or NULL on failure.

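Wait a minute: setmntent() returned 0, a.k.a. a NULL pointer. And then that NULL pointer got passed to getmntent(). That sounds kind of broken...?

I wonder what's really going on here. Wouldn't it be nice if whatever package contains rpc.mountd had debug symbols... oh, great, nfs-kernel-server-dbgsym is a thing. Cool.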

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7db73d6 in __GI___fgets_unlocked (buf=buf@entry=0x5555555a8060 "", n=n@entry=4096, fp=fp@entry=0x0) at iofgets_u.c:50
50      iofgets_u.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7db73d6 in __GI___fgets_unlocked (buf=buf@entry=0x5555555a8060 "", n=n@entry=4096, fp=fp@entry=0x0) at iofgets_u.c:50
#1  0x00007ffff7e2ef16 in __GI___getmntent_r (stream=stream@entry=0x0, mp=mp@entry=0x7ffff7efb140 , buffer=0x5555555a8060 "", bufsiz=bufsiz@entry=4096) at mntent_r.c:130
#2  0x00007ffff7e2ed03 in getmntent (stream=stream@entry=0x0) at mntent.c:52
#3  0x000055555555cfa8 in next_mnt (v=0x555555571ac8, p=0x5555555a4ff8 "/") at cache.c:383
#4  nfsd_fh (f=6) at cache.c:736
#5  0x000055555555d2cd in cache_process_req (readfds=readfds@entry=0x7fffffffe960) at cache.c:1424
#6  0x000055555555d718 in my_svc_run () at svc_run.c:117
#7  0x0000555555558ee3 in main (argc=, argv=) at mountd.c:894

Hm.

367-370 [...]
371     {
372-374 [...]
375             if (*v == NULL) {
376-384 [...]
385                     endmntent(f);
386-390 [...]

Hmmm.

So... setmntent() returns NULL, that gets assigned to f, and then getmntent(f) is what... segfaults...?

Let's have a look.

$ cat setmntent.c
#include <stdio.h>
#include <mntent.h>

int main() {
        FILE *ret = setmntent("/etc/mtab", "r");
        printf("%p\n", (void *)ret);
        return 0;
}

$ gcc -o setmntent setmntent.c

$ ./setmntent 
(nil)

Yay. So on my system, because /etc/mtab is a symlink, setmntent() blows up. Great.

Let me guess...

$ cat getmntent.c
#include <mntent.h>

int main() {
        getmntent(NULL);
}

$ gcc -o getmntent getmntent.c

$ ./getmntent 
Segmentation fault

So that's entirely logical. *headdesk*

I just want NFS to work. What do I do?

Answer 1

Horrible workaround:

`rm /etc/mtab; cat /proc/mounts > /etc/mtab`

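Caveat emptor:

This may randomly break things that expect `/etc/mtab` to never disappear, even for a nanosecond

Probably don't use this to fix production unless you have enough spare duct tape (time, resources) on hand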

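I'm not going to mark this (self-answer) as accepted, because I consider it a hack. Instead, I'm looking for other comments/answers that discuss

why /etc/mtab on my system points to /proc/self/mounts, given that this NFS problem seems 100% reproducible and would theoretically (?) blow up everywhere
why this is happening
what I should do to canonically resolve this situation (in particular, how /etc/mtab should sanely be configured)

Such info would be greatly appreciated!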
