NFS client unable to mount shares from an otherwise responsive NFS server

We are unable to mount NFS shares from a Fedora 8 NFS server onto a Debian Lenny database/web-app NFS client. A manual mount command with explicit options and the helper mount using the fstab options both show the same behavior. The machine crashed unexpectedly 6 days ago, but this problem only seems to have appeared 3 days ago. (Yes, the person responsible for it only reported it to me this morning.)

The same server works fine for all other NFS clients. The NFS client also exports some of its own shares back to the other clients and to the NFS server, and those work fine as well.

Processes that depend on these mounts have been hung since the 26th. Cron has been turned off to keep the load average at a sane level.
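
Not from the original report, but worth noting: processes blocked on a dead NFS mount sit in uninterruptible sleep (state D), and each one counts toward the load average even though it uses no CPU. A rough way to list them on the client:

db:/var/log# ps axo pid,stat,wchan:32,args | awk 'NR==1 || $2 ~ /^D/'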

According to the "authenticated mount request" messages on the server, the mount authenticates correctly on the NFS server side, but the client hangs:

# mount -vvv -t nfs server.example.org:/shared/foo /shared/foo/
mount: fstab path: "/etc/fstab"
mount: lock path:  "/etc/mtab~"
mount: temp path:  "/etc/mtab.tmp"
mount: spec:  "server.example.org:/shared/foo"
mount: node:  "/shared/foo/"
mount: types: "nfs"
mount: opts:  "(null)"
mount: external mount: argv[0] = "/sbin/mount.nfs"
mount: external mount: argv[1] = "server.example.org:/shared/foo"
mount: external mount: argv[2] = "/shared/foo/"
mount: external mount: argv[3] = "-v"
mount: external mount: argv[4] = "-o"
mount: external mount: argv[5] = "rw"
mount.nfs: trying 192.168.xxx.xxx prog 100003 vers 3 prot TCP port 2049
mount.nfs: trying 192.168.xxx.xxx prog 100005 vers 3 prot UDP port 51852

It hangs there indefinitely, with no further output to the screen. Most likely because of this:

Mar 28 10:17:14 db kernel: [1299206.229436] mount.nfs     D e250c5d5     0 20597  20596
Mar 28 10:17:14 db kernel: [1299206.229439]        c0a3cde0 00000086 f7555b00 e250c5d5 0001ca16 c0a3cf6c ce0a9020 0000000d 
Mar 28 10:17:14 db kernel: [1299206.229444]        0013bc68 077ffe57 00000003 00000000 00000000 00000000 00000000 00000246 
Mar 28 10:17:14 db kernel: [1299206.229447]        c0a77c90 00000000 c0a77c98 ce000a7c f8e047c1 c02c93a4 f8e0479c f4518588 
Mar 28 10:17:14 db kernel: [1299206.229451] Call Trace:
Mar 28 10:17:14 db kernel: [1299206.229465]  [<f8e047c1>] rpc_wait_bit_killable+0x25/0x2a [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229485]  [<c02c93a4>] __wait_on_bit+0x33/0x58
Mar 28 10:17:14 db kernel: [1299206.229490]  [<f8e0479c>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229505]  [<f8e0479c>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229519]  [<c02c9428>] out_of_line_wait_on_bit+0x5f/0x67
Mar 28 10:17:14 db kernel: [1299206.229523]  [<c0138859>] wake_bit_function+0x0/0x3c
Mar 28 10:17:14 db kernel: [1299206.229528]  [<f8e04c06>] __rpc_execute+0xbe/0x1d9 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229543]  [<f8dffa72>] rpc_run_task+0x40/0x45 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229557]  [<f8dffb00>] rpc_call_sync+0x38/0x52 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229573]  [<f8e80351>] nfs3_rpc_wrapper+0x14/0x49 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229591]  [<f8e8044f>] do_proc_fsinfo+0x54/0x75 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229607]  [<f8e80481>] nfs3_proc_fsinfo+0x11/0x36 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229621]  [<f8e70514>] nfs_probe_fsinfo+0x78/0x47f [nfs]
Mar 28 10:17:14 db kernel: [1299206.229634]  [<f8dffd1f>] rpc_shutdown_client+0x9d/0xa5 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229647]  [<f8dffb58>] rpc_ping+0x3e/0x47 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229662]  [<f8e00845>] rpc_bind_new_program+0x69/0x6f [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229677]  [<f8e71584>] nfs_create_server+0x37b/0x4fa [nfs]
Mar 28 10:17:14 db kernel: [1299206.229693]  [<c01621c1>] __alloc_pages_internal+0xb5/0x34e
Mar 28 10:17:14 db kernel: [1299206.229700]  [<c013882c>] autoremove_wake_function+0x0/0x2d
Mar 28 10:17:14 db kernel: [1299206.229703]  [<c01e7e3c>] idr_get_empty_slot+0x11c/0x1ed
Mar 28 10:17:14 db kernel: [1299206.229711]  [<f8e78fbd>] nfs_get_sb+0x528/0x810 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229724]  [<c01e8125>] idr_pre_get+0x21/0x2f
Mar 28 10:17:14 db kernel: [1299206.229729]  [<c0180159>] vfs_kern_mount+0x7b/0xed
Mar 28 10:17:14 db kernel: [1299206.229734]  [<c0180209>] do_kern_mount+0x2f/0xb8
Mar 28 10:17:14 db kernel: [1299206.229738]  [<c019264a>] do_new_mount+0x55/0x89
Mar 28 10:17:14 db kernel: [1299206.229743]  [<c0192825>] do_mount+0x1a7/0x1c6
Mar 28 10:17:14 db kernel: [1299206.229747]  [<c02ca52a>] error_code+0x72/0x78
Mar 28 10:17:14 db kernel: [1299206.229752]  [<c0190895>] copy_mount_options+0x90/0x109
Mar 28 10:17:14 db kernel: [1299206.229756]  [<c01928b1>] sys_mount+0x6d/0xa8
Mar 28 10:17:14 db kernel: [1299206.229760]  [<c0108857>] sysenter_past_esp+0x78/0xb1
Mar 28 10:17:14 db kernel: [1299206.229766]  =======================
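
The kernel trace shows mount.nfs blocked in a synchronous RPC call (rpc_call_sync / rpc_wait_bit_killable), and the last line of the mount output above was the probe of mountd (program 100005) over UDP. Not part of the original post, but a quick way to check whether the server's RPC services are registered and actually answering would be something along these lines:

db:/var/log# rpcinfo -p server.example.org
db:/var/log# rpcinfo -t server.example.org nfs
db:/var/log# rpcinfo -u server.example.org mountd

If rpcinfo -p lists a service but the direct probe times out, that service is registered with the portmapper but no longer responding, which would match the hang.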

The network is fine, since production users of the database web application front end are seeing no service interruption or performance problems.

Memory is fine:

db:/var/log# free -m
             total       used       free     shared    buffers     cached
Mem:         24352      19426       4926          0        281      18283
-/+ buffers/cache:        860      23492
Swap:         7632          0       7632

In /etc/fstab:

server.example.org:/shared/foo  /foo        nfs defaults    0 0
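
As an aside (my addition, not in the original post): defaults gives a hard mount, so anything touching the mount point blocks until the server answers. If hangs like this are a recurring concern, nfs(5) options along these lines at least let the mount retry in the background and be interrupted; the exact values below are only an illustration:

server.example.org:/shared/foo  /foo        nfs bg,intr,timeo=14,retrans=3    0 0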

The relevant line from the server's /etc/exports:

/shared/foo 192.168.xxx.xxx(rw,no_root_squash)
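
Two server-side checks that fit here (my addition, not in the original report): showmount -e run from the client talks to the same mountd daemon the stuck mount is waiting on, and the "authenticated mount request" lines mentioned above normally end up in the server's syslog. The second command runs on the Fedora server, and the log path is just the usual default:

db:/var/log# showmount -e server.example.org
[root@server]# grep "authenticated mount request" /var/log/messages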

tcpdump looks normal. I can post it if anyone would like me to, but it is quite large and nothing in the output jumps out as obviously nasty.
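
For a smaller, more targeted capture, a filter limited to the portmapper, nfsd, and the mountd port seen in the mount output keeps the dump manageable (the interface name here is an assumption):

db:/var/log# tcpdump -n -i eth0 'host server.example.org and (port 111 or port 2049 or port 51852)'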

Answer 1

I didn't have time to troubleshoot this; after blocking the additional mount attempts the developers were kicking off, I ended up restarting the services.

Killing the stuck client mount attempts and then restarting portmap and the Debian NFS services brought everything back to normal. The NFS service restart brought the rpc.statd, rpc.idmapd and rpc.mountd processes back up.

Once the old mount attempts had been killed, no more stack traces were generated for new mount requests.
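
Roughly what that recovery looked like, sketched with the stock Debian Lenny init scripts; the original answer only names portmap and "the Debian NFS services", so the exact script names are my assumption:

db:/var/log# kill 20597                              # the stuck mount.nfs PID from the trace above; the wait is killable
db:/var/log# /etc/init.d/portmap restart
db:/var/log# /etc/init.d/nfs-common restart          # brings rpc.statd and rpc.idmapd back up
db:/var/log# /etc/init.d/nfs-kernel-server restart   # brings rpc.mountd back up (this client exports shares too)
db:/var/log# mount /foo                              # retry the fstab mount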
