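We are unable to mount an NFS share from a Fedora 8 NFS server on a Debian Lenny database/web-app NFS client. A manual mount command with options and an fstab-assisted mount using the fstab options both behave the same way. The machine crashed unexpectedly 6 days ago, but this problem appears to have started 3 days ago. (Yes, the staff responsible only reported it to me this morning.)
The same server works fine for all other NFS clients. This NFS client also exports some of its own shares back to other clients and to the NFS server, and those work fine as well.
Processes that depend on these mounts have been hanging since the 26th. Cron has been turned off to keep the load average at a reasonable level.
According to the "authenticated mount request" messages on the server, the mount authenticates correctly on the NFS server, but on the client: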
# mount -vvv -t nfs server.example.org:/shared/foo /shared/foo/
mount: fstab path: "/etc/fstab"
mount: lock path: "/etc/mtab~"
mount: temp path: "/etc/mtab.tmp"
mount: spec: "server.example.org:/shared/foo"
mount: node: "/shared/foo/"
mount: types: "nfs"
mount: opts: "(null)"
mount: external mount: argv[0] = "/sbin/mount.nfs"
mount: external mount: argv[1] = "server.example.org:/shared/foo"
mount: external mount: argv[2] = "/shared/foo/"
mount: external mount: argv[3] = "-v"
mount: external mount: argv[4] = "-o"
mount: external mount: argv[5] = "rw"
mount.nfs: trying 192.168.xxx.xxx prog 100003 vers 3 prot TCP port 2049
mount.nfs: trying 192.168.xxx.xxx prog 100005 vers 3 prot UDP port 51852
It sits there indefinitely with no further output on screen, most likely because of the following:
Mar 28 10:17:14 db kernel: [1299206.229436] mount.nfs D e250c5d5 0 20597 20596
Mar 28 10:17:14 db kernel: [1299206.229439] c0a3cde0 00000086 f7555b00 e250c5d5 0001ca16 c0a3cf6c ce0a9020 0000000d
Mar 28 10:17:14 db kernel: [1299206.229444] 0013bc68 077ffe57 00000003 00000000 00000000 00000000 00000000 00000246
Mar 28 10:17:14 db kernel: [1299206.229447] c0a77c90 00000000 c0a77c98 ce000a7c f8e047c1 c02c93a4 f8e0479c f4518588
Mar 28 10:17:14 db kernel: [1299206.229451] Call Trace:
Mar 28 10:17:14 db kernel: [1299206.229465] [<f8e047c1>] rpc_wait_bit_killable+0x25/0x2a [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229485] [<c02c93a4>] __wait_on_bit+0x33/0x58
Mar 28 10:17:14 db kernel: [1299206.229490] [<f8e0479c>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229505] [<f8e0479c>] rpc_wait_bit_killable+0x0/0x2a [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229519] [<c02c9428>] out_of_line_wait_on_bit+0x5f/0x67
Mar 28 10:17:14 db kernel: [1299206.229523] [<c0138859>] wake_bit_function+0x0/0x3c
Mar 28 10:17:14 db kernel: [1299206.229528] [<f8e04c06>] __rpc_execute+0xbe/0x1d9 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229543] [<f8dffa72>] rpc_run_task+0x40/0x45 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229557] [<f8dffb00>] rpc_call_sync+0x38/0x52 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229573] [<f8e80351>] nfs3_rpc_wrapper+0x14/0x49 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229591] [<f8e8044f>] do_proc_fsinfo+0x54/0x75 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229607] [<f8e80481>] nfs3_proc_fsinfo+0x11/0x36 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229621] [<f8e70514>] nfs_probe_fsinfo+0x78/0x47f [nfs]
Mar 28 10:17:14 db kernel: [1299206.229634] [<f8dffd1f>] rpc_shutdown_client+0x9d/0xa5 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229647] [<f8dffb58>] rpc_ping+0x3e/0x47 [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229662] [<f8e00845>] rpc_bind_new_program+0x69/0x6f [sunrpc]
Mar 28 10:17:14 db kernel: [1299206.229677] [<f8e71584>] nfs_create_server+0x37b/0x4fa [nfs]
Mar 28 10:17:14 db kernel: [1299206.229693] [<c01621c1>] __alloc_pages_internal+0xb5/0x34e
Mar 28 10:17:14 db kernel: [1299206.229700] [<c013882c>] autoremove_wake_function+0x0/0x2d
Mar 28 10:17:14 db kernel: [1299206.229703] [<c01e7e3c>] idr_get_empty_slot+0x11c/0x1ed
Mar 28 10:17:14 db kernel: [1299206.229711] [<f8e78fbd>] nfs_get_sb+0x528/0x810 [nfs]
Mar 28 10:17:14 db kernel: [1299206.229724] [<c01e8125>] idr_pre_get+0x21/0x2f
Mar 28 10:17:14 db kernel: [1299206.229729] [<c0180159>] vfs_kern_mount+0x7b/0xed
Mar 28 10:17:14 db kernel: [1299206.229734] [<c0180209>] do_kern_mount+0x2f/0xb8
Mar 28 10:17:14 db kernel: [1299206.229738] [<c019264a>] do_new_mount+0x55/0x89
Mar 28 10:17:14 db kernel: [1299206.229743] [<c0192825>] do_mount+0x1a7/0x1c6
Mar 28 10:17:14 db kernel: [1299206.229747] [<c02ca52a>] error_code+0x72/0x78
Mar 28 10:17:14 db kernel: [1299206.229752] [<c0190895>] copy_mount_options+0x90/0x109
Mar 28 10:17:14 db kernel: [1299206.229756] [<c01928b1>] sys_mount+0x6d/0xa8
Mar 28 10:17:14 db kernel: [1299206.229760] [<c0108857>] sysenter_past_esp+0x78/0xb1
Mar 28 10:17:14 db kernel: [1299206.229766] =======================
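In the first line of the trace, the `D` after `mount.nfs` appears to be the task state (uninterruptible sleep), followed further along by the PID and parent PID. A minimal sketch for listing every process in that state is below; the function name `dstate_procs` is made up for illustration, and it assumes input in the column layout of `ps -eo pid,stat,comm` from procps.

```shell
# List PID and command name of processes in uninterruptible sleep (state D).
# Reads rows in `ps -eo pid,stat,comm` layout on stdin, so it also works
# on saved output. The first row is assumed to be the ps header.
dstate_procs() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}
```

On a live system this would be fed with `ps -eo pid,stat,comm | dstate_procs`. Several `mount.nfs` entries in the result would suggest that repeated mount attempts have piled up behind the same unanswered RPC call.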
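The network is working fine: production users of the database web-app front end are not seeing any service interruption or performance problems.
Memory is fine: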
db:/var/log# free -m
             total       used       free     shared    buffers     cached
Mem:         24352      19426       4926          0        281      18283
-/+ buffers/cache:        860      23492
Swap:         7632          0       7632
In /etc/fstab:
server.example.org:/shared/foo /foo nfs defaults 0 0
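Relevant line from the server's /etc/exports:
/shared/foo 192.168.xxx.xxx(rw,no_root_squash)
The tcpdump output looks normal. I can post it if anyone wants it, but it is fairly large, and nothing in the output looks obviously wrong.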
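Since the full capture is large, a narrower check might be to probe only the two RPC endpoints named in the verbose mount output (program 100003 is nfs, program 100005 is mountd). The sketch below turns a `mount.nfs: trying ...` line into the matching `rpcinfo` command; `probe_cmd` is a hypothetical helper, and it only prints the command rather than running it.

```shell
# Build an rpcinfo probe from a "mount.nfs: trying ..." line.
# Field layout assumed from the output above:
#   mount.nfs: trying HOST prog PROG vers VERS prot PROTO port PORT
probe_cmd() {
    echo "$1" | awk '{
        flag = ($9 == "TCP") ? "-t" : "-u"   # -t probes over TCP, -u over UDP
        printf "rpcinfo -n %s %s %s %s %s\n", $11, flag, $3, $5, $7
    }'
}
```

Running the printed commands from the affected client and from a working client, then comparing the results, may show whether nfsd or mountd on the server has stopped answering this one host.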
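Answer 1
I did not have time to troubleshoot further. After blocking the developers from starting any more mount attempts, I ended up restarting the services.
After killing the stuck client mount attempts and restarting portmap and the Debian NFS services, everything returned to normal. Restarting the NFS services restarted the rpc.statd, rpc.idmapd and rpc.mountd processes.
Once the old mount attempts were killed, new mount requests no longer produced a stack trace.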
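The order of operations described above could be sketched roughly as follows. The exact commands were not given, so the init script names (`portmap`, `nfs-common`, `nfs-kernel-server`) are an assumption based on a stock Debian Lenny install, and `run` and `recover_nfs_client` are made-up helper names. By default the script only prints what it would do.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_nfs_client() {
    # 1. Kill the stuck mount attempts (the trace shows a killable RPC wait).
    run pkill -9 -x mount.nfs
    # 2. Restart the portmapper first, then the NFS daemons (assumed script names).
    run /etc/init.d/portmap restart
    run /etc/init.d/nfs-common restart          # rpc.statd, rpc.idmapd
    run /etc/init.d/nfs-kernel-server restart   # rpc.mountd; this client also exports shares
    # 3. Retry the mount from the question.
    run mount -t nfs server.example.org:/shared/foo /shared/foo/
}
```

Calling `recover_nfs_client` with `DRY_RUN=0` as root would execute the steps for real. Note that restarting `nfs-kernel-server` briefly interrupts the shares this machine exports to other hosts.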