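My WordPress site runs in a clustered environment using NFS, nginx, php-fpm, and MySQL, with the infrastructure hosted on Amazon EC2. Under high load/traffic, the php5-fpm processes go into D state (uninterruptible sleep) and the site goes down. After checking with the command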
echo w > /proc/sysrq-trigger; dmesg -c | less;
I found that php-fpm is in a blocked state. Kernel stack trace:
[6615425.408345] SysRq : Show Blocked State
[6615425.408362]   task                        PC stack   pid father
[6615425.408444] php5-fpm        D 0000000000000000     0 16616  12079 0x00000000
[6615425.408453]  ffff880001793938 0000000000000246 ffff880001793fd8 0000000000014580
[6615425.408457]  ffff880001793fd8 0000000000014580 ffff88001cad1770 ffff88001cad1770
[6615425.408460]  ffff88006c88ba00 0000000000000082 ffffffffa0044190 ffff8800017939b0
[6615425.408463] Call Trace:
[6615425.408491]  [] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[6615425.408497]  [] schedule+0x29/0x70
[6615425.408506]  [] rpc_wait_bit_killable+0x35/0x90 [sunrpc]
[6615425.408511]  [] __wait_on_bit+0x60/0x90
[6615425.408516]  [] ? __queue_work+0x135/0x330
[6615425.408524]  [] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
[6615425.408528]  [] out_of_line_wait_on_bit+0x77/0x90
[6615425.408532]  [] ? wake_atomic_t_function+0x40/0x40
[6615425.408540]  [] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
[6615425.408553]  [] nfs4_run_open_task+0x11f/0x170 [nfsv4]
[6615425.408563]  [] ? nfs4_get_open_state+0x76/0x1b0 [nfsv4]
[6615425.408571]  [] nfs4_do_open+0x1d8/0x930 [nfsv4]
[6615425.408581]  [] ? generic_lookup_cred+0x15/0x20 [sunrpc]
[6615425.408591]  [] ? rpcauth_lookupcred+0x77/0xc0 [sunrpc]
[6615425.408603]  [] ? nfs_do_access+0x69/0x250 [nfs]
[6615425.408610]  [] nfs4_atomic_open+0xd4/0xe0 [nfsv4]
[6615425.408619]  [] nfs4_file_open+0xb9/0x1b0 [nfsv4]
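For anyone trying to reproduce this, a lighter check than the full SysRq dump might be to list only the processes in D state together with the kernel function they are waiting in. This is just a sketch, not something taken from the setup above: the helper name filter_d_state is made up for illustration, and it assumes the procps version of ps, which accepts the pid,stat,wchan:32,comm output columns.

```shell
# Hypothetical helper: keep only rows whose STAT column starts with "D"
# (uninterruptible sleep) and print PID, wait channel, and command name.
# It reads `ps`-style output on stdin so it can also be run on saved output.
filter_d_state() {
  # NR > 1 skips the header row; $2 is STAT, $3 is WCHAN, $4 is COMMAND.
  awk 'NR > 1 && $2 ~ /^D/ { print $1, $3, $4 }'
}
```

On a live client it would be fed with: ps -eo pid,stat,wchan:32,comm | filter_d_state. If the wait channel for php5-fpm shows an RPC wait such as rpc_wait_bit_killable, that would be consistent with the trace above, where the process is waiting on an NFSv4 open call.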
I also see php5-fpm segfaulting frequently in dmesg, with error 4.
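The OS is Ubuntu 12.04 LTS running nfs-kernel-server, and the client OS is Ubuntu 13.04 using NFSv4. I tried upgrading the instance size of the NFS server and also increased the number of NFS server threads, but neither had any effect.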
So far I have not found a suitable solution.