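We recently moved our web directory (/var/www/sites) to an NFSv4 share. Since the switch, throughput on the client's NFS mount drops every 60 seconds. CPU usage falls (Apache/PHP is waiting on I/O) and network load drops with it. Each stall lasts between 500 ms and 1.5 s.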
I ran a test with dd if=/dev/zero of=/mnt/files/samplefile bs=1M count=1024 oflag=direct
and found that read/write times increase during one of the 60-second drops.
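A single 1 GiB dd run only shows that a stall happened somewhere inside it. A finer-grained probe could look like the sketch below, which writes one small file per iteration and prints a wall-clock timestamp with the elapsed milliseconds. The function name `probe_latency` and the file name `.latency_probe` are hypothetical, and the sketch assumes GNU `date` (for `%N`) and GNU `dd` (for `conv=fsync`).

```shell
# probe_latency DIR COUNT [INTERVAL]
# Writes one 4 KiB file per iteration and prints "<HH:MM:SS> <elapsed ms>".
# Hypothetical helper; assumes GNU date (%N) and GNU dd (conv=fsync).
probe_latency() {
    dir=$1
    count=$2
    interval=${3:-0.1}
    probe="$dir/.latency_probe"
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s%N)
        # conv=fsync makes dd flush the data before it returns
        dd if=/dev/zero of="$probe" bs=4k count=1 conv=fsync 2>/dev/null
        end=$(date +%s%N)
        echo "$(date +%T) $(( ($end - $start) / 1000000 ))"
        sleep "$interval"
        i=$((i + 1))
    done
    rm -f "$probe"
}
```

For example, `probe_latency /mnt/files 1200 0.1` would cover at least two of the 60-second cycles; lines with unusually large millisecond values mark the stalls.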
On the NFS mount I added FS-Cache (fsc), noatime, and nodiratime, with no change.
/etc/exports on the server:
/mnt/files {clientIP} (rw,fsid=0,sync,no_root_squash)
Client mount:
mount -v -t nfs4 {server_ip}:/ /mnt/files -o fsc,noatime,nodiratime
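The options given to mount are not necessarily the full set in effect, since the kernel fills in values such as the negotiated version and rsize/wsize. A minimal sketch for listing what the client actually uses, read from /proc/mounts, follows. The function name `nfs_mount_opts` is hypothetical, and the optional file argument is there only so the parser can be tried against a saved copy.

```shell
# nfs_mount_opts [FILE]
# Prints "<mount point>: <options>" for every nfs/nfs4 entry.
# FILE defaults to /proc/mounts, whose fields are:
#   device  mountpoint  fstype  options  dump  pass
nfs_mount_opts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 ": " $4 }' "${1:-/proc/mounts}"
}
```

Where it is installed, `nfsstat -m` reports similar per-mount information.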
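Given how regular the timing of the drops is, this looks like some kind of setting and/or misconfiguration.
Any hints would be greatly appreciated.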
Server-side nfsstat:
Server rpc stats:
calls badcalls badfmt badauth badclnt
4066505251 262 22 240 0
Server nfs v3:
null getattr setattr lookup access
8 100% 0 0% 0 0% 0 0% 0 0%
readlink read write create mkdir
0 0% 0 0% 0 0% 0 0% 0 0%
symlink mknod remove rmdir rename
0 0% 0 0% 0 0% 0 0% 0 0%
link readdir readdirplus fsstat fsinfo
0 0% 0 0% 0 0% 0 0% 0 0%
pathconf commit
0 0% 0 0%
Server nfs v4:
null compound
72 0% 4066507670 99%
Server nfs v4 operations (centos 8):
op0-unused op1-unused op2-future access close
0 0% 0 0% 0 0% 187752303 1% 117353691 0%
commit create delegpurge delegreturn getattr
6175 0% 7467 0% 0 0% 36808013 0% 3988907750 31%
getfh link lock lockt locku
20592505 0% 0 0% 1988679 0% 0 0% 1978415 0%
lookup lookup_root nverify open openattr
32913665 0% 0 0% 0 0% 117761749 0% 0 0%
open_conf open_dgrd putfh putpubfh putrootfh
0 0% 24 0% 4050816618 32% 0 0% 328 0%
read readdir readlink remove rename
3970684 0% 1199340 0% 480 0% 181949 0% 18432 0%
renew restorefh savefh secinfo setattr
0 0% 0 0% 18432 0% 0 0% 2287964 0%
setcltid setcltidconf verify write rellockowner
0 0% 0 0% 0 0% 211708 0% 0 0%
bc_ctl bind_conn exchange_id create_ses destroy_ses
0 0% 2 0% 40 0% 46 0% 37 0%
free_stateid getdirdeleg getdevinfo getdevlist layoutcommit
1978400 0% 0 0% 0 0% 0 0% 0 0%
layoutget layoutreturn secinfononam sequence set_ssv
0 0% 0 0% 37 0% 4066651259 32% 0 0%
test_stateid want_deleg destroy_clid reclaim_comp allocate
13642707 0% 0 0% 31 0% 37 0% 0 0%
copy copy_notify deallocate ioadvise layouterror
0 0% 0 0% 0 0% 0 0% 0 0%
layoutstats offloadcancel offloadstatus readplus seek
0 0% 0 0% 0 0% 0 0% 0 0%
write_same
0 0%
Client-side nfsstat (CentOS 7):
calls badcalls badclnt badauth xdrcall
0 0 0 0 0
Client rpc stats:
calls retrans authrefrsh
4157327074 6 4157501443
Client nfs v4:
null read write commit open open_conf
0 0% 12539371 0% 2010537 0% 171586 0% 17387625 0% 19761 0%
open_noat open_dgrd close setattr fsinfo renew
117435773 2% 28 0% 134408077 3% 2365580 0% 425 0% 736357 0%
setclntid confirm lock lockt locku access
68577 0% 14 0% 1998403 0% 0 0% 1988136 0% 73334903 1%
getattr lookup lookup_root remove rename link
3686184054 88% 35401700 0% 149 0% 4909916 0% 378484 0% 0 0%
symlink create pathconf statfs readlink readdir
0 0% 15960 0% 276 0% 11593628 0% 490 0% 2002535 0%
server_caps delegreturn getacl setacl fs_locations rel_lkowner
931 0% 36853705 0% 0 0% 0 0% 0 0% 0 0%
secinfo exchange_id create_ses destroy_ses sequence get_lease_t
0 0% 0 0% 31 0% 37 0% 28 0% 16 0%
reclaim_comp layoutget getdevinfo layoutcommit layoutreturn getdevlist
251 0% 28 0% 0 0% 0 0% 0 0% 0 0%
(null)
34 0%
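In the client output above, getattr accounts for 88% of all calls. To make that kind of ranking easier to read, the sketch below pairs each row of operation names with the row of "count percent" values beneath it and sorts by count. The function name `rank_nfs_ops` is hypothetical, and the parsing assumes the two-row layout shown in this post.

```shell
# rank_nfs_ops: reads nfsstat output on stdin and prints "<count> <op>",
# largest count first. Hypothetical helper; assumes each row of names is
# followed directly by a row of "N P% N P% ..." values.
rank_nfs_ops() {
    awk '
        # Values row: pair the i-th name with the i-th count (field 2i-1).
        /^ *[0-9]+ +[0-9]+%/ {
            n = split(names, op, " ")
            for (i = 1; i <= n; i++) print $(2 * i - 1), op[i]
            next
        }
        # Any other row is remembered as a possible names row.
        { names = $0 }
    ' | sort -rn
}
```

A possible use is `nfsstat -c | rank_nfs_ops | head`. The counters are cumulative, so comparing two runs taken a minute apart shows which operations were issued during one cycle.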
Update: watching htop on the client, I noticed that whenever this happens the top process is
{NFS-IP}-mana
Every time a stall occurs I see this process:
48800 R ? 00:00:00 [{nfsIP_address}-mana]
48800 R ? 00:00:00 [{nfsIP_address}-mana]
48800 R ? 00:00:00 [{nfsIP_address}-mana]
48800 R ? 00:00:00 [{nfsIP_address}-mana]
48800 R ? 00:00:00 [{nfsIP_address}-mana]
48800 R ? 00:00:00 [{nfsIP_address}-mana]
48800 R ? 00:00:00 [{nfsIP_address}-mana]
48800 R ? 00:00:00 [{nfsIP_address}-mana]
48800 R ? 00:00:01 [{nfsIP_address}-mana]
48800 R ? 00:00:01 [{nfsIP_address}-mana]
48800 R ? 00:00:01 [{nfsIP_address}-mana]
48800 R ? 00:00:01 [{nfsIP_address}-mana]
48800 R ? 00:00:01 [{nfsIP_address}-mana]
48800 R ? 00:00:01 [{nfsIP_address}-mana]
48800 R ? 00:00:01 [{nfsIP_address}-mana]
48800 R ? 00:00:01 [{nfsIP_address}-mana]
48800 R ? 00:00:02 [{nfsIP_address}-mana]
48800 R ? 00:00:02 [{nfsIP_address}-mana]
48800 R ? 00:00:02 [{nfsIP_address}-mana]
48800 R ? 00:00:02 [{nfsIP_address}-mana]
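The `-mana` suffix is most likely the truncated name of the client's NFSv4 state manager kernel thread (`<server address>-manager`, cut off by the kernel's 15-character limit on command names); treat that identification as an assumption. To check whether its runnable periods line up with the stalls, a timestamped sampler along these lines could be run next to the dd test. The function names `filter_manager` and `watch_manager` are hypothetical.

```shell
# filter_manager TIMESTAMP
# Reads "pid stat time comm" rows on stdin and prints, prefixed with TIMESTAMP,
# those that are runnable (state R) and whose name ends in "-man...".
filter_manager() {
    awk -v ts="$1" '$2 ~ /^R/ && $4 ~ /-man[a-z]*$/ { print ts, $0 }'
}

# watch_manager [COUNT] [INTERVAL]
# Samples the process table COUNT times, INTERVAL seconds apart.
# Assumes procps ps and a sleep that accepts fractional seconds.
watch_manager() {
    count=${1:-600}
    interval=${2:-0.1}
    i=0
    while [ "$i" -lt "$count" ]; do
        ps -eo pid=,stat=,time=,comm= | filter_manager "$(date +%T)"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

With the defaults, `watch_manager` samples for roughly one minute, which should include one full cycle.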