Intermittent NFS client performance problem (backlog queue wait?)

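One of my NFS clients (Ubuntu 16.04 LTS) has a strange problem. I have spent the last few days trying to debug it, without success so far. After I mount the export, everything works fine for a few days, and transfers between client and server run at 1 Gbps. After a few days, throughput drops below 10 Mbps, even a simple directory listing takes several seconds, and I/O wait reaches 100%.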

What I noticed is that the backlog wait is extremely high, especially for writes:

root@srv:~# mountstats /mnt/data
Stats for 192.168.0.15:/mnt/data mounted on /mnt/data:
  NFS mount options: rw,vers=4.0,rsize=16384,wsize=16384,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none
  NFS server capabilities: caps=0xffdf,wtmult=512,dtsize=16384,bsize=0,namlen=255
  NFSv4 capability flags: bm0=0xfdffbfff,bm1=0xf9be3e,bm2=0x0,acl=0x3,pnfs=notconfigured
  NFS security flavor: 1  pseudoflavor: 0

NFS byte counts:
  applications read 8168679407142 bytes via read(2)
  applications wrote 4833000353435 bytes via write(2)
  applications read 0 bytes via O_DIRECT read(2)
  applications wrote 0 bytes via O_DIRECT write(2)
  client read 4218977852758 bytes via NFS READ
  client wrote 4832098253207 bytes via NFS WRITE

RPC statistics:
  561421762 RPC requests sent, 561421608 RPC replies received (1 XIDs not found)
  average backlog queue length: 0

READ:
        263822474 ops (46%)     0 retrans (0%)  0 major timeouts
        avg bytes sent per op: 184      avg bytes received per op: 16051
        backlog wait: 8.772689  RTT: 27.972131  total execute time: 36.752241 (milliseconds)
WRITE:
        295296111 ops (52%)     0 retrans (0%)  0 major timeouts
        avg bytes sent per op: 16567    avg bytes received per op: 132
        backlog wait: 62468603019.791718        RTT: 78.030143  total execute time: 62468603097.830574 (milliseconds)
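A minimal Python sketch for pulling the per-op timing lines out of the mountstats output above. The helper names parse_op_timings and backlog_dominated are hypothetical (not part of nfs-utils), and the 10x threshold is an arbitrary choice. In this output, total execute time is approximately backlog wait plus RTT for both READ and WRITE, so nearly all of the WRITE latency is time the request spends queued on the client before it is transmitted, not server round-trip time. An average backlog of about 62,468,603 seconds (roughly 723 days) per write is impossible on a mount that is only days old, so the cumulative counter itself may be corrupted or wrapped; it is probably best read as a symptom rather than a measurement.

```python
import re

# Matches the per-op timing line printed by the mountstats tool, e.g.
# "backlog wait: 8.772689  RTT: 27.972131  total execute time: 36.752241 (milliseconds)"
TIMING_RE = re.compile(
    r"backlog wait:\s*([\d.]+)\s+RTT:\s*([\d.]+)\s+total execute time:\s*([\d.]+)"
)
# Matches an op header line such as "READ:" or "WRITE:".
OP_RE = re.compile(r"^([A-Z_]+):\s*$")


def parse_op_timings(text):
    """Return {op_name: (backlog_ms, rtt_ms, execute_ms)} from mountstats output."""
    timings = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = OP_RE.match(stripped)
        if header:
            current = header.group(1)
            continue
        timing = TIMING_RE.search(stripped)
        if timing and current:
            timings[current] = tuple(float(x) for x in timing.groups())
    return timings


def backlog_dominated(timings, ratio=10.0):
    """Ops whose average backlog wait exceeds `ratio` times their average RTT."""
    return [
        op for op, (backlog, rtt, _) in timings.items()
        if rtt > 0 and backlog > ratio * rtt
    ]
```

Fed the READ and WRITE sections above, backlog_dominated flags only WRITE.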

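There are no errors and no warnings. I tried debugging with "echo 1 > /proc/sys/vm/block_dump" (which has worked very well for me in the past), but this time I saw nothing NFS-related. Any ideas on how to debug this further and find out what is causing the extremely high backlog wait?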

Answer 1

In case anyone runs into the same problem: the only workaround I could find was to force NFSv3 instead of NFSv4. The problem has gone away since.
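NFSv3 can be forced with the vers=3 (or nfsvers=3) mount option, either on the mount command line or in /etc/fstab. A small sketch, with hypothetical helper names, for checking that a remount actually took effect by reading the vers= option back from /proc/mounts:

```python
import sys


def nfs_version(options):
    """Return the value of vers= (or nfsvers=) from a comma-separated mount option string."""
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        if sep and key in ("vers", "nfsvers"):
            return value
    return None


def nfs_versions(mounts_text):
    """Map mount point -> NFS version for every nfs/nfs4 entry in /proc/mounts-format text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
            result[fields[1]] = nfs_version(fields[3])
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        versions = nfs_versions(f.read())
    mountpoint = sys.argv[1] if len(sys.argv) > 1 else "/mnt/data"
    print(f"{mountpoint}: NFS version {versions.get(mountpoint)}")
```

On the mount shown in the question this would report 4.0 before the change and 3 afterwards.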
