我已经在机器上安装了 nfs 驱动器。我可以使用以下命令查看它:
$ df -aTh | grep nfs
… unrelated volumes …
10.200.30.100:/export/pvc-70..1d nfs4 976M 2.0M 907M 1% /var/lib/kubelet/pods/72..1d/volumes/kubernetes.io~nfs/pvc-70..1d
[为了简洁起见,我添加了省略号。]
我像从机器本身一样重复写入文件(但从容器执行此操作会产生相同的结果)
$ while true
do echo test >> /var/lib/kubelet/pods/72..1d/volumes/kubernetes.io~nfs/pvc-70..1d/data.txt
sleep 5
done
这有效,但过了一会儿我收到错误消息:
-bash: /var/lib/kubelet/pods/72..1d/volumes/kubernetes.io~nfs/pvc-70..1d/data.txt: Invalid argument
这种情况会永远持续下去,或者说只要我还有耐心。 nfs 有什么理由应该这样做吗?为什么它突然不让我写作了?我尝试授予777
权限,但这没有任何影响。即使在大约 38 分钟后创建新文件,也会出现同样的问题。该文件无法再从机器上写入。然后,(有时)如果我读取了该文件,出于某种原因我可以再次写入一段时间。
发生错误时我收到以下日志:
[816270.396574] NFS: permission(0:231/8193), mask=0x81, res=0
[816270.403198] NFS: permission(0:231/8193), mask=0x3, res=0
[816270.406521] NFS: atomic_open(0:231/8193), data2.txt
[816270.412065] --> nfs_put_client({2})
[816270.415211] --> nfs4_setup_sequence clp ffff8fbd383c2800 session ffff8fbcf6fa1800 sr_slot 4294967295
[816270.419180] --> nfs41_setup_sequence
[816270.420853] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=3
[816270.424797] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
[816270.428181] <-- nfs41_setup_sequence slotid=0 seqid=71351
[816270.430539] <-- nfs4_setup_sequence status=0
[816270.433691] encode_sequence: sessionid=1942:1545315805:1942:0 seqid=71351 slotid=0 max_slotid=0 cache_this=1
[816270.439233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=3
[816270.442573] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
[816270.444845] nfs4_free_slot: slotid 1 highest_used_slotid 0
[816270.447379] nfs41_sequence_process: Error 0 free the slot
[816270.451224] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
有趣的是,当它工作时,它不使用atomic_open
而只是使用open
. (源代码这里)
我想我可以通过将授权设置为“无”而不是默认值来调试此问题。我怎么做?
更新:
我终于成功地可靠地重现了这个问题!我仅打开文件日志记录,并收到以下日志,该日志显示了我可以毫无问题地写入文件和操作导致错误之间的转换EINVAL
。
...
[3533332.934037] NFS: open file(/data13.txt)
[3533332.937896] NFS: flush(/data13.txt)
[3533332.941986] NFS: fsync file(/data13.txt) datasync 0
[3533332.947038] NFS: write(/data13.txt, 33@0)
[3533332.952519] NFS: flush(/data13.txt)
[3533332.962836] NFS: fsync file(/data13.txt) datasync 0
[3533332.965829] NFS: release(/data13.txt)
[3533336.975155] docker0: port 2(veth295ffd2) entered blocking state
[3533336.977344] docker0: port 2(veth295ffd2) entered disabled state
[3533336.984250] device veth295ffd2 entered promiscuous mode
[3533336.995705] IPv6: ADDRCONF(NETDEV_UP): veth295ffd2: link is not ready
[3533337.061611] nfsd: initializing export module (net: ffff8fbceb052580).
[3533337.263506] eth0: renamed from veth9c5fe34
[3533337.274301] IPv6: ADDRCONF(NETDEV_CHANGE): veth295ffd2: link becomes ready
[3533337.277259] docker0: port 2(veth295ffd2) entered blocking state
[3533337.279440] docker0: port 2(veth295ffd2) entered forwarding state
[3533337.972794] NFS: open file(/data13.txt)
[3533337.982390] NFS: flush(/data13.txt)
[3533337.984978] NFS: fsync file(/data13.txt) datasync 0
[3533337.991418] NFS: write(/data13.txt, 33@0)
[3533337.998084] NFS: flush(/data13.txt)
[3533338.017990] NFS: fsync file(/data13.txt) datasync 0
[3533338.020636] NFS: release(/data13.txt)
[3533341.385814] NFSD: laundromat service - starting
[3533341.387690] NFSD: laundromat_main - sleeping for 90 seconds
[3533342.869641] NFS: open file(/data.txt)
[3533342.894059] NFS: write(/data.txt, 1@0)
[3533342.896545] NFS: flush(/data.txt)
[3533342.906735] NFS: fsync file(/data.txt) datasync 0
[3533342.910853] NFS: release(/data.txt)
[3533342.914387] NFS: open file(/data.txt)
[3533342.917169] NFS: write(/data.txt, 29@0)
[3533342.920237] NFS: flush(/data.txt)
[3533342.927772] NFS: fsync file(/data.txt) datasync 0
[3533342.931496] NFS: release(/data.txt)
[3533343.002610] NFS: read(/data.txt, 30@30374)
[3533343.233929] docker0: port 2(veth295ffd2) entered disabled state
[3533343.239144] veth9c5fe34: renamed from eth0
[3533343.257428] nfsd_inet6addr_event: removed fe80:0000:0000:0000:f026:2cff:feb1:510b
[3533343.260065] docker0: port 2(veth295ffd2) entered disabled state
[3533343.270280] device veth295ffd2 left promiscuous mode
[3533343.272992] docker0: port 2(veth295ffd2) entered disabled state
[3533343.330925] nfsd: shutting down export module (net: ffff8fbceb052580).
[3533343.337814] nfsd: export shutdown complete (net: ffff8fbceb052580).
在上面的日志中,我可以毫无问题地写入文件,直到有关关闭所谓的export module
.