Glusterfs - Clients randomly disconnect until the volume is restarted


Ubuntu 18.04 Glusterfs-7.0

I created a volume for my file share and started it:

sudo gluster volume create NAME replica 3 transport tcp host0:/path0 host1:/path1 host2:/path2
sudo gluster volume start NAME

Then I added an fstab entry on my client:

host0:NAME /home/mountpoint glusterfs defaults,_netdev 0 0

and mounted it on the client:

sudo mount /home/mountpoint

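Then, at random, after 1 to 7 days, my clients get disconnected (maybe 2 out of 3). It mostly happens at night, but sometimes during the day as well. If I go into the directory, I get: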

Transport endpoint is not connected

To bring the mount back online, I have to run:

sudo umount /home/mountpoint && sudo mount /home/mountpoint
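The remount check above can be automated as a stopgap. This is a minimal sketch, assuming bash and GNU coreutils `stat`. The function names are hypothetical, and the script only automates the manual workaround; it does not fix the disconnects themselves. A dead FUSE mount makes `stat` fail with the same "Transport endpoint is not connected" error shown above, so the script checks for that message:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GlusterFS): detect a dead FUSE mount and
# remount it from its fstab entry, mirroring the manual umount/mount above.

# Returns 0 only if stat on the path fails with "Transport endpoint is not connected".
is_stale_mount() {
  local dir="$1" err
  # LC_ALL=C keeps the error text in English; "2>&1 >/dev/null" captures stderr only.
  if err=$(LC_ALL=C stat -- "$dir" 2>&1 >/dev/null); then
    return 1  # stat succeeded, so the mount (or plain directory) is reachable
  fi
  [[ "$err" == *"Transport endpoint is not connected"* ]]
}

# Remounts the path only when it is stale; assumes the path has an fstab entry.
remount_if_stale() {
  local dir="$1"
  if is_stale_mount "$dir"; then
    echo "stale mount at $dir, remounting" >&2
    sudo umount "$dir" && sudo mount "$dir"
  fi
}
```

For example, `remount_if_stale /home/mountpoint` could be run periodically (e.g. from cron). It will not help in the case where the brick itself stays offline, because the remount then fails as well.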

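Most of the time this works. Sometimes, though, it fails, and the log file gives no specific reason other than that the brick is offline. glusterd is running on all 3 servers and has not crashed: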

[2019-12-14 03:49:54.210690] W [socket.c:774:__socket_rwv] 0-launcher-client-2: readv on <IP>:<PORT> failed (No data available)
[2019-12-14 03:49:54.210718] I [MSGID: 114018] [client.c:2347:client_rpc_notify] 0-launcher-client-2: disconnected from launcher-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2019-12-14 03:49:54.210735] W [MSGID: 108001] [afr-common.c:5653:afr_notify] 0-launcher-replicate-0: Client-quorum is not met
[2019-12-14 03:49:57.271596] E [MSGID: 114058] [client-handshake.c:1456:client_query_portmap_cbk] 0-launcher-client-2: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2019-12-14 03:50:23.647924] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649274: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:50:23.648092] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649275: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:50:46.192371] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649321: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:50:46.192445] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649322: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:50:46.626681] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649323: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:50:46.626769] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649324: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:50:48.254712] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649328: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:50:48.254862] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649329: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:51:02.002344] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649357: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:51:02.002426] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649358: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:51:02.478503] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649362: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:51:02.478566] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649363: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:51:02.870624] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649364: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:51:02.870713] W [fuse-bridge.c:1276:fuse_attr_cbk] 0-glusterfs-fuse: 1649365: STAT() /www => -1 (Transport endpoint is not connected)
[2019-12-14 03:51:13.450634] W [fuse-bridge.c:2837:fuse_readv_cbk] 0-glusterfs-fuse: 1649389: READ => -1 gfid=270fafc1-615a-4686-a0f8-50e17965ba10 fd=0x7f64c002c468 (Transport endpoint is not connected)
[2019-12-14 03:51:13.450702] W [fuse-bridge.c:2837:fuse_readv_cbk] 0-glusterfs-fuse: 1649390: READ => -1 gfid=270fafc1-615a-4686-a0f8-50e17965ba10 fd=0x7f64c002c468 (Transport endpoint is not connected)
[2019-12-14 03:51:13.450717] W [fuse-bridge.c:2837:fuse_readv_cbk] 0-glusterfs-fuse: 1649391: READ => -1 gfid=270fafc1-615a-4686-a0f8-50e17965ba10 fd=0x7f64c002c468 (Transport endpoint is not connected)
[2019-12-14 03:51:13.450807] W [fuse-bridge.c:2837:fuse_readv_cbk] 0-glusterfs-fuse: 1649392: READ => -1 gfid=270fafc1-615a-4686-a0f8-50e17965ba10 fd=0x7f64c002c468 (Transport endpoint is not connected)
[2019-12-14 03:51:13.450906] W [fuse-bridge.c:2837:fuse_readv_cbk] 0-glusterfs-fuse: 1649393: READ => -1 gfid=270fafc1-615a-4686-a0f8-50e17965ba10 fd=0x7f64c002c468 (Transport endpoint is not connected)
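A long log like this is easier to read once the few root-cause lines are separated from the repeated FUSE warnings. The following is a small sketch that assumes the line layout shown above (`[date time] LEVEL [source] message`). The function names are hypothetical. The first function counts lines per severity letter. The second lists when each client subvolume reported a disconnect, which can help correlate the events with nightly jobs:

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read GlusterFS client log lines on stdin.
# Assumed layout: "[YYYY-MM-DD HH:MM:SS.ffffff] LEVEL [source] message"

# Prints "LEVEL count" for each severity letter (E, I, W, ...), sorted.
gluster_log_levels() {
  awk 'substr($0, 1, 1) == "[" { n[$3]++ }
       END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}

# Prints "date time subvolume" for every "disconnected from" event.
gluster_disconnects() {
  awk '/disconnected from/ {
    ts = substr($1, 2) " " substr($2, 1, length($2) - 1)  # strip the brackets
    name = ""
    for (i = 1; i < NF; i++) {
      if ($i == "from") { name = $(i + 1); break }
    }
    sub(/\.$/, "", name)  # drop the sentence-ending period
    print ts, name
  }'
}
```

On the excerpt above, `gluster_log_levels` would report `E 1`, `I 1` and `W 21`. `gluster_disconnects` would print `2019-12-14 03:49:54.210718 launcher-client-2`. Together, these point to the portmap error on client-2 as the event to investigate, rather than the FUSE warnings that follow it.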

In that case I have to restart the volume itself on the server:

sudo gluster volume stop NAME && sudo gluster volume start NAME

This is not the first server pool where I have run into this. I had the same problem on another server cluster, could not solve it, and had to abandon Gluster.

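As far as I can tell:
- the servers do not lose network connectivity when GlusterFS disconnects
- the servers have no HDD problems
- the servers are not running any particularly heavy workloads on GlusterFS; it is mainly a shared folder for nginx.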

How can I fix this? Thanks.
