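I'm running a GlusterFS cluster with a trusted storage pool of 4 peers:
- example-prod (100.100.250.197)
- example-storage1 (100.100.248.178)
- example-storage2 (100.100.250.25)
- example-storage3 (100.100.255.40)
It works fine (volumes can be mounted and files are stored correctly), except for one thing: no rebalancing takes place.
The output of peer status and the log in glus-glusterfs-glusterd.vol.log are also worrying. Something has gone wrong and I don't know how to fix it.
I'm afraid that one day the whole system will crash and I'll lose all my data, so I think I need to resolve these issues.
All servers have gluster 3.7.6 and run Ubuntu 16.04.
Output of volume status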
gluster> volume status
Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick example-storage1:/data/brick1/gv0 49152 0 Y 1413
Brick 100.100.250.25:/data/brick2/gv0 49152 0 Y 3081
Brick 100.100.255.40:/data/brick3/gv0 N/A N/A N N/A
NFS Server on localhost N/A N/A N N/A
NFS Server on example-storage2 N/A N/A N N/A
NFS Server on example-storage1.example.com
2049 0 Y 24490
NFS Server on example-storage3 N/A N/A N N/A
Task Status of Volume gv0
------------------------------------------------------------------------------
Task : Rebalance
ID : 1ee56040-6bb5-4407-8ae5-f176e6c89db1
Status : completed
Output of peer status
On example-prod
gluster> peer status
Number of Peers: 3
Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Other names:
example-storage2.example.com
Hostname: example-storage1.example.com
Uuid: 3f76dc73-77f4-4b9a-b1f1-3ba3a9aa26a7
State: Peer in Cluster (Connected)
Other names:
example-storage1.example.com
On example-storage1
Number of Peers: 5
Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Other names:
example-storage2
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Hostname: example-storage3
Uuid: 49d9bc0a-b67d-4850-bff9-edeaa0dac8ca
State: Peer Rejected (Connected)
Hostname: example-prod.example.com
Uuid: 4170ef42-770d-4f52-be99-6c6e317f9fa0
State: Peer in Cluster (Connected)
Other names:
example-prod
On example-storage2
Number of Peers: 3
Hostname: example-storage1.example.com
Uuid: 3f76dc73-77f4-4b9a-b1f1-3ba3a9aa26a7
State: Peer in Cluster (Connected)
Other names:
example-storage1
Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)
Hostname: example-prod.example.com
Uuid: 4170ef42-770d-4f52-be99-6c6e317f9fa0
State: Peer in Cluster (Connected)
Other names:
example-prod
On example-storage3
Note the "Disconnected" state
Number of Peers: 3
Hostname: example-prod.example.com
Uuid: 4170ef42-770d-4f52-be99-6c6e317f9fa0
State: Peer in Cluster (Disconnected)
Other names:
example-prod
Hostname: example-storage1.example.com
Uuid: 3f76dc73-77f4-4b9a-b1f1-3ba3a9aa26a7
State: Peer in Cluster (Disconnected)
Other names:
example-storage1
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Disconnected)
Other names:
example-storage2
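To make the inconsistencies in the listings above easier to spot, here is a minimal Python sketch that parses pasted peer status text and flags any peer that is not in the "Peer in Cluster (Connected)" state, as well as any hostname that appears with more than one UUID. The function names parse_peer_status and find_problems are hypothetical helpers written for this post, not part of gluster, and the sample text is an excerpt of the example-storage1 output.

```python
from collections import defaultdict


def parse_peer_status(text):
    """Turn pasted `peer status` text into a list of peer dicts."""
    peers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            # Each "Hostname:" line starts a new peer entry.
            current = {"hostname": line.split(":", 1)[1].strip()}
            peers.append(current)
        elif line.startswith("Uuid:") and current is not None:
            current["uuid"] = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and current is not None:
            current["state"] = line.split(":", 1)[1].strip()
    return peers


def find_problems(peers):
    """Report unhealthy states and hostnames listed under several UUIDs."""
    problems = []
    uuids_by_host = defaultdict(set)
    for peer in peers:
        uuids_by_host[peer["hostname"]].add(peer.get("uuid"))
        if peer.get("state") != "Peer in Cluster (Connected)":
            problems.append(f"{peer['hostname']}: {peer.get('state')}")
    for host, uuids in uuids_by_host.items():
        if len(uuids) > 1:
            problems.append(f"{host}: listed with {len(uuids)} different UUIDs")
    return problems


# Excerpt of the output seen on example-storage1.
SAMPLE = """\
Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)
Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Hostname: example-storage3
Uuid: 49d9bc0a-b67d-4850-bff9-edeaa0dac8ca
State: Peer Rejected (Connected)
"""

for problem in find_problems(parse_peer_status(SAMPLE)):
    print(problem)
```

This only summarizes what the listings already show (a rejected entry and two UUIDs for example-storage3 on example-storage1); it does not talk to glusterd or change anything.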
Output of glus-glusterfs-glusterd.vol.log
On example-prod
// every 5 seconds the following line
[2018-04-13 07:07:05.602742] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/f86f1461d3e00792ac2b2fefcedc2d08.socket failed (Invalid argument)
[2018-04-13 07:07:08.603156] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/f86f1461d3e00792ac2b2fefcedc2d08.socket failed (Invalid argument)
On example-storage1
// every 5 seconds the following line
[2018-04-13 07:00:38.987432] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
[2018-04-13 07:00:41.987968] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
On example-storage2
// every 5 seconds the following line
[2018-04-13 07:08:24.119264] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
[2018-04-13 07:08:27.119618] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
On example-storage3
// The following lines repeat
[2018-04-13 07:07:54.599955] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid argument
[2018-04-13 07:07:54.600003] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2018-04-13 07:08:02.697437] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <example-storage2> (<54566d17-f76b-45d0-82a2-ed8a474289c8>), in state <Peer in Cluster>, has disconnected from glusterd.
[2018-04-13 07:08:04.625465] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 14, Invalid argument
[2018-04-13 07:08:04.625513] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument