Update: upgraded to the latest version, 5.2, and refreshed the logs accordingly. The problem persists. Update 2: also updated the client to 5.2; the problem still persists.
I have a Gluster cluster consisting of 3 nodes:
- Server 1, 192.168.100.1
- Server 2, 192.168.100.2
- Server 3, 192.168.100.3
They are connected via the internal network 192.168.100.0/24. However, I want to connect a client from outside that network, using the public IP of one of the servers, and that does not work:
sudo mount -t glusterfs x.x.x.x:/datavol /mnt/gluster/
The log shows something like this:
[2018-12-15 17:57:29.666819] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2018-12-15 18:23:47.892343] I [fuse-bridge.c:4259:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2018-12-15 18:23:47.892375] I [fuse-bridge.c:4870:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-12-15 18:23:47.892475] I [MSGID: 108006] [afr-common.c:5650:afr_local_init] 0-datavol-replicate-0: no subvolumes up
[2018-12-15 18:23:47.892533] E [fuse-bridge.c:4328:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2018-12-15 18:23:47.892651] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2018-12-15 18:23:47.892668] W [fuse-bridge.c:3250:fuse_statfs_resume] 0-glusterfs-fuse: 2: STATFS (00000000-0000-0000-0000-000000000001) resolution fail
[2018-12-15 18:23:47.892773] W [fuse-bridge.c:889:fuse_attr_cbk] 0-glusterfs-fuse: 3: LOOKUP() / => -1 (Transport endpoint is not connected)
[2018-12-15 18:23:47.894204] W [fuse-bridge.c:889:fuse_attr_cbk] 0-glusterfs-fuse: 4: LOOKUP() / => -1 (Transport endpoint is not connected)
[2018-12-15 18:23:47.894367] W [fuse-bridge.c:889:fuse_attr_cbk] 0-glusterfs-fuse: 5: LOOKUP() / => -1 (Transport endpoint is not connected)
[2018-12-15 18:23:47.916333] I [fuse-bridge.c:5134:fuse_thread_proc] 0-fuse: initating unmount of /mnt/gluster
The message "I [MSGID: 108006] [afr-common.c:5650:afr_local_init] 0-datavol-replicate-0: no subvolumes up" repeated 4 times between [2018-12-15 18:23:47.892475] and [2018-12-15 18:23:47.894347]
[2018-12-15 18:23:47.916555] W [glusterfsd.c:1481:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7f90f2306494] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xfd) [0x5591a51e87ed] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x5591a51e8644] ) 0-: received signum (15), shutting down
[2018-12-15 18:23:47.916573] I [fuse-bridge.c:5897:fini] 0-fuse: Unmounting '/mnt/gluster'.
[2018-12-15 18:23:47.916582] I [fuse-bridge.c:5902:fini] 0-fuse: Closing fuse connection to '/mnt/gluster'.
What I see is
0-datavol-replicate-0: no subvolumes up
and
0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
The firewall ports (24007-24008, 49152-49156) are open on the public interface.
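For reference, a minimal sketch of what that looks like with ufw (assuming ufw is the firewall in use; the equivalent iptables or firewalld rules would do the same):

sudo ufw allow 24007:24008/tcp   # glusterd management ports
sudo ufw allow 49152:49156/tcp   # brick ports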
gluster volume heal datavol info:
Brick 192.168.100.1:/data/gluster/brick1
Status: Connected
Number of entries: 0
Brick 192.168.100.2:/data/gluster/brick1
Status: Connected
Number of entries: 0
Brick 192.168.100.3:/data/gluster/brick1
Status: Connected
Number of entries: 0
The volume configuration:
1: volume datavol-client-0
2: type protocol/client
3: option ping-timeout 42
4: option remote-host 192.168.100.1
5: option remote-subvolume /data/gluster/brick1
6: option transport-type socket
7: option transport.address-family inet
8: option send-gids true
9: end-volume
10:
11: volume datavol-client-1
12: type protocol/client
13: option ping-timeout 42
14: option remote-host 192.168.100.2
15: option remote-subvolume /data/gluster/brick1
16: option transport-type socket
17: option transport.address-family inet
18: option send-gids true
19: end-volume
20:
21: volume datavol-client-2
22: type protocol/client
23: option ping-timeout 42
24: option remote-host 192.168.100.3
25: option remote-subvolume /data/gluster/brick1
26: option transport-type socket
27: option transport.address-family inet
28: option send-gids true
29: end-volume
30:
31: volume datavol-replicate-0
32: type cluster/replicate
33: subvolumes datavol-client-0 datavol-client-1 datavol-client-2
34: end-volume
35:
36: volume datavol-dht
37: type cluster/distribute
38: option lock-migration off
39: subvolumes datavol-replicate-0
40: end-volume
41:
42: volume datavol-write-behind
43: type performance/write-behind
44: subvolumes datavol-dht
45: end-volume
46:
47: volume datavol-read-ahead
48: type performance/read-ahead
49: subvolumes datavol-write-behind
50: end-volume
51:
52: volume datavol-readdir-ahead
53: type performance/readdir-ahead
54: subvolumes datavol-read-ahead
55: end-volume
56:
57: volume datavol-io-cache
58: type performance/io-cache
59: subvolumes datavol-readdir-ahead
60: end-volume
61:
62: volume datavol-quick-read
63: type performance/quick-read
64: subvolumes datavol-io-cache
65: end-volume
66:
67: volume datavol-open-behind
68: type performance/open-behind
69: subvolumes datavol-quick-read
70: end-volume
71:
72: volume datavol-md-cache
73: type performance/md-cache
74: subvolumes datavol-open-behind
75: end-volume
76:
77: volume datavol
78: type debug/io-stats
79: option log-level INFO
80: option latency-measurement off
81: option count-fop-hits off
82: subvolumes datavol-md-cache
83: end-volume
84:
85: volume meta-autoload
86: type meta
87: subvolumes datavol
88: end-volume
gluster peer status:
root@server1 /data # gluster peer status
Number of Peers: 2
Hostname: 192.168.100.2
Uuid: 0cb2383e-906d-4ca6-97ed-291b04b4fd10
State: Peer in Cluster (Connected)
Hostname: 192.168.100.3
Uuid: d2d9e82f-2fb6-4f27-8fd0-08aaa8409fa9
State: Peer in Cluster (Connected)
gluster volume status:
Status of volume: datavol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 192.168.100.1:/data/gluster/brick1 49152 0 Y 13519
Brick 192.168.100.2:/data/gluster/brick1 49152 0 Y 30943
Brick 192.168.100.3:/data/gluster/brick1 49152 0 Y 24616
Self-heal Daemon on localhost N/A N/A Y 3282
Self-heal Daemon on 192.168.100.2 N/A N/A Y 18987
Self-heal Daemon on 192.168.100.3 N/A N/A Y 24638
Task Status of Volume datavol
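All of this looks healthy from inside the cluster. To rule out basic reachability from outside, one can check from the external client whether the management and brick ports actually answer on the public IP (x.x.x.x stands for the public address used in the mount command above):

nc -zv x.x.x.x 24007   # glusterd management port
nc -zv x.x.x.x 49152   # brick port reported by 'gluster volume status'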
What am I missing?
Answer 1
I had the same problem.
Have you seen https://bugzilla.redhat.com/show_bug.cgi?id=1659824 ?
Using "IPs" in GlusterFS does not seem to be a good idea, because the client relies on the remote-host addresses in the volume info it receives from the server. If it cannot reach enough Gluster nodes through those addresses, the volume info of the other nodes is of no use to it. See https://unix.stackexchange.com/questions/213705/glusterfs-how-to-failover-smartly-if-a-mounted-server-is-failed
So the problem is this: the mount contacts node 1 and reads the volume info (see /var/log/glusterfs/<volume>.log). The information about the other nodes is in the "option remote-host" entries. The client then tries to connect to those nodes via their private IPs - and fails (in my case). I assume your public client cannot reach the private IPs either - and that is what is behind "Transport endpoint is not connected".
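One can see exactly which addresses the client was told to use by searching the mount log for those entries (the log file name follows the mount point; assuming /mnt/gluster as above, it would be):

grep 'option remote-host' /var/log/glusterfs/mnt-gluster.log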
Solution A - using hostnames instead of IPs in the Gluster cluster - would work, because you could then alias all the servers in /etc/hosts (i.e. resolve the names to the 192-IPs inside the Gluster nodes and to the public IPs on the clients; see the sketch below). But it means the Gluster setup has to be rebuilt on DNS names. I have not tried switching an existing Gluster from IP-based to DNS-based (especially not in production).
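A sketch of that split-horizon /etc/hosts setup; the hostnames gluster1..gluster3 and the public addresses are made up for illustration:

# /etc/hosts on each Gluster node (internal network)
192.168.100.1  gluster1
192.168.100.2  gluster2
192.168.100.3  gluster3

# /etc/hosts on the external client (public IPs, placeholders here)
x.x.x.1  gluster1
x.x.x.2  gluster2
x.x.x.3  gluster3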
Solution B from the RH bugzilla is less clear to me. I do not understand what glusterfs -f $local-volfile $mountpoint should contain - in particular, what the real mount options are that ignore remote-host, and how they relate to the vol-file. There is a reply in the second post on SE; I suppose that is the answer, but I have not tested it yet.
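My reading of Solution B, untested and therefore only a sketch: copy the client volfile (e.g. the graph dumped in the mount log above), replace each "option remote-host" address with the corresponding public IP, and start the client from that local file instead of fetching it from a server. The file name here is hypothetical:

# datavol-public.vol: copied client volfile with remote-host
# entries edited to point at the public IPs
glusterfs -f /root/datavol-public.vol /mnt/gluster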
So - I think this is not a bug but a documentation gap: the information the volume was built with (the brick hostnames) is used internally by the client to connect to the nodes other than the one specified in the mount options.