gluster rebalance fails

2024-5-31

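A gluster rebalance fails shortly after it is started, and one of the bricks also goes down once the rebalance has run. The command output and logs are below: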

$ gluster --mode=script --wignore volume status

Status of volume: patchy
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick myhost:/d/backends/patchy1          49152     0          Y       64813
Brick myhost:/d/backends/patchy2          49153     0          Y       64834

Task Status of Volume patchy
------------------------------------------------------------------------------
There are no active volume tasks

$ gluster --mode=script --wignore volume set patchy cluster.weighted-rebalance off

$ gluster --mode=script --wignore volume rebalance patchy start force

volume rebalance: patchy: success: Rebalance on patchy has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 5c573761-314d-4294-99ba-c6a518675e26

$ gluster --mode=script --wignore volume rebalance patchy status

                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes             0             3             0               failed        0:00:00
volume rebalance: patchy: success

$ gluster --mode=script --wignore volume status

Status of volume: patchy
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick myhost:/d/backends/patchy1          49152     0          Y       64813
Brick myhost:/d/backends/patchy2          N/A       N/A        N       N/A

Task Status of Volume patchy
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 5c573761-314d-4294-99ba-c6a518675e26
Status               : failed
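
To check for a downed brick without reading the table by eye, the status output above could be scanned with a small script. This is only a sketch: the function name `offline_bricks` is made up for illustration, and it assumes each brick occupies a single line with the Online flag in the second-to-last column, as in the output shown here. Long brick names that wrap onto a second line, or paths containing spaces, would need extra handling.

```python
from typing import List


def offline_bricks(status_text: str) -> List[str]:
    """Return the bricks whose Online column is 'N'.

    Assumes the single-line layout shown in the output above:
    Brick <host:path>  <TCP Port>  <RDMA Port>  <Online>  <Pid>
    """
    down = []
    for raw in status_text.splitlines():
        fields = raw.split()  # also tolerates leading indentation
        # A brick row has at least: "Brick", name, tcp, rdma, online, pid
        if len(fields) >= 6 and fields[0] == "Brick":
            online = fields[-2]
            if online == "N":
                down.append(fields[1])
    return down


if __name__ == "__main__":
    import sys

    # Example: gluster --mode=script --wignore volume status | python3 this_script.py
    for brick in offline_bricks(sys.stdin.read()):
        print("offline:", brick)
```

Fed the second status output above, this should report myhost:/d/backends/patchy2 as the only offline brick.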

The volume rebalance log is as follows:

$ cat /var/log/glusterfs/patchy-rebalance.log

[2018-03-23 08:06:34.303638] I [MSGID: 100030] [glusterfsd.c:2625:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 4.0.1 (args: /usr/local/sbin/glusterfs -s localhost --volfile-id rebalance/patchy --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on --process-name rebalance --xlator-option *dht.rebalance-cmd=5 --xlator-option *dht.node-uuid=88559a30-a606-4af0-beb6-458cfafa8df6 --xlator-option *dht.commit-hash=3584404562 --socket-file /var/run/gluster/gluster-rebalance-2f34d12e-1e62-4737-8eec-b14b75ae3500.sock --pid-file /var/lib/glusterd/vols/patchy/rebalance/88559a30-a606-4af0-beb6-458cfafa8df6.pid -l /var/log/glusterfs/patchy-rebalance.log)
[2018-03-23 08:06:34.316882] I [MSGID: 101190] [event-epoll.c:609:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-03-23 08:06:39.304868] I [MSGID: 109104] [dht-shared.c:710:dht_init] 0-patchy-dht: dht_init using commit hash 3584404562
[2018-03-23 08:06:39.305817] I [MSGID: 101190] [event-epoll.c:609:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-03-23 08:06:39.307050] I [MSGID: 114020] [client.c:2300:notify] 0-patchy-client-0: parent translators are ready, attempting connect on transport
[2018-03-23 08:06:39.307634] I [MSGID: 114020] [client.c:2300:notify] 0-patchy-client-1: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume patchy-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host myhost
  5:     option remote-subvolume /d/backends/patchy1
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username 288e00ca-26be-4a99-9e33-ea1b174ef347
  9:     option password 626b1311-2be8-4c6c-97f3-76a33c4a48e5
 10:     option transport.tcp-user-timeout 0
 11:     option transport.socket.keepalive-time 20
 12:     option transport.socket.keepalive-interval 2
 13:     option transport.socket.keepalive-count 9
 14: end-volume
 15:
 16: volume patchy-client-1
 17:     type protocol/client
 18:     option ping-timeout 42
 19:     option remote-host myhost
 20:     option remote-subvolume /d/backends/patchy2
 21:     option transport-type socket
 22:     option transport.address-family inet
 23:     option username 288e00ca-26be-4a99-9e33-ea1b174ef347
 24:     option password 626b1311-2be8-4c6c-97f3-76a33c4a48e5
 25:     option transport.tcp-user-timeout 0
 26:     option transport.socket.keepalive-time 20
 27:     option transport.socket.keepalive-interval 2
 28:     option transport.socket.keepalive-count 9
 29: end-volume
 30:
 31: volume patchy-dht
 32:     type cluster/distribute
 33:     option use-readdirp yes
 34:     option lookup-unhashed yes
 35:     option assert-no-child-down yes
 36:     option readdir-optimize on
 37:     option rebalance-cmd 5
 38:     option node-uuid 88559a30-a606-4af0-beb6-458cfafa8df6
 39:     option commit-hash 3584404562
 40:     option lock-migration off
 41:     option force-migration off
 42:     option weighted-rebalance off
 43:     subvolumes patchy-client-0 patchy-client-1
 44: end-volume
 45:
 46: volume patchy
 47:     type debug/io-stats
 48:     option log-level INFO
 49:     option latency-measurement off
 50:     option count-fop-hits off
 51:     subvolumes patchy-dht
 52: end-volume
 53:
+------------------------------------------------------------------------------+
[2018-03-23 08:06:39.308435] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-patchy-client-1: error returned while attempting to connect to host:(null), port:0
[2018-03-23 08:06:39.308538] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-patchy-client-0: error returned while attempting to connect to host:(null), port:0
[2018-03-23 08:06:39.308607] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-patchy-client-1: error returned while attempting to connect to host:(null), port:0
[2018-03-23 08:06:39.308838] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-patchy-client-0: error returned while attempting to connect to host:(null), port:0
[2018-03-23 08:06:39.308867] I [rpc-clnt.c:2071:rpc_clnt_reconfig] 0-patchy-client-1: changing port to 49153 (from 0)
[2018-03-23 08:06:39.309138] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-patchy-client-1: error returned while attempting to connect to host:(null), port:0
[2018-03-23 08:06:39.309222] I [rpc-clnt.c:2071:rpc_clnt_reconfig] 0-patchy-client-0: changing port to 49152 (from 0)
[2018-03-23 08:06:39.309264] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-patchy-client-1: error returned while attempting to connect to host:(null), port:0
[2018-03-23 08:06:39.309531] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-patchy-client-0: error returned while attempting to connect to host:(null), port:0
[2018-03-23 08:06:39.309747] W [rpc-clnt.c:1739:rpc_clnt_submit] 0-patchy-client-0: error returned while attempting to connect to host:(null), port:0
[2018-03-23 08:06:39.310115] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-patchy-client-1: Connected to patchy-client-1, attached to remote volume '/d/backends/patchy2'.
[2018-03-23 08:06:39.310264] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-patchy-client-0: Connected to patchy-client-0, attached to remote volume '/d/backends/patchy1'.
[2018-03-23 08:06:39.315151] I [MSGID: 109005] [dht-selfheal.c:2328:dht_selfheal_directory] 0-patchy-dht: Directory selfheal failed: Unable to form layout for directory /
[2018-03-23 08:06:39.315302] I [dht-rebalance.c:4513:gf_defrag_start_crawl] 0-patchy-dht: gf_defrag_start_crawl using commit hash 3584404562
[2018-03-23 08:06:39.315689] I [MSGID: 109081] [dht-common.c:5602:dht_setxattr] 0-patchy-dht: fixing the layout of /
[2018-03-23 08:06:39.317123] E [MSGID: 109039] [dht-common.c:4057:dht_find_local_subvol_cbk] 0-patchy-dht: getxattr err for dir [No data available]
[2018-03-23 08:06:39.317201] E [MSGID: 109039] [dht-common.c:4057:dht_find_local_subvol_cbk] 0-patchy-dht: getxattr err for dir [No data available]
[2018-03-23 08:06:39.317434] I [MSGID: 0] [dht-rebalance.c:4585:gf_defrag_start_crawl] 0-patchy-dht: local subvols are patchy-client-1
[2018-03-23 08:06:39.317455] I [MSGID: 0] [dht-rebalance.c:4591:gf_defrag_start_crawl] 0-patchy-dht: node uuids are 88559a30-a606-4af0-beb6-458cfafa8df6
[2018-03-23 08:06:39.317462] I [MSGID: 0] [dht-rebalance.c:4585:gf_defrag_start_crawl] 0-patchy-dht: local subvols are patchy-client-0
[2018-03-23 08:06:39.317469] I [MSGID: 0] [dht-rebalance.c:4591:gf_defrag_start_crawl] 0-patchy-dht: node uuids are 88559a30-a606-4af0-beb6-458cfafa8df6
[2018-03-23 08:06:39.317601] I [MSGID: 0] [dht-rebalance.c:4271:gf_defrag_total_file_size] 0-patchy-dht: local subvol: patchy-client-1,cnt = 6119424
[2018-03-23 08:06:39.317730] I [MSGID: 0] [dht-rebalance.c:4271:gf_defrag_total_file_size] 0-patchy-dht: local subvol: patchy-client-0,cnt = 3149824
[2018-03-23 08:06:39.317739] I [MSGID: 0] [dht-rebalance.c:4275:gf_defrag_total_file_size] 0-patchy-dht: Total size files = 9269248
[2018-03-23 08:06:39.317866] I [MSGID: 0] [dht-rebalance.c:4300:gf_defrag_total_file_cnt] 0-patchy-dht: local subvol: patchy-client-1,cnt = 1570
[2018-03-23 08:06:39.318020] I [MSGID: 0] [dht-rebalance.c:4300:gf_defrag_total_file_cnt] 0-patchy-dht: local subvol: patchy-client-0,cnt = 897
[2018-03-23 08:06:39.318029] I [MSGID: 0] [dht-rebalance.c:4311:gf_defrag_total_file_cnt] 0-patchy-dht: Total number of files = 1233
[2018-03-23 08:06:39.318148] I [dht-rebalance.c:4667:gf_defrag_start_crawl] 0-DHT: Thread[0] creation successful
[2018-03-23 08:06:39.318323] I [dht-rebalance.c:4667:gf_defrag_start_crawl] 0-DHT: Thread[1] creation successful
[2018-03-23 08:06:39.318360] I [dht-rebalance.c:4667:gf_defrag_start_crawl] 0-DHT: Thread[2] creation successful
[2018-03-23 08:06:39.318436] I [dht-rebalance.c:4667:gf_defrag_start_crawl] 0-DHT: Thread[3] creation successful
[2018-03-23 08:06:39.377769] I [MSGID: 109081] [dht-common.c:5602:dht_setxattr] 0-patchy-dht: fixing the layout of /dir
[2018-03-23 08:06:39.378756] I [dht-rebalance.c:3274:gf_defrag_process_dir] 0-patchy-dht: migrate data called on /dir
[2018-03-23 08:06:39.689322] W [socket.c:592:__socket_rwv] 0-patchy-client-1: readv on 0.0.0.0:49153 failed (No data available)
[2018-03-23 08:06:39.689360] I [MSGID: 114018] [client.c:2227:client_rpc_notify] 0-patchy-client-1: disconnected from patchy-client-1. Client process will keep trying to connect to glusterd until brick's port is available
[2018-03-23 08:06:39.689381] W [MSGID: 109073] [dht-common.c:10557:dht_notify] 0-patchy-dht: Received CHILD_DOWN. Exiting
[2018-03-23 08:06:39.689391] I [MSGID: 109029] [dht-rebalance.c:5327:gf_defrag_stop] 0-: Received stop command on rebalance
[2018-03-23 08:06:39.689573] E [rpc-clnt.c:350:saved_frames_unwind] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x15a)[0x3ff7e1294a2] (--> /usr/local/lib/libgfrpc.so.0(+0xdb1e)[0x3ff7e08db1e] (--> /usr/local/lib/libgfrpc.so.0(+0xdc8c)[0x3ff7e08dc8c] (--> /usr/local/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x98)[0x3ff7e08f408] (--> /usr/local/lib/libgfrpc.so.0(+0xfffc)[0x3ff7e08fffc] ))))) 0-patchy-client-1: forced unwinding frame type(GlusterFS 4.x v1) op(READDIRP(40)) called at 2018-03-23 08:06:39.379300 (xid=0x22)
[2018-03-23 08:06:39.689588] W [MSGID: 114031] [client-rpc-fops_v2.c:2264:client4_0_readdirp_cbk] 0-patchy-client-1: remote operation failed [Transport endpoint is not connected]
[2018-03-23 08:06:39.689635] W [MSGID: 109021] [dht-rebalance.c:3106:gf_defrag_get_entry] 0-patchy-dht: Readdirp failed. Aborting data migration for directory: /dir [Transport endpoint is not connected]
[2018-03-23 08:06:39.689655] W [dht-rebalance.c:3448:gf_defrag_process_dir] 0-patchy-dht: Found error from gf_defrag_get_entry
[2018-03-23 08:06:39.689714] E [MSGID: 109111] [dht-rebalance.c:3962:gf_defrag_fix_layout] 0-patchy-dht: gf_defrag_process_dir failed for directory: /dir
[2018-03-23 08:06:39.690982] W [MSGID: 114061] [client-common.c:3375:client_pre_readdirp_v2] 0-patchy-client-1:  (00000000-0000-0000-0000-000000000001) remote_fd is -1. EBADFD [File descriptor in bad state]
[2018-03-23 08:06:39.691056] I [MSGID: 109081] [dht-common.c:5602:dht_setxattr] 0-patchy-dht: fixing the layout of /
[2018-03-23 08:06:39.691419] E [MSGID: 114031] [client-rpc-fops_v2.c:1306:client4_0_inodelk_cbk] 0-patchy-client-1: remote operation failed [Transport endpoint is not connected]
[2018-03-23 08:06:39.691436] E [MSGID: 109119] [dht-lock.c:1051:dht_blocking_inodelk_cbk] 0-patchy-dht: inodelk failed on subvol patchy-client-1, gfid:00000000-0000-0000-0000-000000000001 [Transport endpoint is not connected]
[2018-03-23 08:06:39.691623] E [MSGID: 109016] [dht-rebalance.c:3934:gf_defrag_fix_layout] 0-patchy-dht: Setxattr failed for / [Transport endpoint is not connected]
[2018-03-23 08:06:39.691636] I [dht-rebalance.c:3274:gf_defrag_process_dir] 0-patchy-dht: migrate data called on /
[2018-03-23 08:06:39.691653] E [MSGID: 114031] [client-rpc-fops_v2.c:2451:client4_0_opendir_cbk] 0-patchy-client-1: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected]
[2018-03-23 08:06:39.691812] W [dht-rebalance.c:3448:gf_defrag_process_dir] 0-patchy-dht: Found error from gf_defrag_get_entry
[2018-03-23 08:06:39.691844] E [MSGID: 109111] [dht-rebalance.c:3962:gf_defrag_fix_layout] 0-patchy-dht: gf_defrag_process_dir failed for directory: /
[2018-03-23 08:06:39.691862] I [dht-rebalance.c:4716:gf_defrag_start_crawl] 0-DHT: crawling file-system completed
[2018-03-23 08:06:39.692135] I [MSGID: 109028] [dht-rebalance.c:5141:gf_defrag_status_get] 0-patchy-dht: Rebalance is failed. Time taken is 0.00 secs
[2018-03-23 08:06:39.692144] I [MSGID: 109028] [dht-rebalance.c:5145:gf_defrag_status_get] 0-patchy-dht: Files migrated: 0, size: 0, lookups: 0, failures: 3, skipped: 0
[2018-03-23 08:06:39.692230] W [glusterfsd.c:1424:cleanup_and_exit] (-->/lib/s390x-linux-gnu/libpthread.so.0(+0x7934) [0x3ff7de87934] -->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0x110) [0x12e00b6b0] -->/usr/local/sbin/glusterfs(cleanup_and_exit+0x74) [0x12e00b494] ) 0-: received signum (15), shutting down
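
Because most of the log is informational, it can help to pull out only the warning and error entries when narrowing down where the rebalance went wrong. The following is a minimal sketch, assuming the line layout seen in the log above (timestamp, one-letter level, optional MSGID, source location, message). The names `LOG_LINE`, `Entry`, `parse_entries`, and `warnings_and_errors` are hypothetical helpers, not part of GlusterFS.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional

# Layout assumed from the log above:
# [timestamp] LEVEL [MSGID: n] [file:line:function] message
# The MSGID part is optional; some entries omit it.
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<where>[^\]]+)\] "
    r"(?P<text>.*)$"
)


class Entry(NamedTuple):
    ts: str
    level: str
    msgid: Optional[str]
    where: str
    text: str


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per line that matches the assumed layout.

    Lines that do not match (for example the 'Final graph' dump)
    are skipped silently.
    """
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match:
            yield Entry(**match.groupdict())


def warnings_and_errors(lines: Iterable[str]) -> List[Entry]:
    """Keep only entries logged at level W or E."""
    return [e for e in parse_entries(lines) if e.level in ("W", "E")]


if __name__ == "__main__":
    # Path taken from the cat command earlier in this post.
    with open("/var/log/glusterfs/patchy-rebalance.log") as handle:
        for entry in warnings_and_errors(handle):
            print(entry.ts, entry.level, entry.where, entry.text)
```

Run against the log above, the filtered list would include entries such as the readv failure on patchy-client-1 and the CHILD_DOWN notice from patchy-dht, which makes the order of events around the brick disconnect easier to follow.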
