GlusterFS heal doesn't seem to work?

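I recently replaced one of the HDDs that provides a brick in my GlusterFS cluster. I was able to map that HDD back to the brick and then get GlusterFS to replicate to it successfully.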

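However, one part of the process doesn't seem to work for me. When I try to run the heal command on the volume whose brick I replaced, I keep hitting this: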

$ gluster volume heal nova
Locking failed on c551316f-7218-44cf-bb36-befe3d3df34b. Please check log file for details.
Locking failed on ae62c691-ae55-4c99-8364-697cb3562668. Please check log file for details.
Locking failed on cb78ba3c-256f-4413-ae7e-aa5c0e9872b5. Please check log file for details.
Locking failed on 79a6a414-3569-482c-929f-b7c5da16d05e. Please check log file for details.
Locking failed on 5f43c6a4-0ccd-424a-ae56-0492ec64feeb. Please check log file for details.
Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
Locking failed on 6c0111fc-b5e7-4350-8be5-3179a1a5187e. Please check log file for details.
Locking failed on 88fcb687-47aa-4921-b3ab-d6c3b330b32a. Please check log file for details.
Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.

The log basically echoes the above, specifically:

$ tail etc-glusterfs-glusterd.vol.log
[2015-08-03 23:08:03.289249] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.289258] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
[2015-08-03 23:08:03.289279] W [rpc-clnt-ping.c:199:rpc_clnt_ping_cbk] 0-management: socket or ib related error
[2015-08-03 23:08:03.289827] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.289858] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
[2015-08-03 23:08:03.290509] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.290529] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
[2015-08-03 23:08:03.290597] E [glusterd-syncop.c:1804:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-08-03 23:07:03.351603] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2015-08-03 23:07:03.351644] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
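Each "Locking failed on" line names a peer UUID, while every "Could not find peer" line refers to the same ID, d827a48e-627f-0000-0a00-000000000000, which is not one of those peers. One way to check whether an ID like that belongs to any real peer is to compare it with the UUIDs glusterd actually knows about. The bash functions below are a minimal sketch of that check; the function names and the known-peers.txt scratch file are hypothetical, and the sketch assumes a grep that supports -E, -o, -x and -F.

```shell
# Hypothetical helper: read glusterd log text on stdin and print the unique
# UUIDs mentioned in "Locking failed on <uuid>" and
# "Could not find peer with ID <uuid>" lines.
failed_peer_ids() {
  grep -oE '(Locking failed on|Could not find peer with ID) [0-9a-f-]{36}' \
    | grep -oE '[0-9a-f-]{36}$' \
    | sort -u
}

# Hypothetical helper: print the UUIDs from stdin that do NOT appear in the
# file given as $1 (one known peer UUID per line).
unknown_peer_ids() {
  # grep exits 1 when every ID is known; treat that as success.
  grep -vxFf "$1" || true
}

# Example usage (known-peers.txt is a hypothetical scratch file):
#   gluster pool list | awk 'NR > 1 { print $1 }' > known-peers.txt
#   failed_peer_ids < etc-glusterfs-glusterd.vol.log | unknown_peer_ids known-peers.txt
```

Any ID printed by the last command is one that glusterd itself does not recognise as a member of the pool.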

These other logs were also written to while I was attempting the above:

$ ls -ltr
-rw-------   1 root root      41704 Aug  2 12:07 glfsheal-nova.log
-rw-------   1 root root      15986 Aug  2 12:07 cmd_history.log-20150802
-rw-------   1 root root     290359 Aug  3 19:07 var-lib-nova-instances.log
-rw-------   1 root root     221829 Aug  3 19:07 glustershd.log
-rw-------   1 root root     195472 Aug  3 19:07 nfs.log
-rw-------   1 root root   61831116 Aug  3 19:07 var-lib-nova-mnt-92ef2ec54fd18595ed18d8e6027a1b3d.log
-rw-------   1 root root       3504 Aug  3 19:08 cmd_history.log
-rw-------   1 root root      89294 Aug  3 19:08 cli.log
-rw-------   1 root root     136421 Aug  3 19:08 etc-glusterfs-glusterd.vol.log

Looking through them, it isn't clear whether any of them relate to this particular issue.
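One way to narrow this down is to pull only the error (E) and warning (W) lines from all of these logs for the minute or so around the failed heal. Note that the glusterd entries are stamped 23:07 to 23:08 while ls shows the files last modified at 19:07 to 19:08, so the log timestamps appear to be in UTC rather than local time; the window has to be given in the logs' own clock. The function below is a minimal sketch; its name is hypothetical, and it assumes every relevant line starts with the "[YYYY-MM-DD HH:MM:SS.micro] LEVEL" prefix seen above.

```shell
# Hypothetical helper: print E/W lines whose timestamp falls between START and
# END (inclusive), each prefixed with the name of the file it came from.
# Usage: gluster_log_window START END FILE...
# START/END use the log's own format, e.g. "2015-08-03 23:07:00".
gluster_log_window() {
  local start="$1" end="$2"
  shift 2
  awk -v s="$start" -v e="$end" '
    # Match lines such as: [2015-08-03 23:08:03.289249] E [...] ...
    /^\[[0-9-]+ [0-9:.]+] [EW] / {
      ts = substr($0, 2, 19)   # keep "YYYY-MM-DD HH:MM:SS"
      # This timestamp format sorts lexicographically in time order,
      # so plain string comparison is enough.
      if (ts >= s && ts <= e) print FILENAME ": " $0
    }' "$@"
}

# Example usage, assuming the default log directory /var/log/glusterfs:
#   gluster_log_window "2015-08-03 23:07:00" "2015-08-03 23:09:00" /var/log/glusterfs/*.log
```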

Answer

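With the above setup, I initially thought I could only run the heal command from the master node of the GlusterFS cluster, but it turned out my real problem was that the 11 nodes in the cluster were running 2 different versions of GlusterFS.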

Once I realized this, I updated all the nodes to the latest version of GlusterFS (3.7.3), and after that I was able to run the heal from any node, as you would expect.
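A quick way to spot this kind of mismatch is to ask every node for its version and compare the answers. The sketch below is not part of the original answer: both function names are hypothetical, it assumes passwordless SSH to each node, and it assumes the first line of gluster --version has the form "glusterfs X.Y.Z built on ...".

```shell
# Hypothetical helper: read "HOST VERSION" pairs on stdin, print each distinct
# version followed by the hosts running it, and return 1 if there is more
# than one version.
check_gluster_versions() {
  local input distinct
  input=$(cat)
  # Group hosts by version, e.g. "3.7.3: node01 node02".
  printf '%s\n' "$input" \
    | awk 'NF >= 2 { hosts[$2] = hosts[$2] " " $1 }
           END { for (v in hosts) print v ":" hosts[v] }' \
    | sort
  distinct=$(printf '%s\n' "$input" | awk 'NF >= 2 { print $2 }' | sort -u | wc -l)
  [ "$distinct" -le 1 ]
}

# Hypothetical collector: print "HOST VERSION" for every host given as an
# argument, taking the version from the first line of `gluster --version`.
collect_gluster_versions() {
  local host
  for host in "$@"; do
    printf '%s %s\n' "$host" \
      "$(ssh -n "$host" gluster --version | awk 'NR == 1 { print $2 }')"
  done
}

# Example (node01 node02 node03 are hypothetical host names):
#   collect_gluster_versions node01 node02 node03 | check_gluster_versions
```

Keeping the collection step separate from the comparison makes the check easy to reuse with version lists gathered some other way, for example from a configuration-management inventory.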
