修复 MySQL group_replication applier 模块启动失败问题

修复 MySQL group_replication applier 模块启动失败问题

总结:这个问题与主从复制无关。它与多主组复制的新功能有关(从 v5.7.17 开始可用)。我们有可行的解决方法,但由于需要两次导入大量数据,因此需要数小时,因此问题在于找到正确的方法来解决这个问题。

我们已经使用(仅可用 5.7.17 及更高版本)设置了生产 MySQLmulti-master集群。group replication

但是,当向该集群添加新节点时,我收到错误ERROR 3094 (HY000): The START GROUP_REPLICATION command failed as the applier module failed to start.

尝试使用解决方案[1][2]在设置传统主从复制时使用类似错误不工作。我发现唯一可行的解​​决方案是运行reset + source两次命令。这需要几个小时才能完成(数据库非常大)。请参阅下面的详细信息。

添加新节点的过程:

On existing node:
# mysqldump --all-databases --triggers --routines --events -u root -p > /home/user/dump.sql
Copy dump to new node
Setup new node completely clean with group replication config (see config example below)
mysql> SET SQL_LOG_BIN=0;
mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'xxxxxxxx' REQUIRE SSL;
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
mysql> FLUSH PRIVILEGES;
mysql> CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='xxxxxxxx' FOR CHANNEL 'group_replication_recovery';
mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so';
mysql> SET SQL_LOG_BIN=1; 
mysql> RESET MASTER; #This is required because the server on first start will create new GTID which does not exist on other nodes
mysql> SOURCE /home/user/dump.sql;
mysql> START GROUP_REPLICATION;

在错误日志中我可以看到:

2017-07-19T04:00:24.452539Z 7 [ERROR] Failed to open the relay log './hostname1-relay-bin-group_replication_applier.000001' (relay_log_pos 4).
2017-07-19T04:00:24.452549Z 7 [ERROR] Could not find target log file mentioned in relay log info in the index file './hostname1-relay-bin-group_replication_applier.index' during relay log initialization.
2017-07-19T04:00:24.454935Z 7 [ERROR] Plugin group_replication reported: 'Failed to setup the group replication applier thread.'
2017-07-19T04:00:24.454957Z 7 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
2017-07-19T04:00:24.454986Z 4 [ERROR] Plugin group_replication reported: 'Unable to initialize the Group Replication applier module.'
2017-07-19T04:00:24.455347Z 4 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
2017-07-19T04:00:24.455361Z 4 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
2017-07-19T04:00:24.455372Z 0 [Note] Plugin group_replication reported: 'Destroying SSL'
2017-07-19T04:00:24.455380Z 0 [Note] Plugin group_replication reported: 'Success destroying SSL'

示例配置:

[mysqld]
socket = /var/lib/mysql/mysql.sock
pid-file = /var/run/mysqld/mysqld.pid

log-error = /var/log/mysqld/error.log
symbolic-links = 0
skip_name_resolve

############# GROUP REPLICATION ###############
# General replication settings
gtid_mode = ON
enforce_gtid_consistency = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE
binlog_checksum = NONE
log_slave_updates = ON
log_bin = mysql-bin
relay_log = relay-bin
binlog_format = ROW
transaction_write_set_extraction = XXHASH64
loose-group_replication_bootstrap_group = OFF
loose-group_replication_start_on_boot = ON
loose-group_replication_ssl_mode = REQUIRED
loose-group_replication_recovery_use_ssl = 1
loose-group_replication_single_primary_mode = OFF
loose-group_replication_enforce_update_everywhere_checks = ON
group_replication_auto_increment_increment = 3
auto_increment_increment = 3

# Shared replication group configuration
loose-group_replication_group_name = "84c75469-8959-4d4a-beb5-7753684a5161"
loose-group_replication_ip_whitelist = "192.168.25.0/24,127.0.0.0/8"
loose-group_replication_group_seeds = "192.168.25.2:13306,192.168.25.3:13306,192.168.25.4:13306"

# Host specific replication configuration
server_id = 10
bind-address = *
report_host = "192.168.25.5"
loose-group_replication_local_address = "192.168.25.5:13306"
##########  END GROUP REPLICATION ##########

答案1

由于loose-group_replication_start_on_boot在启动时启用。在这种情况下,您可能需要先停止复制,然后启动复制。在您的设置中尝试以下步骤,并让我知道它是否有帮助。此外,请在此过程中发布完整的错误:

从另一个 MASTER 节点获取数据库备份并在当前节点上恢复。然后运行以下命令:

mysql> SET SQL_LOG_BIN=0;
mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'xxxxxxxx' REQUIRE SSL;
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
mysql> FLUSH PRIVILEGES;
mysql> CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='xxxxxxxx' FOR        CHANNEL 'group_replication_recovery';
mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so';
mysql> SET SQL_LOG_BIN=1; 
mysql> STOP GROUP_REPLICATION;
mysql> START GROUP_REPLICATION;

此后,新节点应与其他主节点同步。让我知道您的进展如何。

相关内容