Galera 新集群 WSREP 未知错误 141

Galera 新集群 WSREP 未知错误 141

我正在尝试使用 Ansible(DevOps 工具)在云中创建一个新的 Galera 集群。我可以通过从 SSH shell 发出命令来启动集群,但不能从 Ansible 发出命令。我已将配置简化为两个节点,但无法让第二个节点加入集群。原因似乎是未知错误 141。我找不到有关此错误的任何文档。

失败的原因似乎是:

    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Warning] WSREP: 0.0 (dev-database1): State transfer to 1.0 (dev-database2) failed: -141 (Unknown error 141)

有人能解释一下导致这个错误的原因吗?

配置是Ubuntu 14.04,MariaDB 10.0.20,每台机器上使用相同的SSL证书进行rsync复制。

这是尝试加入集群失败的日志。在此期间,其他节点上没有日志记录。

    Jul 28 23:43:56 dev-database2 ansible-command: Invoked with creates=None executable=None chdir=None args=service mysql start removes=None NO_LOG=None shell=True warn=True
    Jul 28 23:43:56 dev-database2 mysqld_safe: Starting mysqld daemon with databases from /crypt/mysql
    Jul 28 23:43:56 dev-database2 mysqld_safe: WSREP: Running position recovery with --log_error='/crypt/mysql/wsrep_recovery.YBBBx4' --pid-file='/crypt/mysql/dev-database2-recover.pid'
    Jul 28 23:43:56 dev-database2 mysqld: 150729  9:43:56 [Note] /usr/sbin/mysqld (mysqld 10.0.20-MariaDB-1~trusty-wsrep-log) starting as process 16430 ...
    Jul 28 23:43:59 dev-database2 mysqld_safe: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] /usr/sbin/mysqld (mysqld 10.0.20-MariaDB-1~trusty-wsrep-log) starting as process 16470 ...
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Setting wsrep_ready to 0
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Read nil XID from storage engines, skipping position init
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <[email protected]> loaded successfully.
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: CRC-32C: using hardware acceleration.
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Passing config to GCS: base_host = 119.9.25.17; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /crypt/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /crypt/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo =
    Jul 28 23:43:59 dev-database2 mysqld:  false; pc.recovery = 1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Service thread queue flushed.
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: wsrep_sst_grab()
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Start replication
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: protonet asio version 0
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Using CRC-32C for message checksums.
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: initializing ssl context
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: backend: asio
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: restore pc from disk failed
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: GMCast version 0
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: (84b9a759, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: (84b9a759, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: EVS version 0
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: gcomm: connecting to group 'Dev DB Cluster', peer '119.9.23.122:,119.9.25.17:'
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.25.17:55693 local endpoint ssl://119.9.25.17:4567 cipher: AES128-SHA compression:
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.25.17:4567 local endpoint ssl://119.9.25.17:55693 cipher: AES128-SHA compression:
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Warning] WSREP: (84b9a759, 'ssl://0.0.0.0:4567') address 'ssl://119.9.25.17:4567' points to own listening address, blacklisting
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.23.122:4567 local endpoint ssl://119.9.25.17:38347 cipher: AES128-SHA compression:
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: (84b9a759, 'ssl://0.0.0.0:4567') address 'ssl://119.9.25.17:4567' pointing to uuid 84b9a759 is blacklisted, skipping
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: (84b9a759, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: declaring 7ba9e675 at ssl://119.9.23.122:4567 stable
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: Node 7ba9e675 state prim
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: view(view_id(PRIM,7ba9e675,2) memb {
    Jul 28 23:43:59 dev-database2 mysqld: #0117ba9e675,0
    Jul 28 23:43:59 dev-database2 mysqld: #01184b9a759,0
    Jul 28 23:43:59 dev-database2 mysqld: } joined {
    Jul 28 23:43:59 dev-database2 mysqld: } left {
    Jul 28 23:43:59 dev-database2 mysqld: } partitioned {
    Jul 28 23:43:59 dev-database2 mysqld: })
    Jul 28 23:43:59 dev-database2 mysqld: 150729  9:43:59 [Note] WSREP: save pc into disk
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: gcomm: connected
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Opened channel 'Dev DB Cluster'
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Waiting for SST to complete.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: STATE EXCHANGE: sent state msg: 850918ce-3582-11e5-bb92-1e01df56834d
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: STATE EXCHANGE: got state msg: 850918ce-3582-11e5-bb92-1e01df56834d from 0 (dev-database1)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: STATE EXCHANGE: got state msg: 850918ce-3582-11e5-bb92-1e01df56834d from 1 (dev-database2)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Quorum results:
    Jul 28 23:44:00 dev-database2 mysqld: #011version    = 3,
    Jul 28 23:44:00 dev-database2 mysqld: #011component  = PRIMARY,
    Jul 28 23:44:00 dev-database2 mysqld: #011conf_id    = 1,
    Jul 28 23:44:00 dev-database2 mysqld: #011members    = 1/2 (joined/total),
    Jul 28 23:44:00 dev-database2 mysqld: #011act_id     = 0,
    Jul 28 23:44:00 dev-database2 mysqld: #011last_appl. = -1,
    Jul 28 23:44:00 dev-database2 mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),
    Jul 28 23:44:00 dev-database2 mysqld: #011group UUID = ed66210d-3581-11e5-b320-9e349b557133
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Flow-control interval: [23, 23]
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: State transfer required:
    Jul 28 23:44:00 dev-database2 mysqld: #011Group state: ed66210d-3581-11e5-b320-9e349b557133:0
    Jul 28 23:44:00 dev-database2 mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: New cluster view: global state: ed66210d-3581-11e5-b320-9e349b557133:0, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Warning] WSREP: Gap in state sequence. Need state transfer.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Setting wsrep_ready to 0
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '119.9.25.17' --auth '' --datadir '/crypt/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '16470' --binlog '/var/log/mysql/mariadb-bin' '
    Jul 28 23:44:00 dev-database2 rsyncd[16521]: rsyncd version 3.1.0 starting, listening on port 4444
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Prepared SST request: rsync|119.9.25.17:4444/rsync_sst
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: REPL Protocols: 7 (3, 2)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Service thread queue flushed.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Service thread queue flushed.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ed66210d-3581-11e5-b320-9e349b557133): 1 (Operation not permitted)
    Jul 28 23:44:00 dev-database2 mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Member 1.0 (dev-database2) requested state transfer from '*any*'. Selected 0.0 (dev-database1)(SYNCED) as donor.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: Requesting state transfer: success, donor: 0
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Warning] WSREP: 0.0 (dev-database1): State transfer to 1.0 (dev-database2) failed: -141 (Unknown error 141)
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [ERROR] WSREP: gcs/src/gcs_group.cpp:int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():731: Will never receive state. Need to abort.
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: gcomm: terminating thread
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: gcomm: joining thread
    Jul 28 23:44:00 dev-database2 mysqld: 150729  9:44:00 [Note] WSREP: gcomm: closing backend
    Jul 28 23:44:01 dev-database2 mysqld: 150729  9:44:01 [Note] WSREP: view(view_id(NON_PRIM,7ba9e675,2) memb {
    Jul 28 23:44:01 dev-database2 mysqld: #01184b9a759,0
    Jul 28 23:44:01 dev-database2 mysqld: } joined {
    Jul 28 23:44:01 dev-database2 mysqld: } left {
    Jul 28 23:44:01 dev-database2 mysqld: } partitioned {
    Jul 28 23:44:01 dev-database2 mysqld: #0117ba9e675,0
    Jul 28 23:44:01 dev-database2 mysqld: })
    Jul 28 23:44:01 dev-database2 mysqld: 150729  9:44:01 [Note] WSREP: view((empty))
    Jul 28 23:44:01 dev-database2 mysqld: 150729  9:44:01 [Note] WSREP: gcomm: closed
    Jul 28 23:44:01 dev-database2 mysqld: 150729  9:44:01 [Note] WSREP: /usr/sbin/mysqld: Terminated.
    Jul 28 23:44:02 dev-database2 mysqld: WSREP_SST: [ERROR] Parent mysqld process (PID:16470) terminated unexpectedly. (20150729 09:44:02.440)
    Jul 28 23:44:02 dev-database2 mysqld: WSREP_SST: [INFO] Joiner cleanup. (20150729 09:44:02.442)
    Jul 28 23:44:02 dev-database2 rsyncd[16521]: sent 0 bytes  received 0 bytes  total size 0
    Jul 28 23:44:02 dev-database2 mysqld: WSREP_SST: [INFO] Joiner cleanup done. (20150729 09:44:02.948)
    Jul 28 23:44:02 dev-database2 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended
    Jul 28 23:44:28 dev-database2 /etc/init.d/mysql[16869]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
    Jul 28 23:44:28 dev-database2 /etc/init.d/mysql[16869]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed
    Jul 28 23:44:28 dev-database2 /etc/init.d/mysql[16869]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111 "Connection refused")'
    Jul 28 23:44:28 dev-database2 /etc/init.d/mysql[16869]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
    Jul 28 23:44:28 dev-database2 /etc/init.d/mysql[16869]:

这是成功加入后的日志。完成上述操作后,我所做的就是停止 mysql 服务器,然后在另一个节点上运行 sudo service mysql start --wsrep-new-cluster,并在此节点上运行 sudo service mysql start。

    Jul 29 00:08:02 dev-database2 mysqld_safe: Starting mysqld daemon with databases from /crypt/mysql
    Jul 29 00:08:02 dev-database2 mysqld_safe: WSREP: Running position recovery with --log_error='/crypt/mysql/wsrep_recovery.b5BLaP' --pid-file='/crypt/mysql/dev-database2-recover.pid'
    Jul 29 00:08:02 dev-database2 mysqld: 150729 10:08:02 [Note] /usr/sbin/mysqld (mysqld 10.0.20-MariaDB-1~trusty-wsrep-log) starting as process 19162 ...
    Jul 29 00:08:05 dev-database2 mysqld_safe: WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] /usr/sbin/mysqld (mysqld 10.0.20-MariaDB-1~trusty-wsrep-log) starting as process 19203 ...
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Setting wsrep_ready to 0
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Read nil XID from storage engines, skipping position init
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <[email protected]> loaded successfully.
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: CRC-32C: using hardware acceleration.
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Passing config to GCS: base_host = 119.9.25.17; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /crypt/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /crypt/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo =
    Jul 29 00:08:05 dev-database2 mysqld:  false; pc.recovery = 1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Service thread queue flushed.
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: wsrep_sst_grab()
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Start replication
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: protonet asio version 0
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Using CRC-32C for message checksums.
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: initializing ssl context
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: backend: asio
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: restore pc from disk failed
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: GMCast version 0
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: EVS version 0
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: gcomm: connecting to group 'Dev DB Cluster', peer '119.9.23.122:,119.9.25.17:'
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.25.17:55697 local endpoint ssl://119.9.25.17:4567 cipher: AES128-SHA compression:
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.25.17:4567 local endpoint ssl://119.9.25.17:55697 cipher: AES128-SHA compression:
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Warning] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') address 'ssl://119.9.25.17:4567' points to own listening address, blacklisting
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: SSL handshake successful, remote endpoint ssl://119.9.23.122:4567 local endpoint ssl://119.9.25.17:38351 cipher: AES128-SHA compression:
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') address 'ssl://119.9.25.17:4567' pointing to uuid e28b5f7d is blacklisted, skipping
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: declaring cab39946 at ssl://119.9.23.122:4567 stable
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: Node cab39946 state prim
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: view(view_id(PRIM,cab39946,2) memb {
    Jul 29 00:08:05 dev-database2 mysqld: #011cab39946,0
    Jul 29 00:08:05 dev-database2 mysqld: #011e28b5f7d,0
    Jul 29 00:08:05 dev-database2 mysqld: } joined {
    Jul 29 00:08:05 dev-database2 mysqld: } left {
    Jul 29 00:08:05 dev-database2 mysqld: } partitioned {
    Jul 29 00:08:05 dev-database2 mysqld: })
    Jul 29 00:08:05 dev-database2 mysqld: 150729 10:08:05 [Note] WSREP: save pc into disk
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: gcomm: connected
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Opened channel 'Dev DB Cluster'
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Waiting for SST to complete.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: STATE EXCHANGE: sent state msg: e2de5bf6-3585-11e5-bb24-8ac7dfc1b048
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: STATE EXCHANGE: got state msg: e2de5bf6-3585-11e5-bb24-8ac7dfc1b048 from 0 (dev-database1)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: STATE EXCHANGE: got state msg: e2de5bf6-3585-11e5-bb24-8ac7dfc1b048 from 1 (dev-database2)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Quorum results:
    Jul 29 00:08:06 dev-database2 mysqld: #011version    = 3,
    Jul 29 00:08:06 dev-database2 mysqld: #011component  = PRIMARY,
    Jul 29 00:08:06 dev-database2 mysqld: #011conf_id    = 1,
    Jul 29 00:08:06 dev-database2 mysqld: #011members    = 1/2 (joined/total),
    Jul 29 00:08:06 dev-database2 mysqld: #011act_id     = 0,
    Jul 29 00:08:06 dev-database2 mysqld: #011last_appl. = -1,
    Jul 29 00:08:06 dev-database2 mysqld: #011protocols  = 0/7/3 (gcs/repl/appl),
    Jul 29 00:08:06 dev-database2 mysqld: #011group UUID = ed66210d-3581-11e5-b320-9e349b557133
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Flow-control interval: [23, 23]
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: State transfer required:
    Jul 29 00:08:06 dev-database2 mysqld: #011Group state: ed66210d-3581-11e5-b320-9e349b557133:0
    Jul 29 00:08:06 dev-database2 mysqld: #011Local state: 00000000-0000-0000-0000-000000000000:-1
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: New cluster view: global state: ed66210d-3581-11e5-b320-9e349b557133:0, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 3
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Warning] WSREP: Gap in state sequence. Need state transfer.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Setting wsrep_ready to 0
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '119.9.25.17' --auth '' --datadir '/crypt/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '19203' --binlog '/var/log/mysql/mariadb-bin' '
    Jul 29 00:08:06 dev-database2 rsyncd[19253]: rsyncd version 3.1.0 starting, listening on port 4444
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Prepared SST request: rsync|119.9.25.17:4444/rsync_sst
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: REPL Protocols: 7 (3, 2)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Service thread queue flushed.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Service thread queue flushed.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ed66210d-3581-11e5-b320-9e349b557133): 1 (Operation not permitted)
    Jul 29 00:08:06 dev-database2 mysqld: #011 at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Member 1.0 (dev-database2) requested state transfer from '*any*'. Selected 0.0 (dev-database1)(SYNCED) as donor.
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
    Jul 29 00:08:06 dev-database2 mysqld: 150729 10:08:06 [Note] WSREP: Requesting state transfer: success, donor: 0
    Jul 29 00:08:06 dev-database2 rsyncd[19268]: name lookup failed for 119.9.23.122: Name or service not known
    Jul 29 00:08:06 dev-database2 rsyncd[19268]: connect from UNKNOWN (119.9.23.122)
    Jul 29 00:08:06 dev-database2 rsyncd[19268]: rsync to rsync_sst/ from UNKNOWN (119.9.23.122)
    Jul 29 00:08:06 dev-database2 rsyncd[19268]: receiving file list
    Jul 29 00:08:07 dev-database2 rsyncd[19268]: sent 63 bytes  received 12,596,622 bytes  total size 12,593,152
    Jul 29 00:08:08 dev-database2 rsyncd[19283]: name lookup failed for 119.9.23.122: Name or service not known
    Jul 29 00:08:08 dev-database2 rsyncd[19283]: connect from UNKNOWN (119.9.23.122)
    Jul 29 00:08:08 dev-database2 rsyncd[19283]: rsync to rsync_sst-log_dir/ from UNKNOWN (119.9.23.122)
    Jul 29 00:08:08 dev-database2 rsyncd[19283]: receiving file list
    Jul 29 00:08:08 dev-database2 mysqld: 150729 10:08:08 [Note] WSREP: (e28b5f7d, 'ssl://0.0.0.0:4567') turning message relay requesting off

答案1

我们也遇到过这个问题,我花了一段时间才弄清楚发生了什么,因为我设法让 Galera 工作起来,但当我尝试从头开始重复这一切时,它再次失败了。

当生成的进程(例如 SST 备份运行)在启动时尝试写入应该处理其日志消息的输出描述符时失败,似乎会发生不太具描述性的未知错误 -141。

对我来说,线索是,在第二个节点上出现 -141 错误的情况下,我还注意到我没有从第一个节点上的 Galera 收到任何日志消息,即使在第二个节点上我可以清楚地看到 Galera 已经经历了完整的协商并成功加入集群,这通常会在第一个节点上记录至少 40 行信息。

我认为您可能会发现,使用自动化工具时会发生这种情况,而使用 shell 时不会发生这种情况的原因是,mysql 配置为将日志写入 syslog,但当您使用自动化工具启动或重新启动 mysql 时,记录器子进程会在启动时死亡,而不是成功启动。然后写入此描述符会失败。Mysql 本身似乎可以优雅地处理这个问题,忽略它,而 SST 脚本则不会,因此崩溃。

那么,为什么 logger 子进程在自动化工具下无法启动,而在 shell 上却可以启动呢?这似乎是由于 PTY 分配时间太短造成的 - 我认为具体问题是,如果在运行命令后立即关闭 PTY,而该命令尚未分叉或设置,这会破坏它。

对于使用 ruby​​ 脚本 (capistrano v2) 的我来说,我只需将 :pty => false 添加到我的运行命令中,这样 PTY 就不会被分配。但我不知道是否有办法使用 Ansible 来做到这一点?

(您也可以通过在 nohup 之类的环境下运行服务重启脚本来解决这个问题,尽管这会产生不良的副作用,例如生成 nohup.out 文件。我的理解是 HUP 信号本身不是问题,因为当我使用 :pty => false 选项时,我不需要做任何事情来放弃该进程。)

答案2

根据https://jira.mariadb.org/browse/MDEV-13517您可以尝试通过禁用 MariaDB 中的 syslog 并启用错误日志来解决该问题,配置如下:

[mysqld_safe]
skip_syslog
[mysqld]
log_error=/your/path/error.log

相关内容