How do I add 2 missing nodes to a Galera cluster that has one node accepting reads and writes?

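I'm running a 3-node Galera cluster with 3 TB of data. Last night, after a brief power outage and a UPS failure, I completely lost 2 nodes, including everything in /var/lib/mysql. I am currently running all applications on a single node, and I have spent 12 hours trying to add the other 2 nodes back to the cluster. I run Percona XtraBackup every night, so I do have backups.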

Current situation:

  • Node1: up and running (I tried restarting the mysql service)
  • Node2: down (no data in /var/lib/mysql)
  • Node3: down (no data in /var/lib/mysql)

How can I fix this?

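What I have tried:

  1. I restored the latest backup of Node1 to both Node2 and Node3.

  2. I checked the grastate file and made sure its UUID and sequence number match the recovered position printed by mysqld --wsrep-recover.

  3. I then ran service mysql start, which deleted the entire backup I had restored in step 1.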
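
The check in step 2 can be sketched as a small shell helper. This is only an illustration: the function names grastate_position and positions_match are hypothetical, and the sketch assumes the usual grastate.dat layout with "uuid:" and "seqno:" lines. The recovered position is the uuid:seqno string logged after "WSREP: Recovered position".

```shell
# Print "uuid:seqno" as recorded in a grastate.dat file.
# Assumes lines of the form "uuid:    <value>" and "seqno:   <value>".
grastate_position() {
    awk '$1 == "uuid:"  { u = $2 }
         $1 == "seqno:" { s = $2 }
         END            { print u ":" s }' "$1"
}

# Succeed only if grastate.dat agrees with the recovered position string.
# $1 = path to grastate.dat, $2 = "uuid:seqno" taken from the recovery log.
positions_match() {
    [ "$(grastate_position "$1")" = "$2" ]
}

# Example call (path and position taken from the log in this post):
#   positions_match /mysql/data/grastate.dat \
#       3fd1b891-5449-11e6-86af-caef4b06028d:29739656612 && echo "positions agree"
```

A match only shows that the restored data directory is internally consistent. It says nothing about whether the running cluster still shares that history.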

[PROD root@galera0 data]# cat /mysql/data//galera0.err
230219 14:18:47 mysqld_safe Starting mysqld daemon with databases from /mysql/data
230219 14:18:47 mysqld_safe WSREP: Running position recovery with --log_error='/mysql/data/wsrep_recovery.gLrvbu' --pid-file='/mysql/data/galera0-recover.pid'
2023-02-19 14:18:47 139663120775296 [Warning] option 'wsrep_max_ws_size': unsigned value 2147483648 adjusted to 2147483647
2023-02-19 14:18:47 139663120775296 [Note] /usr/sbin/mysqld (mysqld 10.1.25-MariaDB) starting as process 96422 ...
230219 14:19:01 mysqld_safe WSREP: Recovered position 3fd1b891-5449-11e6-86af-caef4b06028d:29739656612
2023-02-19 14:19:01 140052404881536 [Warning] option 'wsrep_max_ws_size': unsigned value 2147483648 adjusted to 2147483647
2023-02-19 14:19:01 140052404881536 [Note] /usr/sbin/mysqld (mysqld 10.1.25-MariaDB) starting as process 96565 ...
2023-02-19 14:19:01 140052404881536 [Note] WSREP: Read nil XID from storage engines, skipping position init
2023-02-19 14:19:01 140052404881536 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2023-02-19 14:19:01 140052404881536 [Note] WSREP: wsrep_load(): Galera 25.3.20(r3703) by Codership Oy <[email protected]> loaded successfully.
2023-02-19 14:19:01 140052404881536 [Note] WSREP: CRC-32C: using hardware acceleration.
2023-02-19 14:19:01 140052404881536 [Note] WSREP: Found saved state: 3fd1b891-5449-11e6-86af-caef4b06028d:29739656612, safe_to_bootsrap: 1
2023-02-19 14:19:01 140052404881536 [Note] WSREP: Passing config to GCS: base_dir = /mysql/data/; base_host = 10.20.147.213; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mysql/data//galera.cache; gcache.page_size = 32G; gcache.recover = no; gcache.size = 32G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 0.99; gcs.fc_limit = 256; gcs.fc_master_slave = YES; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_q
2023-02-19 14:19:02 140052404881536 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(3fd1b891-5449-11e6-86af-caef4b06028d:29739656612)
2023-02-19 14:19:02 140052404881536 [Note] WSREP: Assign initial position for certification: 29739656612, protocol version: -1
2023-02-19 14:19:02 140052404881536 [Note] WSREP: wsrep_sst_grab()
2023-02-19 14:19:02 140052404881536 [Note] WSREP: Start replication
2023-02-19 14:19:02 140052404881536 [Note] WSREP: Setting initial position to 3fd1b891-5449-11e6-86af-caef4b06028d:29739656612
2023-02-19 14:19:02 140052404881536 [Note] WSREP: protonet asio version 0
2023-02-19 14:19:02 140052404881536 [Note] WSREP: Using CRC-32C for message checksums.
2023-02-19 14:19:02 140052404881536 [Note] WSREP: backend: asio
2023-02-19 14:19:02 140052404881536 [Note] WSREP: gcomm thread scheduling priority set to other:0
2023-02-19 14:19:02 140052404881536 [Warning] WSREP: access file(/mysql/data//gvwstate.dat) failed(No such file or directory)
2023-02-19 14:19:02 140052404881536 [Note] WSREP: restore pc from disk failed
2023-02-19 14:19:02 140052404881536 [Note] WSREP: GMCast version 0
2023-02-19 14:19:02 140052404881536 [Note] WSREP: (5bb34d0c, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2023-02-19 14:19:02 140052404881536 [Note] WSREP: (5bb34d0c, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2023-02-19 14:19:02 140052404881536 [Note] WSREP: EVS version 0
2023-02-19 14:19:02 140052404881536 [Note] WSREP: gcomm: connecting to group 'APP-galera-PROD-PROD-', peer '10.20.135.29:,10.20.138.36:,10.20.147.213:'
2023-02-19 14:19:02 140052404881536 [Note] WSREP: (5bb34d0c, 'tcp://0.0.0.0:4567') connection established to 5bb34d0c tcp://10.20.147.213:4567
2023-02-19 14:19:02 140052404881536 [Warning] WSREP: (5bb34d0c, 'tcp://0.0.0.0:4567') address 'tcp://10.20.147.213:4567' points to own listening address, blacklisting
2023-02-19 14:19:02 140052404881536 [Note] WSREP: (5bb34d0c, 'tcp://0.0.0.0:4567') connection established to d5bcab2b tcp://10.20.135.29:4567
2023-02-19 14:19:02 140052404881536 [Note] WSREP: (5bb34d0c, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2023-02-19 14:19:02 140052404881536 [Note] WSREP: declaring d5bcab2b at tcp://10.20.135.29:4567 stable
2023-02-19 14:19:02 140052404881536 [Note] WSREP: Node d5bcab2b state prim
2023-02-19 14:19:02 140052404881536 [Note] WSREP: view(view_id(PRIM,5bb34d0c,2) memb {
        5bb34d0c,0
        d5bcab2b,0
} joined {
} left {
} partitioned {
})
2023-02-19 14:19:02 140052404881536 [Note] WSREP: save pc into disk
2023-02-19 14:19:02 140052404881536 [Note] WSREP: discarding pending addr without UUID: tcp://10.20.138.36:4567
2023-02-19 14:19:02 140052404881536 [Note] WSREP: gcomm: connected
2023-02-19 14:19:02 140052404881536 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2023-02-19 14:19:02 140052404881536 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2023-02-19 14:19:02 140052404881536 [Note] WSREP: Opened channel 'APP-galera-PROD-PROD-'
2023-02-19 14:19:02 140017821284096 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2023-02-19 14:19:02 140052404881536 [Note] WSREP: Waiting for SST to complete.
2023-02-19 14:19:02 140017821284096 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 5c005823-b060-11ed-a9c8-1e8851bad471
2023-02-19 14:19:02 140017821284096 [Note] WSREP: STATE EXCHANGE: sent state msg: 5c005823-b060-11ed-a9c8-1e8851bad471
2023-02-19 14:19:02 140017821284096 [Note] WSREP: STATE EXCHANGE: got state msg: 5c005823-b060-11ed-a9c8-1e8851bad471 from 0 (galera0)
2023-02-19 14:19:02 140017821284096 [Note] WSREP: STATE EXCHANGE: got state msg: 5c005823-b060-11ed-a9c8-1e8851bad471 from 1 (galera1)
2023-02-19 14:19:02 140017821284096 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 1,
        members    = 1/2 (joined/total),
        act_id     = 2687225,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = da0e45a6-b00e-11ed-815f-c3684f57969d
2023-02-19 14:19:02 140017821284096 [Note] WSREP: Flow-control interval: [253, 256]
2023-02-19 14:19:02 140017821284096 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2687225)
2023-02-19 14:19:02 140052404562688 [Note] WSREP: State transfer required:
        Group state: da0e45a6-b00e-11ed-815f-c3684f57969d:2687225
        Local state: 3fd1b891-5449-11e6-86af-caef4b06028d:29739656612
2023-02-19 14:19:02 140052404562688 [Note] WSREP: New cluster view: global state: da0e45a6-b00e-11ed-815f-c3684f57969d:2687225, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 3
2023-02-19 14:19:02 140052404562688 [Warning] WSREP: Gap in state sequence. Need state transfer.
2023-02-19 14:19:02 140017791923968 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.20.147.213' --datadir '/mysql/data/'   --parent '96565' --binlog '/mysql/logs/galera0-bin' '
WSREP_SST: [INFO] Streaming with xbstream (20230219 14:19:02.666)
WSREP_SST: [INFO] Using socat as streamer (20230219 14:19:02.667)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20230219 14:19:04.465)
2023-02-19 14:19:04 140052404562688 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.20.147.213:4444/xtrabackup_sst//1
2023-02-19 14:19:04 140052404562688 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2023-02-19 14:19:04 140052404562688 [Note] WSREP: REPL Protocols: 7 (3, 2)
2023-02-19 14:19:04 140052404562688 [Warning] WSREP: moving position backwards: 29739656612 -> 2687225
2023-02-19 14:19:04 140052404562688 [Note] WSREP: Assign initial position for certification: 2687225, protocol version: 3
2023-02-19 14:19:04 140017887454976 [Note] WSREP: Service thread queue flushed.
2023-02-19 14:19:04 140052404562688 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (3fd1b891-5449-11e6-86af-caef4b06028d) does not match group state UUID (da0e45a6-b00e-11ed-815f-c3684f57969d): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2023-02-19 14:19:04 140017821284096 [Note] WSREP: Member 0.0 (galera0) requested state transfer from '*any*'. Selected 1.0 (galera1)(SYNCED) as donor.
2023-02-19 14:19:04 140017821284096 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2688889)
2023-02-19 14:19:04 140052404562688 [Note] WSREP: Requesting state transfer: success, donor: 1
2023-02-19 14:19:04 140052404562688 [Note] WSREP: GCache history reset: old(3fd1b891-5449-11e6-86af-caef4b06028d:0) -> new(da0e45a6-b00e-11ed-815f-c3684f57969d:2687225)
2023-02-19 14:19:05 140017838061312 [Note] WSREP: (5bb34d0c, 'tcp://0.0.0.0:4567') connection to peer 5bb34d0c with addr tcp://10.20.147.213:4567 timed out, no messages seen in PT3S
WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /mysql/data//.sst from previous state transfer. Removing (20230219 14:19:05.230)
2023-02-19 14:19:05 140017838061312 [Note] WSREP: (5bb34d0c, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Proceeding with SST (20230219 14:19:07.474)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20230219 14:19:07.474)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20230219 14:19:07.475)
removed ‘/mysql/innodb/ib_logfile1’
removed ‘/mysql/innodb/ib_logfile0’
removed ‘/mysql/innodb/ib_logfile2’
removed ‘/mysql/innodb/ib_logfile3’
removed ‘/mysql/data/db_001a/task.ibd’
removed ‘/mysql/data/db_001a/delivery.frm’
.
.
.
.
removed directory: ‘/mysql/data/db_001a’
removed ‘/mysql/data/ibtmp1’
removed ‘/mysql/data/xtrabackup_checkpoints’
WSREP_SST: [INFO] Cleaning the binlog directory /mysql/logs as well (20230219 14:19:42.126)
removed ‘/mysql/logs/galera0-bin.000003’
removed ‘/mysql/logs/galera0-bin.000001’
removed ‘/mysql/logs/galera0-bin.000002’
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20230219 14:19:42.130)

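Then it crashed (I believe it may have been killed by a systemd timeout, because there is nothing about it in the MySQL error log).

I see two problems in the log:

I believe these are why an SST is triggered instead of an IST. I want an IST, because I cannot afford to leave the DB server in Donor/Desynced mode for the 6 hours it takes to finish syncing.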

2023-02-19 14:19:02 140052404881536 [Warning] WSREP: access file(/mysql/data//gvwstate.dat) failed(No such file or directory)

and this one:

2023-02-19 14:19:04 140052404562688 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (3fd1b891-5449-11e6-86af-caef4b06028d) does not match group state UUID (da0e45a6-b00e-11ed-815f-c3684f57969d): 1 (Operation not permitted)
             at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
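
The second warning can be read as a two-part eligibility test. The sketch below is a simplified model, not Galera's actual logic: the function name ist_possible is hypothetical, and it assumes that IST needs (a) the joiner's state UUID to equal the group's state UUID and (b) the donor's gcache to still hold the first write-set the joiner is missing, approximated here by comparing against wsrep_local_cached_downto.

```shell
# Decide whether a joiner could, in principle, use IST instead of SST.
# $1 = joiner state UUID      $2 = joiner seqno
# $3 = group state UUID       $4 = donor's wsrep_local_cached_downto
ist_possible() {
    joiner_uuid=$1; joiner_seqno=$2; group_uuid=$3; donor_cached_downto=$4

    # Different UUIDs mean different histories, so seqnos are not comparable.
    if [ "$joiner_uuid" != "$group_uuid" ]; then
        echo "SST: state UUID mismatch"
        return 1
    fi

    # The donor must still cache the next write-set the joiner needs.
    if [ $((joiner_seqno + 1)) -ge "$donor_cached_downto" ]; then
        echo "IST possible"
    else
        echo "SST: gcache no longer covers seqno $((joiner_seqno + 1))"
        return 1
    fi
}

# With the values from this post:
#   ist_possible 3fd1b891-5449-11e6-86af-caef4b06028d 29739656612 \
#                da0e45a6-b00e-11ed-815f-c3684f57969d 2861719
# prints "SST: state UUID mismatch"
```

With the figures shown here, the first test already fails: the backup carries state UUID 3fd1b891-..., while Node1 now reports da0e45a6-..., so the gcache comparison is never reached.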

Restoring the backup to a single node takes about 8-10 hours, so I can't test many things in a short time...

MySQL on Node1:

MariaDB [(none)]> show status like 'wsrep%';
+------------------------------+-------------------------------------------+
| Variable_name                | Value                                     |
+------------------------------+-------------------------------------------+
| wsrep_apply_oooe             | 0.160012                                  |
| wsrep_apply_oool             | 0.000029                                  |
| wsrep_apply_window           | 1.268022                                  |
| wsrep_causal_reads           | 0                                         |
| wsrep_cert_deps_distance     | 40.682058                                 |
| wsrep_cert_index_size        | 189                                       |
| wsrep_cert_interval          | 0.324202                                  |
| wsrep_cluster_conf_id        | 1                                         |
| wsrep_cluster_size           | 1                                         |
| wsrep_cluster_state_uuid     | da0e45a6-b00e-11ed-815f-c3684f57969d      |
| wsrep_cluster_status         | Primary                                   |
| wsrep_commit_oooe            | 0.000000                                  |
| wsrep_commit_oool            | 0.000000                                  |
| wsrep_commit_window          | 1.107723                                  |
| wsrep_connected              | ON                                        |
| wsrep_desync_count           | 0                                         |
| wsrep_evs_delayed            |                                           |
| wsrep_evs_evict_list         |                                           |
| wsrep_evs_repl_latency       | 0/5.88166e-06/6.5346e-05/3.91738e-06/3168 |
| wsrep_evs_state              | OPERATIONAL                               |
| wsrep_flow_control_paused    | 0.000000                                  |
| wsrep_flow_control_paused_ns | 0                                         |
| wsrep_flow_control_recv      | 0                                         |
| wsrep_flow_control_sent      | 0                                         |
| wsrep_gcomm_uuid             | 2bd32e42-b064-11ed-9678-56d6a9406fa5      |
| wsrep_incoming_addresses     | 10.20.135.29:3306                         |
| wsrep_last_committed         | 2931369                                   |
| wsrep_local_bf_aborts        | 0                                         |
| wsrep_local_cached_downto    | 2861719                                   |
| wsrep_local_cert_failures    | 0                                         |
| wsrep_local_commits          | 69651                                     |
| wsrep_local_index            | 0                                         |
| wsrep_local_recv_queue       | 0                                         |
| wsrep_local_recv_queue_avg   | 0.000000                                  |
| wsrep_local_recv_queue_max   | 1                                         |
| wsrep_local_recv_queue_min   | 0                                         |
| wsrep_local_replays          | 0                                         |
| wsrep_local_send_queue       | 0                                         |
| wsrep_local_send_queue_avg   | 0.045214                                  |
| wsrep_local_send_queue_max   | 15                                        |
| wsrep_local_send_queue_min   | 0                                         |
| wsrep_local_state            | 4                                         |
| wsrep_local_state_comment    | Synced                                    |
| wsrep_local_state_uuid       | da0e45a6-b00e-11ed-815f-c3684f57969d      |
| wsrep_protocol_version       | 7                                         |
| wsrep_provider_name          | Galera                                    |
| wsrep_provider_vendor        | Codership Oy <[email protected]>         |
| wsrep_provider_version       | 25.3.20(r3703)                            |
| wsrep_ready                  | ON                                        |
| wsrep_received               | 548                                       |
| wsrep_received_bytes         | 4511                                      |
| wsrep_repl_data_bytes        | 874735508                                 |
| wsrep_repl_keys              | 343650                                    |
| wsrep_repl_keys_bytes        | 4351859                                   |
| wsrep_repl_other_bytes       | 0                                         |
| wsrep_replicated             | 69651                                     |
| wsrep_replicated_bytes       | 883545031                                 |
| wsrep_thread_count           | 17                                        |
+------------------------------+-------------------------------------------+

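Answer 1

You have to bootstrap the server.

Shut down the working node, open grastate.dat, and change safe_to_bootstrap: 0 to safe_to_bootstrap: 1. Then run galera_new_cluster.

Once that node is back up, it is safe to restart the other nodes.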
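
The grastate.dat edit described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name mark_safe_to_bootstrap is hypothetical, the server must already be stopped, and the data directory path in the usage comment is taken from the log in the question.

```shell
# Mark a stopped node as safe to bootstrap by editing its grastate.dat.
# A .bak copy is written first so the original state file can be restored.
# $1 = path to grastate.dat
mark_safe_to_bootstrap() {
    grastate_file=$1
    cp -p "$grastate_file" "$grastate_file.bak" || return 1
    # Rewrite only the safe_to_bootstrap line; uuid and seqno stay untouched.
    sed 's/^safe_to_bootstrap:.*/safe_to_bootstrap: 1/' \
        "$grastate_file.bak" > "$grastate_file"
}

# Possible use, with the server stopped:
#   mark_safe_to_bootstrap /mysql/data/grastate.dat
#   galera_new_cluster
```

galera_new_cluster is a wrapper intended for systemd hosts. On other setups, starting the server with the --wsrep-new-cluster option is the usual equivalent, though which applies to the asker's installation is an assumption.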
