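I have a Galera cluster in production with 3 nodes, all running Debian 9.5. I did something stupid: I edited my.cnf and broke things. After a long struggle I finally recovered. Two of my nodes came back and are currently running, but the third one cannot recover/start. Something is wrong with WSREP.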
I tried wsrep_sst_xtrabackup-v2, rsync and mariabackup. My server doesn't actually support mysqldump, so I'm nearly out of options. Running xtrabackup-v2 or mariabackup gives the following output:
# wsrep_sst_xtrabackup-v2
/usr/sbin/wsrep_sst_xtrabackup-v2: line 49: WSREP_SST_OPT_ROLE: unbound variable
# wsrep_sst_mariabackup
/usr/bin/wsrep_sst_mariabackup: line 48: WSREP_SST_OPT_ROLE: unbound variable
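The "unbound variable" message is what bash prints under `set -u` (nounset) when a variable that was never assigned is read. The SST scripts appear to expect mysqld to pass arguments such as `--role`, which they presumably copy into `WSREP_SST_OPT_ROLE`, so running them by hand with no arguments leaves that variable undefined. The Python sketch below is only a hypothetical mimic of that argument mapping, not the real script: the helper `parse_sst_args` and the way it derives the other `WSREP_SST_OPT_*` names are assumptions, and the sample command line is the joiner invocation that mysqld logs further down.

```python
import shlex


def parse_sst_args(command_line):
    """Hypothetical mimic: map '--name value' pairs to WSREP_SST_OPT_* names.

    Only WSREP_SST_OPT_ROLE is taken from the error message; the other
    derived names are illustrative and may differ from the real script.
    """
    argv = shlex.split(command_line)
    opts = {}
    i = 1  # argv[0] is the script name
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--") and i + 1 < len(argv):
            name = "WSREP_SST_OPT_" + arg[2:].upper().replace("-", "_")
            opts[name] = argv[i + 1]
            i += 2
        else:
            i += 1  # skip positional arguments (the trailing '' '' here)
    return opts


# Joiner command as logged by mysqld, quotes kept as in the log.
logged = ("wsrep_sst_xtrabackup-v2 --role 'joiner' --address '195.201.243.14' "
          "--datadir '/var/lib/mysql/' --parent '22384' '' ''")

print(parse_sst_args(logged).get("WSREP_SST_OPT_ROLE"))  # joiner
# Invoked by hand with no arguments, the role is never set.
print(parse_sst_args("wsrep_sst_xtrabackup-v2").get("WSREP_SST_OPT_ROLE"))  # None
```

In other words, the error from a manual run is expected and probably says nothing about why the joiner's SST fails when mysqld starts it.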
Part of my log:
WSREP_SST: [ERROR] Cleanup after exit with status:2 (20190323 11:54:23.620)
2019-03-23 11:54:23 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --rent '29379' '' '': 2 (No such file or directory)
2019-03-23 11:54:23 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2019-03-23 11:54:23 0 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2019-03-23 11:54:23 0 [ERROR] Aborting
2019-03-23 12:08:36 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2019-03-23 12:08:36 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2019-03-23 12:08:36 0 [Note] WSREP: wsrep_load(): Galera 25.3.25(r3836) by Codership Oy <info@codership.com> loaded successfully.
2019-03-23 12:08:36 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2019-03-23 12:08:36 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2019-03-23 12:08:36 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 195.201.243.14; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 10240M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = P
2019-03-23 12:08:36 0 [Note] WSREP: GCache history reset: a4a75d49-cf56-11e8-8853-aee2dfa7f003:0 -> 00000000-0000-0000-0000-000000000000:-1
2019-03-23 12:08:36 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2019-03-23 12:08:36 0 [Note] WSREP: wsrep_sst_grab()
2019-03-23 12:08:36 0 [Note] WSREP: Start replication
2019-03-23 12:08:36 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2019-03-23 12:08:36 0 [Note] WSREP: protonet asio version 0
2019-03-23 12:08:36 0 [Note] WSREP: Using CRC-32C for message checksums.
2019-03-23 12:08:36 0 [Note] WSREP: backend: asio
2019-03-23 12:08:36 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2019-03-23 12:08:36 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2019-03-23 12:08:36 0 [Note] WSREP: restore pc from disk failed
2019-03-23 12:08:36 0 [Note] WSREP: GMCast version 0
2019-03-23 12:08:36 0 [Note] WSREP: (015eaf3c, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2019-03-23 12:08:36 0 [Note] WSREP: (015eaf3c, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2019-03-23 12:08:36 0 [Note] WSREP: EVS version 0
2019-03-23 12:08:36 0 [Note] WSREP: gcomm: connecting to group 'G1_API', peer '195.201.108.15:,94.130.142.245:,195.201.243.14:'
2019-03-23 12:08:36 0 [Note] WSREP: (015eaf3c, 'tcp://0.0.0.0:4567') connection established to 015eaf3c tcp://195.201.243.14:4567
2019-03-23 12:08:36 0 [Warning] WSREP: (015eaf3c, 'tcp://0.0.0.0:4567') address 'tcp://195.201.243.14:4567' points to own listening address, blacklisting
2019-03-23 12:08:36 0 [Note] WSREP: (015eaf3c, 'tcp://0.0.0.0:4567') connection established to 7230f53b tcp://195.201.108.15:4567
2019-03-23 12:08:36 0 [Note] WSREP: (015eaf3c, 'tcp://0.0.0.0:4567') connection established to 80ae8835 tcp://94.130.142.245:4567
2019-03-23 12:08:36 0 [Note] WSREP: (015eaf3c, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2019-03-23 12:08:36 0 [Note] WSREP: declaring 7230f53b at tcp://195.201.108.15:4567 stable
2019-03-23 12:08:36 0 [Note] WSREP: declaring 80ae8835 at tcp://94.130.142.245:4567 stable
2019-03-23 12:08:36 0 [Note] WSREP: Node 7230f53b state prim
2019-03-23 12:08:36 0 [Note] WSREP: view(view_id(PRIM,015eaf3c,528) memb {
015eaf3c,0
7230f53b,0
80ae8835,0
} joined {
} left {
} partitioned {
})
2019-03-23 12:08:36 0 [Note] WSREP: save pc into disk
2019-03-23 12:08:37 0 [Note] WSREP: gcomm: connected
2019-03-23 12:08:37 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2019-03-23 12:08:37 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2019-03-23 12:08:37 0 [Note] WSREP: Opened channel 'G1_API'
2019-03-23 12:08:37 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2019-03-23 12:08:37 0 [Note] WSREP: Waiting for SST to complete.
2019-03-23 12:08:37 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 01ab2a41-4d5c-11e9-a33b-5f90092d1452
2019-03-23 12:08:37 0 [Warning] WSREP: Action message in non-primary configuration from member 2
2019-03-23 12:08:37 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 01ab2a41-4d5c-11e9-a33b-5f90092d1452
2019-03-23 12:08:37 0 [Note] WSREP: STATE EXCHANGE: got state msg: 01ab2a41-4d5c-11e9-a33b-5f90092d1452 from 0 (195.201.243.14)
2019-03-23 12:08:37 0 [Note] WSREP: STATE EXCHANGE: got state msg: 01ab2a41-4d5c-11e9-a33b-5f90092d1452 from 1 (195.201.108.15)
2019-03-23 12:08:37 0 [Note] WSREP: STATE EXCHANGE: got state msg: 01ab2a41-4d5c-11e9-a33b-5f90092d1452 from 2 (94.130.142.245)
2019-03-23 12:08:37 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 511,
members = 2/3 (joined/total),
act_id = 99796994,
last_appl. = -1,
protocols = 0/9/3 (gcs/repl/appl),
group UUID = a4a75d49-cf56-11e8-8853-aee2dfa7f003
2019-03-23 12:08:37 0 [Note] WSREP: Flow-control interval: [28, 28]
2019-03-23 12:08:37 0 [Note] WSREP: Trying to continue unpaused monitor
2019-03-23 12:08:37 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 99796994)
2019-03-23 12:08:37 1 [Note] WSREP: State transfer required:
Group state: a4a75d49-cf56-11e8-8853-aee2dfa7f003:99796994
Local state: 00000000-0000-0000-0000-000000000000:-1
2019-03-23 12:08:37 1 [Note] WSREP: New cluster view: global state: a4a75d49-cf56-11e8-8853-aee2dfa7f003:99796994, view# 512: Primary, number of nodes: 3, my index: 0, protocol version 3
2019-03-23 12:08:37 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
2019-03-23 12:08:37 0 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '195.201.243.14' --datadir '/var/lib/mysql/' --parent '22384' '' '''
WSREP_SST: [INFO] Streaming with xbstream (20190323 12:08:37.320)
WSREP_SST: [INFO] Using socat as streamer (20190323 12:08:37.322)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20190323 12:08:37.326)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20190323 12:08:37.348)
encryption: using gcrypt 1.7.6-beta
2019-03-23 12:08:37 1 [Note] WSREP: Prepared SST request: xtrabackup-v2|195.201.243.14:4444/xtrabackup_sst//1
2019-03-23 12:08:37 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-03-23 12:08:37 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
2019-03-23 12:08:37 1 [Note] WSREP: Assign initial position for certification: 99796994, protocol version: 4
2019-03-23 12:08:37 0 [Note] WSREP: Service thread queue flushed.
2019-03-23 12:08:37 1 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (a4a75d49-cf56-11e8-8853-aee2dfa7f003): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2019-03-23 12:08:37 0 [Note] WSREP: Member 0.0 (195.201.243.14) requested state transfer from '*any*'. Selected 1.0 (195.201.108.15)(SYNCED) as donor.
2019-03-23 12:08:37 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 99796997)
2019-03-23 12:08:37 1 [Note] WSREP: Requesting state transfer: success, donor: 1
2019-03-23 12:08:37 1 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> a4a75d49-cf56-11e8-8853-aee2dfa7f003:99796994
2019-03-23 12:08:40 0 [Note] WSREP: (015eaf3c, 'tcp://0.0.0.0:4567') connection to peer 015eaf3c with addr tcp://195.201.243.14:4567 timed out, no messages seen in PT3S
2019-03-23 12:08:40 0 [Note] WSREP: (015eaf3c, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing (20190323 12:08:40.417)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20190323 12:08:40.422)
WSREP_SST: [INFO] Proceeding with SST (20190323 12:08:40.423)
encryption: using gcrypt 1.7.6-beta
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20190323 12:08:40.428)
removed '/var/lib/mysql/ib_logfile0'
removed '/var/lib/mysql/ib_logfile1'
removed '/var/lib/mysql/ibdata1'
removed '/var/lib/mysql/aria_log_control'
removed '/var/lib/mysql/aria_log.00000001'
removed '/var/lib/mysql/mysql.sock'
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20190323 12:08:40.441)
WSREP_SST: [ERROR] xtrabackup_checkpoints missing, failed innobackupex/SST on donor (20190323 12:08:50.665)
WSREP_SST: [ERROR] Cleanup after exit with status:2 (20190323 12:08:50.667)
2019-03-23 12:08:50 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '195.201.243.14' --datadir '/var/lib/mysql/' --parent '22384' '' '': 2 (No such file or directory)
2019-03-23 12:08:50 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2019-03-23 12:08:50 0 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2019-03-23 12:08:50 0 [ERROR] Aborting
2019-03-23 12:08:50 0 [Warning] WSREP: 1.0 (195.201.108.15): State transfer to 0.0 (195.201.243.14) failed: -22 (Invalid argument)
2019-03-23 12:08:50 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():737: Will never receive state. Need to abort.
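Reading the log: the joiner's local state is `00000000-0000-0000-0000-000000000000:-1`, which does not match the group UUID `a4a75d49-cf56-11e8-8853-aee2dfa7f003`, so Galera reports IST as unavailable and falls back to a full SST. That SST then fails with "xtrabackup_checkpoints missing, failed innobackupex/SST on donor", which suggests the donor (195.201.108.15) side is worth checking too. A minimal Python sketch of the UUID/seqno comparison, assuming a simplified rule; `parse_state` and `transfer_kind` are hypothetical helpers, and the real decision also depends on whether the donor's gcache still holds the missing write-sets:

```python
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def parse_state(state):
    """Split a Galera 'uuid:seqno' state string into (uuid, seqno)."""
    uuid, _, seqno = state.rpartition(":")
    return uuid, int(seqno)


def transfer_kind(group_state, local_state):
    """Simplified rule: IST needs a known local UUID equal to the group UUID.

    Real Galera additionally requires the donor's gcache to still contain
    every write-set after the local seqno; that check is not modelled here.
    """
    g_uuid, g_seqno = parse_state(group_state)
    l_uuid, l_seqno = parse_state(local_state)
    if (l_uuid == ZERO_UUID or l_uuid != g_uuid
            or l_seqno < 0 or l_seqno > g_seqno):
        return "SST"
    if l_seqno == g_seqno:
        return "none"
    return "IST (if gcache still covers the gap)"


# Values copied from the "State transfer required" lines of the log.
print(transfer_kind("a4a75d49-cf56-11e8-8853-aee2dfa7f003:99796994",
                    "00000000-0000-0000-0000-000000000000:-1"))  # SST
```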
My my.cnf file:
#
# my.cnf template for clustercontroller
# Copyright (C) 2011-2015 severalnines.com
#
[MYSQLD]
user=mysql
basedir=/usr/
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid_file=/var/lib/mysql/mysql.pid
port=3306
log_error=/var/log/mysql/mysqld.log
log_warnings=2
# log_output = FILE
#Slow logging
slow_query_log_file=/var/log/mysql/mysql-slow.log
long_query_time=2
slow_query_log=OFF
log_queries_not_using_indexes=OFF
### INNODB OPTIONS
innodb_buffer_pool_size=32215M
innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
innodb_data_file_path = ibdata1:100M:autoextend
## You may want to tune the below depending on number of cores and disk sub
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_doublewrite=1
innodb_log_file_size=1024M
innodb_log_buffer_size=96M
innodb_buffer_pool_instances=8
innodb_log_files_in_group=2
innodb_thread_concurrency=64
# innodb_file_format = barracuda
innodb_flush_method = O_DIRECT
# innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode=2
## avoid statistics update when doing e.g show tables
innodb_stats_on_metadata=0
default_storage_engine=innodb
# CHARACTER SET
# collation_server = utf8_unicode_ci
# init_connect = 'SET NAMES utf8'
# character_set_server = utf8
# REPLICATION SPECIFIC
server_id=4
binlog_format=ROW
# log_bin = binlog
# log_slave_updates = 1
# gtid_mode = ON
# enforce_gtid_consistency = 1
# relay_log = relay-bin
# expire_logs_days = 7
# OTHER THINGS, BUFFERS ETC
# key_buffer_size = 24M
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 512M
# sort_buffer_size = 256K
# read_buffer_size = 256K
# read_rnd_buffer_size = 512K
# myisam_sort_buffer_size = 8M
memlock=0
sysdate_is_now=1
max_connections=500
thread_cache_size=512
query_cache_type = 0
query_cache_size = 0
table_open_cache=1024
lower_case_table_names=0
# 5.6 backwards compatibility (FIXME)
# explicit_defaults_for_timestamp = 1
##
## WSREP options
##
performance_schema = ON
performance-schema-max-mutex-classes = 0
performance-schema-max-mutex-instances = 0
# Full path to wsrep provider library or 'none'
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_on=ON
wsrep_node_address=195.201.243.14
# Provider specific configuration options
wsrep_provider_options="base_port=4567; gcache.size=10240M; gmcast.segment=0 "
# Logical cluster name. Should be the same for all nodes.
wsrep_cluster_name="G1_API"
# Group communication system handle
wsrep_cluster_address=gcomm://195.201.108.15,94.130.142.245,195.201.243.14
# Human_readable node name (non-unique). Hostname by default.
wsrep_node_name=195.201.243.14
# Address for incoming client connections. Autodetect by default.
#wsrep_node_incoming_address=
# How many threads will process writesets from other nodes
wsrep_slave_threads=4
# DBUG options for wsrep provider
#wsrep_dbug_option
# Generate fake primary keys for non-PK tables (required for multi-master
# and parallel applying operation)
wsrep_certify_nonPK=1
# Location of the directory with data files. Needed for non-mysqldump
# state snapshot transfers. Defaults to mysql_real_data_home.
#wsrep_data_home_dir=
# Maximum number of rows in write set
wsrep_max_ws_rows=131072
# Maximum size of write set
wsrep_max_ws_size=1073741824
# to enable debug level logging, set this to 1
wsrep_debug=0
# convert locking sessions into transactions
wsrep_convert_LOCK_to_trx=0
# how many times to retry deadlocked autocommits
wsrep_retry_autocommit=1
# change auto_increment_increment and auto_increment_offset automatically
wsrep_auto_increment_control=1
# replicate myisam
wsrep_replicate_myisam=1
# retry autoinc insert, which failed for duplicate key error
wsrep_drupal_282555_workaround=0
# enable "strictly synchronous" semantics for read operations
wsrep_causal_reads=0
# Command to call when node status or cluster membership changes.
# Will be passed all or some of the following options:
# --status - new status of this node
# --uuid - UUID of the cluster
# --primary - whether the component is primary or not ("yes"/"no")
# --members - comma-separated list of members
# --index - index of this node in the list
#wsrep_notify_cmd=
##
## WSREP State Transfer options
##
# State Snapshot Transfer method mariabackup
# ClusterControl currently DOES NOT support wsrep_sst_method=mysqldump
wsrep_sst_method=xtrabackup-v2
# Address on THIS node to receive SST at. DON'T SET IT TO DONOR ADDRESS!!!
# (SST method dependent. Defaults to the first IP of the first interface)
#wsrep_sst_receive_address=
# SST authentication string. This will be used to send SST to joining nodes.
# Depends on SST method. For mysqldump method it is root:<root password>
# IMPORTANT: The user/password in wsrep_sst_auth must match
# user/password in [xtrabackup]
wsrep_sst_auth=test:test
# Desired SST donor name.
#wsrep_sst_donor=
# Protocol version to use
# wsrep_protocol_version=
# log conflicts
wsrep_log_conflicts=1
[MYSQL]
socket=/var/lib/mysql/mysql.sock
# default_character_set = utf8
[client]
socket=/var/lib/mysql/mysql.sock
# default_character_set = utf8
[mysqldump]
max_allowed_packet = 512M
# default_character_set = utf8
# IMPORTANT: The user/password in wsrep_sst_auth must match
# user/password in [xtrabackup]
[xtrabackup]
databases-exclude=lost+found
ssl_mode=DISABLED
[MYSQLD_SAFE]
# log_error = /var/log/mysqld.log
basedir=/usr/
# datadir = /var/lib/mysql
!include /etc/mysql/secrets-backup.cnf
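The template says twice that the user/password in `wsrep_sst_auth` must match the one in `[xtrabackup]`, yet the `[xtrabackup]` section shown here sets neither; they may come from the `!include`d `/etc/mysql/secrets-backup.cnf`, which is not shown. A rough Python sketch of that consistency check using the standard `configparser` module, assuming only the sections in this one file matter (the `!include` is not followed, and `check_sst_auth` is a hypothetical helper):

```python
import configparser


def check_sst_auth(cnf_text):
    """Compare wsrep_sst_auth in [MYSQLD] with user/password in [xtrabackup].

    Returns a short status string. Files pulled in via '!include' are NOT read.
    """
    # allow_no_value lets the '!include ...' line parse as a bare key;
    # '=' is the only delimiter so 'user:password' stays in one value.
    cp = configparser.ConfigParser(allow_no_value=True, delimiters=("=",),
                                   interpolation=None)
    cp.read_string(cnf_text)
    auth = cp.get("MYSQLD", "wsrep_sst_auth", fallback=None)
    if auth is None:
        return "wsrep_sst_auth not set"
    sst_user, _, sst_pass = auth.partition(":")
    xb_user = cp.get("xtrabackup", "user", fallback=None)
    xb_pass = cp.get("xtrabackup", "password", fallback=None)
    if xb_user is None or xb_pass is None:
        return "[xtrabackup] has no user/password in this file (check !include)"
    if (xb_user, xb_pass) != (sst_user, sst_pass):
        return "mismatch"
    return "match"


# Trimmed excerpt of the my.cnf above.
excerpt = """
[MYSQLD]
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=test:test
[xtrabackup]
databases-exclude=lost+found
ssl_mode=DISABLED
[MYSQLD_SAFE]
basedir=/usr/
!include /etc/mysql/secrets-backup.cnf
"""

print(check_sst_auth(excerpt))
```

On the excerpt it reports that `[xtrabackup]` defines no credentials in this file, so the included secrets file (and the donor's copy of it) would be the next thing to compare against `test:test`.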