几天前,我设置了一个从属 MySQL 服务器(Mysql Community Server 8.0.21)。到目前为止,我们只使用 mysqldump 将其用于备份。我确实注意到它使用了太多内存,但我并不介意,因为我使用了一些配置参数来自动调节它使用的内存量,而且 VPS 专用于 MySQL 服务器(如果我支付了全部内存费用,为什么不使用它呢,对吧?)。
今天早上我做的第一件事就是查看我的 Zabbix,发现从服务器没有工作,所以我通过 ssh 进入它,重启 MySQL,然后尝试使用以下命令启动从服务器启动从服务器。这就是我得到的:
ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository
不确定是什么导致了这个问题(这是我的 my.cnf)
[client]
# Usado apenas para casos especificos
port = 3306 # Porta parada
socket = /var/run/mysqld/mysqld.sock
[mysql]
# Configurações do cliente
auto-rehash # Auto completar
[mysqld]
# Configuração do servidor
pid_file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
datadir = /var/lib/mysql
log_error = /var/log/mysql/error.log
user = mysql
bind_address = 0.0.0.0 # Ouve todos os endereços
# Configurações genericas do servidor
max_allowed_packet = 32M # Tamanho maximo do pacote.
max_connections = 2000 # Maximo de coneções
open_files_limit = 10000 # Maximo de arquivos abertos
tmp_table_size = 64M # Limite tamanho tabela em mem
max_heap_table_size = 64M # Limite tamanho tabela em mem
tmpdir = /tmp # Diretorio /tmp/
default_storage_engine = InnoDB # Engine default
skip_name_resolve # Desabilita resolução DNS
# Configurações de log binario
log_bin = mysql-bin # Arquivo de log binario
relay-log = mysql-relay-bin
log-slave-updates = 1 # Log de update no slave
read-only = 1 # Apenas leitura
binlog-format = mixed # Formato
server_id = 2 # Identifica servidor para log
max_binlog_size = 256M # Tamanho maximo log binario
binlog_expire_logs_seconds = 604800 # Max tempo log binario
innodb_flush_log_at_trx_commit = 1 # Proteção de dados
sync_binlog = 1 # Somente pra replicação
# Configurações especificas do InnoDB
innodb_dedicated_server = ON # Autoconf InnoDB
innodb_io_capacity = 2000 # Quantas escritas por segundo
innodb_read_io_threads = 64 # Threads de leitura
innodb_write_io_threads = 64 # Threads de escrita
innodb_thread_concurrency = 0 # Auto dectecta threads
# Slow query log
slow_query_log = 1 # Guarda queries lentas
long_query_time = 1.0 # Tempo query
# Funções dentro do Mysql
log_bin_trust_function_creators = 1; # Permite funções criadas
我不知道该如何修复它。我应该重新进行整个转储吗?我该如何防止这种情况再次发生?
更新:
这是显示从属状态:
mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: IP_ADDR
Master_User: USER
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000011
Read_Master_Log_Pos: 178887867
Relay_Log_File: pergamum-relay-bin.000029
Relay_Log_Pos: 178888082
Relay_Master_Log_File: mysql-bin.000011
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 13124
Last_Error: Slave failed to initialize relay log info structure from the repository
Skip_Counter: 0
Exec_Master_Log_Pos: 178887867
Relay_Log_Space: 0
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 13124
Last_SQL_Error: Slave failed to initialize relay log info structure from the repository
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_UUID: bef45e1b-99d6-11ea-a355-3e2547e4f083
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 200819 09:58:32
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
更新
正如所问的,以下是来自两个服务器的变量。
掌握
GLOBAL STATUS:
https://gist.github.com/IamRichter/ef4993bbf65883baa366ff30d73b9644
GLOBAL VARIABLES:
https://gist.github.com/IamRichter/dbf08facd364548529b07bc0cbd6b2e6
奴隶
GLOBAL STATUS:
https://gist.github.com/IamRichter/fcdff1111ed52fc859ebac46a2dbfc27
GLOBAL VARIABLES:
https://gist.github.com/IamRichter/53c5638fb3c6064fd51eab332660f07f
答案1
唉,变量/状态没有显示您所遇到的问题的任何明显原因。 或许较高的 buffer_pool_size 会过度挤压 RAM,但我对此表示怀疑。以下是我的分析(主服务器,然后是副本服务器):
观察结果:
- 版本:8.0.20
- 16 GB RAM
- 正常运行时间 = 2 天 02:30:41
- 您没有在 Windows 上运行。
- 运行 64 位版本
- 您似乎正在运行全部(或大部分)InnoDB。
更重要的问题:
table_open_cache
如果操作系统允许,请增加。为什么有这么多表?
我认为(没有任何证据)使用除了“2”之外的任何东西都是不明智的,因为innodb_log_files_in_group
MariaDB 已经忽略了它;我还没有听说过 Oracle。
有一些迹象表明查询表述不当。请参阅http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog用于分析慢速日志。
你这样做有什么理由吗binlog_format=MIXED
?
详细信息和其他观察结果:
( Opened_tables ) = 452,365 / 181841 = 2.5 /sec
-- 打开表的频率 -- 增加 table_open_cache (现在为 3995)
( Table_open_cache_overflows ) = 448,306 / 181841 = 2.5 /sec
-- 可能需要增加 table_open_cache (现在为 3995)
( Table_open_cache_misses ) = 452,365 / 181841 = 2.5 /sec
-- 可能需要增加 table_open_cache (现在为 3995)
( innodb_lru_scan_depth * innodb_page_cleaners ) = 1,024 * 4 = 4,096
-- 页面清理器每秒的工作量。-- “InnoDB:page_cleaner:预期循环花费 1000 毫秒...”可通过降低 lru_scan_depth 来解决:考虑 1000/innodb_page_cleaners(现在为 4)。还请检查交换。
( innodb_page_cleaners / innodb_buffer_pool_instances ) = 4 / 8 = 0.5
-- innodb_page_cleaners -- 建议将 innodb_page_cleaners(现在为 4)设置为 innodb_buffer_pool_instances(现在为 8)
( innodb_lru_scan_depth ) = 1,024
-- “InnoDB:page_cleaner:预期循环花费 1000ms...” 可以通过降低 lru_scan_depth 来修复
( Innodb_pages_written / Innodb_buffer_pool_write_requests ) = 2,102,149 / 7377368 = 28.5%
-- 必须到达磁盘的写入请求 -- 检查 innodb_buffer_pool_size(现在为 12884901888)
( innodb_log_files_in_group ) = 9
-- InnoDB 日志文件的数量 -- 2 可能是唯一合理的值。较大的数字可能会导致性能问题。
( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 1,773,668,352 / (181841 / 3600) / 9 / 1024M = 0.00363
-- 比率 -- (见会议纪要)
( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 181,841 / 60 * 1024M / 1773668352 = 1,834
-- InnoDB 日志轮换之间的分钟数从 5.6.8 开始,可以动态更改;请确保也更改 my.cnf。--(轮换之间建议 60 分钟有点武断。)调整 innodb_log_file_size(现在为 1073741824)。(无法在 AWS 中更改。)
( innodb_flush_method ) = innodb_flush_method = O_DIRECT_NO_FSYNC
-- InnoDB 应如何要求操作系统写入块。建议使用 O_DIRECT 或 O_ALL_DIRECT (Percona) 来避免双重缓冲。(至少对于 Unix 而言。)有关 O_ALL_DIRECT 的注意事项,请参阅 chrischandler
( Innodb_row_lock_time_avg ) = 1,886
-- 锁定一行的平均时间(毫秒)-- 可能有冲突的查询;也可能是表扫描。
( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF
-- 是否记录所有死锁。-- 如果您受到死锁的困扰,请启用此功能。注意:如果您有大量死锁,这可能会将大量数据写入磁盘。
( max_connections ) = 2,000
-- 最大连接数(线程)。影响各种分配。-- 如果 max_connections(现在为 2000)太高,并且各种内存设置都很高,则可能会耗尽 RAM。
( Handler_read_rnd_next / Com_select ) = 46,219,107,293 / 7120289 = 6,491
-- 每个 SELECT 扫描的平均行数。(大约)-- 考虑提高 read_buffer_size(现在为 131072)
( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (358564 + 419704 + 462 + 0) / 2006877 = 0.388
-- 每次提交的语句数(假设所有都是 InnoDB)-- 低:可能有助于将查询分组到事务中;高:长事务会给各种事物带来压力。
( Select_scan ) = 4,386,291 / 181841 = 24 /sec
-- 全表扫描 -- 添加索引/优化查询(除非它们是小表)
( Select_scan / Com_select ) = 4,386,291 / 7120289 = 61.6%
-- 执行全表扫描的选择百分比。(可能会被存储例程欺骗。)-- 添加索引/优化查询
( binlog_format ) = binlog_format = MIXED
-- 语句/行/混合。-- 5.7 (10.3) 首选行
( Aborted_clients ) = 175,398 / 181841 = 0.96 /sec
-- 由于 wait_timeout 导致线程堵塞 -- 增加 wait_timeout(现在为 28800);最好使用 disconnect
( Connections ) = 3,544,963 / 181841 = 19 /sec
-- 连接 -- 增加 wait_timeout(现在为 28800);使用池?
异常小: 打开文件 = 3
异常大: Com_do = 0.02 /HR Com_release_savepoint = 2.3 /HR Com_rollback_to_savepoint = 0.11 /秒 Com_savepoint = 2.3 /HR Com_show_charsets = 0.44 /HR Com_show_create_db = 2.3 /HR Com_show_create_func = 2.5 /HR Com_show_create_proc = 1.5 /HR Com_show_create_trigger = 0.12 /HR Com_show_function_status = 2.5 /HR Com_show_procedure_status = 2.5 /HR Com_show_table_status = 0.11 /秒 Com_show_triggers = 0.11 /秒 Created_tmp_files = 20 /秒 Handler_read_key = 139335 /秒 Handler_read_next = 316713 /秒 Handler_read_rnd = 46465 /秒 Handler_savepoint = 2.3 /HR Handler_savepoint_rollback = 0.11 /秒 Innodb_buffer_pool_read_requests = 1095747 /秒 Innodb_num_open_files = 3,996 Innodb_system_rows_inserted = 0.97 /HR Innodb_system_rows_read = 1.99e+7 Innodb_system_rows_updated = 0.095 /秒 Open_table_definitions = 2,000 Select_full_range_join = 2.1 /秒 Select_full_range_join / Com_select = 5.5% back_log / max_connections = 100.0% innodb_max_dirty_pages_pct_lwm = 10 innodb_read_io_threads = 64 innodb_undo_tablespaces = 2 innodb_write_io_threads = 64 max_error_count = 1,024 max_length_for_sort_data = 4,096
异常字符串: default_authentication_plugin = caching_sha2_password event_scheduler = ON innodb_dedicated_server = ON innodb_fast_shutdown = 1 optimizer_trace = enabled=off,one_line=off optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on slave_rows_search_algorithms = INDEX_SCAN,HASH_SCAN
sf1030756-M
观察结果:
- 版本:8.0.21
- 6 GB RAM
- 正常运行时间 = 5 天 23:59:45
- 您确定这是 SHOW GLOBAL STATUS 吗?——或者这个副本只是不忙??
- 您没有在 Windows 上运行。
- 运行 64 位版本
- 您似乎正在运行全部(或大部分)InnoDB。
更重要的问题:
与小学一样—— table_open_cache
,,innodb_log_files_in_group
binlog_format
检查交换。 innodb_buffer_pool_size
对于 6GB 的 RAM 来说相当大。
降低max_connections
到 100 —— 到目前为止您还没有使用超过 5 个;当前的 2000 对 RAM 来说是一个负担。
详细信息和其他观察结果:
( table_open_cache ) = 3,995
-- 要缓存的表描述符的数量 -- 通常几百个就够了。
( Table_open_cache_misses / (Table_open_cache_hits + Table_open_cache_misses) ) = 56,436 / (1107115 + 56436) = 4.9%
-- table_open_cache 的有效性。-- 增加 table_open_cache(现在为 3995)并检查 table_open_cache_instances(现在为 16)。
( innodb_lru_scan_depth * innodb_page_cleaners ) = 1,024 * 4 = 4,096
-- 页面清理器每秒的工作量。-- “InnoDB:page_cleaner:预期循环花费 1000 毫秒...”可通过降低 lru_scan_depth 来解决:考虑 1000/innodb_page_cleaners(现在为 4)。还请检查交换。
( innodb_page_cleaners / innodb_buffer_pool_instances ) = 4 / 8 = 0.5
-- innodb_page_cleaners -- 建议将 innodb_page_cleaners(现在为 4)设置为 innodb_buffer_pool_instances(现在为 8)
( innodb_lru_scan_depth ) = 1,024
-- “InnoDB:page_cleaner:预期循环花费 1000ms...” 可以通过降低 lru_scan_depth 来修复
( Innodb_pages_written / Innodb_buffer_pool_write_requests ) = 25,132 / 57003 = 44.1%
-- 必须到达磁盘的写入请求 -- 检查 innodb_buffer_pool_size(现在为 5368709120)
( innodb_log_files_in_group ) = 5
-- InnoDB 日志文件的数量 -- 2 可能是唯一合理的值。较大的数字可能会导致性能问题。
( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 22,224,896 / (518385 / 3600) / 5 / 512M = 5.7e-5
-- 比率 -- (见会议纪要)
( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 518,385 / 60 * 512M / 22224896 = 208,704
-- InnoDB 日志轮换之间的分钟数从 5.6.8 开始,可以动态更改;请确保也更改 my.cnf。--(轮换之间建议 60 分钟有点武断。)调整 innodb_log_file_size(现在为 536870912)。(无法在 AWS 中更改。)
( innodb_flush_method ) = innodb_flush_method = O_DIRECT_NO_FSYNC
-- InnoDB 应如何要求操作系统写入块。建议使用 O_DIRECT 或 O_ALL_DIRECT (Percona) 来避免双重缓冲。(至少对于 Unix 而言。)有关 O_ALL_DIRECT 的注意事项,请参阅 chrischandler
( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF
-- 是否记录所有死锁。-- 如果您受到死锁的困扰,请启用此功能。注意:如果您有大量死锁,这可能会将大量数据写入磁盘。
( max_connections ) = 2,000
-- 最大连接数(线程)。影响各种分配。-- 如果 max_connections(现在为 2000)太高,并且各种内存设置都很高,则可能会耗尽 RAM。
( (Com_show_create_table + Com_show_fields) / Questions ) = (8605 + 17405) / 313218 = 8.3%
-- 不合理的框架 -- 花费大量精力重新发现模式。-- 向第三方供应商投诉。
( tmp_table_size ) = 64M
-- 大小限制记忆用于支持 SELECT 的临时表 - 减少 tmp_table_size(现在为 67108864)以避免 RAM 耗尽。可能不超过 64M。
( Select_full_join / Com_select ) = 8,884 / 123363 = 7.2%
-- 无索引连接的选择百分比 -- 为 JOIN 中使用的表添加合适的索引。
( Select_scan / Com_select ) = 107,352 / 123363 = 87.0%
-- 执行全表扫描的选择百分比。(可能会被存储例程欺骗。)-- 添加索引/优化查询
( Com_admin_commands / Queries ) = 6,755 / 320422 = 2.1%
-- “管理员”命令的查询百分比。-- 发生了什么?
( binlog_format ) = binlog_format = MIXED
-- 语句/行/混合。-- 5.7 (10.3) 首选行
( log_slow_slave_statements ) = log_slow_slave_statements = OFF
-- (5.6.11, 5.7.1) 默认情况下,复制的语句不会显示在慢速日志中;这会导致它们显示出来。-- 在慢速日志中查看可能干扰副本读取的写入会很有帮助。
( thread_cache_size / Max_used_connections ) = 28 / 5 = 560.0%
-- 线程缓存大于可能的连接数没有任何好处。浪费空间才是缺点。
( thread_stack * max_connections ) = (286720 * 2000) / 6144M = 8.9%
-- max_connections 的最低内存分配。-- 降低 max_connections (现在为 2000)
异常小: Bytes_received = 58 /秒 Com_insert = 0 Com_update = 0 Created_tmp_files = 0.035 /HR Handler_update = 2.2 /HR Innodb_buffer_pool_write_requests = 0.11 /秒 Innodb_buffer_pool_write_requests / Innodb_buffer_pool_pages_flushed = 2.27 Innodb_dblwr_pages_written / Innodb_dblwr_writes = 1 Innodb_pages_created = 1 /HR Innodb_rows_deleted + Innodb_rows_inserted = 0 Innodb_rows_inserted = 0 Innodb_rows_updated = 0 Select_range = 0 Select_range / Com_select = 0
异常大: Com_do = 0.014 /HR Com_show_create_func = 0.15 /HR Com_slave_start = 0.028 /HR Innodb_num_open_files = 3,996 Innodb_system_rows_read = 1.91e+7 Innodb_system_rows_updated = 2.2 /HR Open_table_definitions = 2,000 back_log / max_connections = 100.0% innodb_max_dirty_pages_pct_lwm = 10 innodb_read_io_threads = 64 innodb_undo_tablespaces = 2 innodb_write_io_threads = 64 max_error_count = 1,024 max_length_for_sort_data = 4,096 optimizer_trace_offset = --1 performance_schema_error_size = 4,772 performance_schema_max_cond_classes = 100 performance_schema_max_mutex_classes = 300 performance_schema_max_rwlock_classes = 60 performance_schema_max_stage_classes = 175 performance_schema_max_statement_classes = 218 performance_schema_max_thread_classes = 100
异常字符串: default_authentication_plugin = caching_sha2_password event_scheduler = ON innodb_dedicated_server = ON innodb_fast_shutdown = 1 optimizer_trace = enabled=off,one_line=off optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on read_only = ON slave_rows_search_algorithms = INDEX_SCAN,HASH_SCAN
sf1030756-S