我们从 RPM 存储库运行 MariaDB 服务器 10.6.16https://mirror.mariadb.org/yum/10.6/rhel9-amd64/在具有 48 GiB 内存(无交换空间)和 32 个 CPU 核心的 AlmaLinux 9.3 系统上。我们专门使用 InnoDB 作为数据库表。系统的文件系统通过 Ceph 在 NVME SSD 上运行。
内存使用量稳步增加,时不时触发 OOM 杀手。
公式来自https://serverfault.com/a/1020847返回内存消耗上限 7500 MiB。我知道这个公式并不准确,但 MariaDB 服务当前使用的内存约为 11 GiB,远远超过预测的最大值 7500 MiB。我们观察到,在 OOM killer 出现之前,内存使用量超过 15 GiB。
我在这里遗漏了什么?
/proc/[mariadb-pid]/limits
:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 191442 191442 processes
Max open files 32768 32768 files
Max locked memory 8388608 8388608 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 191442 191442 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
[mysqld]
MariaDB 配置部分:
basedir = /usr
bind_address = 0.0.0.0
binlog_format = mixed
character-set-server = utf8
collation-server = utf8_unicode_ci
datadir = /var/lib/mysql
expire_logs_days = 7
innodb_buffer_pool_size = 4G
innodb_data_file_path = ibdata1:10M:autoextend
innodb_file_per_table = on
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_log_buffer_size = 16M
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
innodb_stats_on_metadata = off
key_buffer_size = 16M
lc_messages = en_US
lc_messages_dir = /usr/share/mysql
log-bin = /var/log/mariadb/mysql-bin
log-error = /var/log/mysqld.log
long_query_time = 5
max_allowed_packet = 16M
max_connections = 1024
myisam_sort_buffer_size = 8M
net_buffer_length = 8K
pid-file = /run/mariadb/mariadb.pid
port = 3306
query_cache_size = 4194304
query_cache_type = 1
read_buffer_size = 256K
read_rnd_buffer_size = 512K
require_secure_transport = ON
server-id = 1
skip-external-locking
skip-name-resolve
slow_query_log = ON
slow_query_log_file = /var/log/mariadb/slow-queries.log
socket = /var/lib/mysql/mysql.sock
sort_buffer_size = 2M
sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
ssl_ca = /etc/ssl/mariadb/chain.pem
ssl_cert = /etc/ssl/mariadb/cert.pem
ssl_key = /etc/ssl/mariadb/privkey.pem
table_definition_cache = 8000
table_open_cache = 16000
thread_cache_size = 8
thread_stack = 256K
tmp_table_size = 8M
tmpdir = /tmp
user = mysql
SHOW VARIABLES
:https://pastebin.com/raw/NcT5jxiR
SHOW GLOBAL STATUS
:https://pastebin.com/raw/B8MFr2WX
SHOW ENGINE INNODB STATUS
:
=====================================
2024-02-07 09:51:38 0x7fcf4d171640 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 1 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 508984 srv_idle
srv_master_thread log flush and writes: 508978
----------
SEMAPHORES
----------
------------
TRANSACTIONS
------------
Trx id counter 191793820
Purge done for trx's n:o < 191793816 undo n:o < 0 state: running
History list length 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION (0x7fd05f21e880), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21dd80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21bc80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21b180), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21a680), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21d280), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f21c780), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION (0x7fd05f219b80), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending flushes (fsync) log: 0; buffer pool: 0
31442319 OS file reads, 3626734 OS file writes, 3262529 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 5.99 writes/s, 5.99 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
insert 0, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 65657865607
Log flushed up to 65657865557
Pages flushed up to 65594938073
Last checkpoint at 65594938073
0 pending log flushes, 0 pending chkp writes
3185884 log i/o's done, 6.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 4328521728
Dictionary memory allocated 61678248
Buffer pool size 259584
Free buffers 1
Database pages 259583
Old database pages 95802
Modified db pages 5013
Percent of dirty pages(LRU & free pages): 1.931
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 7999092, not young 2607506757
0.00 youngs/s, 0.00 non-youngs/s
Pages read 31427077, created 59965, written 435199
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 259583, unzip_LRU len: 0
I/O sum[290]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 564360, updated 1722533, deleted 374105, read 92464595804
0.00 inserts/s, 3.00 updates/s, 0.00 deletes/s, 211216.78 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
SELECT COUNT(*) FROM information_schema.tables
:
+----------+
| COUNT(*) |
+----------+
| 14520 |
+----------+
1 row in set (0.125 sec)
SELECT COUNT(*), sum(data_length), sum(index_length), sum(data_free) FROM information_schema.tables LIMIT 100000
:
+----------+------------------+-------------------+----------------+
| COUNT(*) | sum(data_length) | sum(index_length) | sum(data_free) |
+----------+------------------+-------------------+----------------+
| 228 | 106496 | 106496 | 0 |
+----------+------------------+-------------------+----------------+
1 row in set (0.618 sec)
答案1
Analysis of GLOBAL STATUS and VARIABLES:
观察结果:
- 版本:10.6.16-MariaDB-log
- 16 GB RAM
- 正常运行时间 = 5 天 22:22:23
- 904 查询/秒:761 问题/秒
更重要的问题:
为何SHOW CREATE TABLE
如此频繁地进行?
降低innodb_ft_result_cache_limit
可能会有帮助。
查询缓存中似乎存在大量抖动。建议将其关闭:
query_cache_type = OFF
query_cache_size = 0
尝试降低thread_pool_max_threads
(目前为 64K)
为何UNLOCK TABLES
被使用?
为什么要有这么多的表?很多表都是“相同的”吗?让我们讨论一下缺点。
猜测:有很多临时表(在 SELECT 内部);这些表可能首先尝试使用 8MB RAM,然后才放弃使用磁盘:
- 降低
tmp_table_size
至 4M 看看是否有帮助; - 使用 slowlog 查找“慢速”查询;然后让我们改进它们。使用
long_query_time = 1
;参见SlowLog
详细信息和其他观察结果:
( table_open_cache ) = 15,857
-- 要缓存的表描述符的数量 -- 通常几百个就够了。
( Table_open_cache_misses ) = 1010093 / 512543 = 2 /sec
-- 可能需要增加 table_open_cache (现在为 15857)
( Com_show_create_table ) = 2531490 / 512543 = 4.9 /sec
-- SHOW CREATE TABLE 执行频率。可能来自不严谨的第三方软件包。-- 检查 table_open_cache(现在为 15857)和 open_files_limit(现在为 32768)。或者避免使用第三方软件包。
( innodb_buffer_pool_size ) = 4,096 / 16384M = 25.0%
-- 用于 InnoDB 缓冲池的 RAM 百分比 -- 设置为可用 RAM 的 70% 左右。(太低效率较低;太高有交换风险。)
( innodb_lru_scan_depth ) = 1,536
-- innodb_lru_scan_depth 是一个命名很差的变量。更好的名称应该是 innodb_free_page_target_per_buffer_pool。它是 InnoDB 尝试在每个缓冲池实例中保留的页面数,以加快读取和页面创建操作。--“InnoDB:page_cleaner:预期循环花费 1000ms ...”可以通过降低 lru_scan_depth 来修复
( innodb_io_capacity ) = 200
-- 刷新时,使用这么多 IOP。-- 读取可能会很慢或不稳定。如果使用 SSD 驱动器,请使用 2000。
( Innodb_buffer_pool_pages_old / Innodb_buffer_pool_pages_total ) = 95,803 / 259584 = 36.9%
——buffer_pool 中“旧”的百分比。
( innodb_io_capacity_max / innodb_io_capacity ) = 2,000 / 200 = 10
-- 容量:最大/普通 -- 推荐 2。最大容量应大约等于 I/O 子系统可以处理的 IOP。(如果驱动器类型未知,2000/200 可能是合理的一对。)
( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 512,543 / 60 * 512M / 5039606272 = 910
-- InnoDB 日志轮换间隔分钟数 从 5.6.8 开始,innodb_log_file_size 可以动态更改;我不知道 MariaDB 的情况。一定要更改 my.cnf --(建议轮换间隔 60 分钟,这有点武断。)调整 innodb_log_file_size(现在为 536870912)。(在 AWS 中无法更改。)
( Innodb_row_lock_waits/Innodb_rows_inserted ) = 19,947/565194 = 3.5%
—— 等待某一行的频率。
( innodb_flush_neighbors ) = innodb_flush_neighbors = 1
-- 将块写入磁盘时进行小幅优化。-- 对于 SSD 驱动器使用 0;对于 HDD 使用 1。
( innodb_io_capacity ) = 200
-- 磁盘每秒可执行的 I/O 操作数。慢速驱动器为 100;旋转驱动器为 200;SSD 为 1000-2000;乘以 RAID 因子。限制每秒写入 IOPS。-- 首先:HDD:200;SSD:2000。
( innodb_flush_log_at_trx_commit ) = 1
-- 1 = 安全;2 = 更快 -- (您决定)使用 1,以及 sync_binlog(现在为 0)=1,以获得最高级别的容错能力。0 最适合速度。2 是 0 和 1 之间的折衷。
( sync_binlog ) = 0
-- 使用 1 来增加安全性,但会花费一些 I/O 成本 =1 可能会导致大量“查询结束”;=0 可能会导致“binlog 处于不可能的位置”并在崩溃时丢失事务,但速度更快。对于 Galera 来说 0 是可以的。
( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF
-- 是否记录所有死锁。-- 如果您受到死锁的困扰,请启用此功能。注意:如果您有大量死锁,这可能会将大量数据写入磁盘。
( innodb_ft_result_cache_limit ) = 2,000,000,000 / 16384M = 11.6%
-- 全文结果集的字节限制。(它会根据需要增长。)-- 降低设置。
( character_set_client ) = character_set_client = utf8mb3
-- -- 如果您要使用西欧以外地区的文本,请考虑切换到 utf8mb4。(超出了本讨论范围。)
( character_set_connection ) = character_set_connection = utf8mb3
--
( character_set_results ) = character_set_results = utf8mb3
--
( local_infile ) = local_infile = ON
-- local_infile(现在为 ON)= ON 是一个潜在的安全问题
( query_cache_size ) = 4M
-- QC 的大小 -- 太小 = 用处不大。太大 = 开销太大。建议为 0 或不超过 50M。
( Qcache_lowmem_prunes ) = 223055278 / 512543 = 435 /sec
-- QC 空间不足 -- 增加 query_cache_size (现在为 4194304)
( Qcache_lowmem_prunes/Qcache_inserts ) = 223,055,278/224646888 = 99.3%
-- 删除率(由于内存不足而需要修剪的频率)
( Qcache_hits / Qcache_inserts ) = 119,595,591 / 224646888 = 0.532
-- 命中插入率 -- 越高越好 -- 考虑关闭查询缓存。
( Qcache_inserts - Qcache_queries_in_cache ) = (224646888 - 700) / 512543 = 438 /sec
-- 失效次数/秒。
( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (4M - 582944) / 700 / 16384 = 0.315
-- query_alloc_block_size 与公式 -- 调整 query_alloc_block_size(现在为 16384)
( Created_tmp_tables ) = 14256825 / 512543 = 28 /sec
——作为复杂 SELECT 的一部分创建“临时”表的频率。
( Created_tmp_disk_tables ) = 3584311 / 512543 = 7 /sec
-- 创建频率磁盘“临时”表作为复杂 SELECT 的一部分 - 增加 tmp_table_size(现在为 8388608)和 max_heap_table_size(现在为 16777216)。检查使用 MEMORY 而不是 MyISAM 时的临时表规则。也许较小的架构或查询更改可以避免使用 MyISAM。更好的索引和查询的重新表述更有可能有所帮助。
( binlog_format ) = binlog_format = MIXED
-- 语句/行/混合。-- 5.7 (10.3) 首选行
( long_query_time ) = 5
-- 定义“慢速”查询的截止时间(秒)。-- 建议 2
( Max_used_connections / max_connections ) = 205 / 1024 = 20.0%
-- 连接的峰值百分比 -- 由于几个内存因素可以基于 max_connections(现在为 1024)扩大,因此最好不要将该设置得太高。
( Com_change_db ) = 4206525 / 512543 = 8.2 /sec
-- 可能来自 USE 语句。-- 考虑连接 DB、使用 db.tbl 语法、消除虚假的 USE 语句等。
( Connections ) = 7860380 / 512543 = 15 /sec
-- 连接 -- 增加 wait_timeout(现在为 28800);使用池?
( thread_pool_max_threads ) = 65,536
-- MariaDB 线程池的众多设置之一 -- 降低值。
异常小:
Handler_read_next / Handler_read_key = 0.58
min(max_heap_table_size, tmp_table_size) = 8MB
net_buffer_length = 8,192
tmp_memory_table_size = 8MB
异常大:
(Com_select + Qcache_hits) / (Com_insert + Com_update + Com_delete + Com_replace) = 90
Acl_database_grants = 171
Acl_users = 175
Binlog_snapshot_position = 1.06e+9
Com_release_savepoint = 14 /HR
Com_rollback_to_savepoint = 2.3 /sec
Com_savepoint = 14 /HR
Com_show_binlog_status = 0.014 /HR
Com_show_events = 1.1 /HR
Com_show_fields = 2.5 /sec
Com_show_grants = 0.099 /sec
Com_stmt_close = 72 /sec
Com_stmt_execute = 72 /sec
Com_stmt_prepare = 72 /sec
Com_unlock_tables = 18 /HR
Feature_check_constraint = 22 /HR
Feature_fulltext = 0.054 /sec
Feature_json = 4.6 /sec
Feature_subquery = 167 /sec
Handler_discover = 0.77 /HR
Handler_read_key = 90462 /sec
Handler_savepoint = 28 /HR
Handler_savepoint_rollback = 4.6 /sec
Handler_tmp_update = 6181 /sec
Innodb_buffer_pool_pages_lru_flushed = 317,184
Innodb_buffer_pool_pages_lru_freed = 1.72e+6
Innodb_data_pending_reads = 1
Innodb_dblwr_pages_written / Innodb_dblwr_writes = 72.5
Innodb_instant_alter_column = 0.47 /HR
Innodb_log_writes / Innodb_log_write_requests = 675.0%
Innodb_num_open_files = 15,257
Memory_used = 29.6%
Memory_used_initial = 4,280.9MB
Open_streams = 4
Open_table_definitions = 14,162
Open_tables = 15,857
Opened_views = 0.15 /sec
Qcache_hits = 233 /sec
Qcache_inserts = 438 /sec
Ssl_accepts = 536
Ssl_finished_accepts = 536
Subquery_cache_hit = 494 /sec
Subquery_cache_miss = 114 /sec
innodb_lru_scan_depth / innodb_io_capacity = 7.68
innodb_open_files = 15,857
innodb_purge_batch_size = 1,000
performance_schema_max_statement_classes = 222
table_definition_cache = 8,000
异常字符串:
Slave_heartbeat_period = 0
Slave_received_heartbeats = 0
aria_recover_options = BACKUP,QUICK
character_set_system = utf8mb3
disconnect_on_expired_password = OFF
innodb_fast_shutdown = 1
old_alter_table = DEFAULT
old_mode = UTF8_IS_UTF8MB3
optimizer_trace = enabled=off
require_secure_transport = ON
sql_slave_skip_counter = 0
答案2
2024 年 3 月 1 日 垫脚石 AG
这可能会导致您的问题。
https://stackoverflow.com/questions/78071133/mariadb-not-releasing-ram-after-jobs-finish
全局变量 SELECT @@version_malloc_library;需要是 jemalloc
祝您一切顺利。
答案3
内存不足建议考虑缓解。
Create 10G SWAP Space.
宁愿慢跑几秒钟,也不愿处理 OOM。
每秒速率 = RPS
建议考虑在 my.cnf [mysqld] 部分中减少 RAM 使用量
max_connections= 512 # from 1024 - max_connections_used = 205 in 5 days.
read_rnd_buffer_size=16K # from 512K to reduce handler_read_rnd_next RPS 108,335
innodb_io_capacity=900 # from 200 to use more of NVME IOPS capacity
thread_cache_size=128 # from 8 to reduce threads_created RPS 2+.
innodb_old_blocks_pct=1 # from 37 to support more rows in Innodb_buffer_pool_size
有更多的机会在您的配置中节省 RAM 并每天/每晚提高性能。