我们在 GKE 上运行了一台拥有 16GB RAM 的 MariaDB 10.5.8 服务器。该服务器每天都会多次出现意外的内存使用高峰,导致服务器崩溃
1天内存使用情况图表 (橙色线是 k8s 请求的 RAM)
一些额外的细节
- 服务器有 13.4 GB 可用 RAM(不包括 mysql)
- 即使在平静的日子也会发生(比如今天)
- QPS:~150(5%更新,3%插入)
- 平均连接数 50-150
- 无异常网络流量
slow_query_log
没有显示任何有用的东西
我这里漏掉了什么?服务器内存不足是怎么回事?
下一步将启用general_log
并尝试查看是否能够捕捉到崩溃前发生的情况。
配置
[mysqld]
skip-name-resolve
explicit_defaults_for_timestamp
character-set-server=UTF8
collation-server=utf8_general_ci
sql_mode=TRADITIONAL
innodb_buffer_pool_size=4G
tmp_table_size=32M
max_heap_table_size=32M
net_read_timeout=1800
net_write_timeout=1800
max_connections=300
open_files_limit=8192
预计最大内存使用量
SELECT @@innodb_buffer_pool_size/1024/1024 as cur_buf, ROUND(
( @@GLOBAL.key_buffer_size
+ @@GLOBAL.query_cache_size
+ @@GLOBAL.tmp_table_size
+ @@GLOBAL.innodb_buffer_pool_size
+ @@GLOBAL.innodb_log_buffer_size
+ @@GLOBAL.max_connections * (
@@GLOBAL.sort_buffer_size
+ @@GLOBAL.read_buffer_size
+ @@GLOBAL.read_rnd_buffer_size
+ @@GLOBAL.join_buffer_size
+ @@GLOBAL.thread_stack
+ @@GLOBAL.binlog_cache_size)
) / 1024 / 1024, 1) `total MB`;
#cur_buf: 4096.00000000
# total MB: 5155.4
当前索引总大小
SELECT sum( ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2)) size_in_mb
FROM mysql.innodb_index_stats
WHERE stat_name = 'size' AND index_name != 'PRIMARY' ORDER BY `size_in_mb` DESC
# size_in_mb 6471.11
编辑
更新状态 2021-02-08
... 哦,哎呀!有些问题以前没有出现过!...
don't see a command prompt, try pressing enter.
[--] Status: +ARCHIVE +Aria +BLACKHOLE +CSV +InnoDB +MEMORY +MRG_MyISAM +MyISAM +PERFORMANCE_SCHEMA +SEQUENCE
[--] Data in InnoDB tables: 20.6G (Tables: 1680)
[OK] Total fragmented tables: 0
-------- Analysis Performance Metrics --------------------------------------------------------------
[--] innodb_stats_on_metadata: OFF
[OK] No stat updates during querying INFORMATION_SCHEMA.
-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[--] There are 620 basic passwords in the list.
-------- CVE Security Recommendations --------------------------------------------------------------
[OK] NO SECURITY CVE FOUND FOR YOUR VERSION
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 10h 14m 26s (4M q [128.720 qps], 295K conn, TX: 97G, RX: 1G)
[--] Reads / Writes: 89% / 11%
[--] Binary logging is disabled
[--] Physical Memory : 13.7G
[--] Max MySQL memory : 8.8G
[--] Other process memory: 0B
[--] Total buffers: 3.3G global + 18.9M per thread (300 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 5.6G (41.01% of installed RAM)
[OK] Maximum possible memory usage: 8.8G (64.64% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (19/4M)
[OK] Highest usage of available connections: 41% (125/300)
[OK] Aborted connections: 0.00% (3/295567)
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 0% (6 temp sorts / 359K sorts)
[!!] Joins performed without indexes: 1244
[!!] Temporary tables created on disk: 54% (76K on disk / 140K total)
[OK] Thread cache hit rate: 99% (125 created / 295K connections)
[OK] Table cache hit rate: 27% (1K open / 6K opened)
[!!] table_definition_cache(400) is lower than number of tables(1882)
[OK] Open file limit used: 0% (16/32K)
[OK] Table locks acquired immediately: 100% (7K immediate / 7K locks)
-------- Performance schema ------------------------------------------------------------------------
[--] Performance schema is disabled.
[--] Memory used by P_S: 0B
[--] Sys schema isn't installed.
-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is enabled.
[--] Thread Pool Size: 4 thread(s).
[--] Using default value is good enough for your version (10.5.8-MariaDB)
-------- MyISAM Metrics ----------------------------------------------------------------------------
[!!] Key buffer used: 18.2% (24M used / 134M cache)
[OK] Key buffer size / total MyISAM indexes: 128.0M/4.0K
-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[!!] InnoDB buffer pool / data size: 3.0G/20.6G
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (3.125 %): 96.0M * 1/3.0G should be equal to 25%
[--] Number of InnoDB Buffer Pool Chunk : 24 for 1 Buffer Pool Instance(s)
[OK] Innodb_buffer_pool_size aligned with Innodb_buffer_pool_chunk_size & Innodb_buffer_pool_instances
[OK] InnoDB Read buffer efficiency: 99.99% (11893153063 hits/ 11894346836 total)
[!!] InnoDB Write Log efficiency: 21.89% (50454 hits/ 230456 total)
[OK] InnoDB log waits: 0.00% (0 waits / 280910 writes)
-------- Aria Metrics ------------------------------------------------------------------------------
[--] Aria Storage Engine is enabled.
[OK] Aria pagecache size / total Aria indexes: 128.0M/2.4M
[!!] Aria pagecache hit rate: 93.2% (1M cached / 77K reads)
-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.
-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.
-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.
-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] No replication slave(s) for this server.
[--] Binlog format: MIXED
[--] XA support enabled: ON
[--] Semi synchronous replication Master: OFF
[--] Semi synchronous replication Slave: OFF
[--] This is a standalone server
-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
MySQL was started within the last 24 hours - recommendations may be inaccurate
We will suggest raising the 'join_buffer_size' until JOINs not using indexes are found.
See https://dev.mysql.com/doc/internals/en/join-buffer-size.html
(specially the conclusions at the bottom of the page).
When making adjustments, make tmp_table_size/max_heap_table_size equal
Reduce your SELECT DISTINCT queries which have no LIMIT clause
Performance schema should be activated for better diagnostics
Consider installing Sys schema from https://github.com/mysql/mysql-sys for MySQL
Consider installing Sys schema from https://github.com/FromDual/mariadb-sys for MariaDB
Before changing innodb_log_file_size and/or innodb_log_files_in_group read
Variables to adjust:
join_buffer_size (> 256.0K, or always use indexes with JOINs)
tmp_table_size (> 32M)
max_heap_table_size (> 32M)
table_definition_cache(400) > 1882 or -1 (autosizing if supported)
performance_schema = ON enable PFS
innodb_buffer_pool_size (>= 20.6G) if possible.
innodb_log_file_size should be (=768M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
=====================================
2021-02-08 13:58:40 0x7ff6b3d11700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 11 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 20876 srv_active, 0 srv_shutdown, 16149 srv_idle
srv_master_thread log flush and writes: 37025
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 10455
OS WAIT ARRAY INFO: signal count 11626
RW-shared spins 2112, rounds 12450, OS waits 102
RW-excl spins 2416, rounds 5720, OS waits 122
RW-sx spins 146, rounds 1352, OS waits 17
Spin rounds per wait: 5.89 RW-shared, 2.37 RW-excl, 9.26 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 1707308132
Purge done for trx's n:o < 1707308131 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 422180334491048, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334538304, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334516824, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334534008, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334512528, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334503936, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334525416, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334508232, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334478160, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334529712, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334521120, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334499640, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334495344, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334486752, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334482456, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334473864, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334469568, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 422180334465272, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: (null) ((null))
I/O thread 1 state: (null) ((null))
I/O thread 2 state: (null) ((null))
I/O thread 3 state: (null) ((null))
I/O thread 4 state: (null) ((null))
I/O thread 5 state: (null) ((null))
I/O thread 6 state: (null) ((null))
I/O thread 7 state: (null) ((null))
I/O thread 8 state: (null) ((null))
I/O thread 9 state: (null) ((null))
Pending normal aio reads:
Pending flushes (fsync) log: 0; buffer pool: 0
1196036 OS file reads, 362093 OS file writes, 291004 OS fsyncs
0.27 reads/s, 16384 avg bytes/read, 6.73 writes/s, 6.73 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 817, free list len 41154, seg size 41972, 2293 merges
merged operations:
insert 9206, delete mark 105324, delete 196
discarded operations:
insert 0, delete mark 0, delete 0
0.00 hash searches/s, 111476.96 non-hash searches/s
---
LOG
---
Log sequence number 966315111123
Log flushed up to 966315111015
Pages flushed up to 966313325596
Last checkpoint at 966304251703
0 pending log flushes, 0 pending chkp writes
283193 log i/o's done, 6.73 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 3254779904
Dictionary memory allocated 30676992
Buffer pool size 193560
Free buffers 88
Database pages 193472
Old database pages 71422
Modified db pages 688
Percent of dirty pages(LRU & free pages): 0.355
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 2028651, not young 55580837
0.00 youngs/s, 0.55 non-youngs/s
Pages read 1194330, created 25139, written 78328
0.27 reads/s, 3.00 creates/s, 0.00 writes/s
Buffer pool hit rate 999 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 193472, unzip_LRU len: 0
I/O sum[1443]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 118319, updated 165195, deleted 169952, read 9369336877
3.64 inserts/s, 2.55 updates/s, 0.18 deletes/s, 172245.89 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 28810
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
答案1
好的,我想我跌倒关于问题....thread_cache_size
[OK] Thread cache hit rate: 99% (125 created / 295K connections)
默认thread_cache_size
设置为256
,在我们的例子中,MariaDB 处理两种类型的负载,来自应用程序和客户端的常规 SQL,以及偶尔的深度和复杂的分析查询。
看起来,由于当连接完成时,线程将被保留cache
以供将来使用,这导致线程不会释放其内存。
我不知道是否有其他方法可以指示 MariaDB 应该更频繁地回收其线程,但与此同时,降低thread_cache_size
固定的问题
答案2
运行脚本时,您的 MySQL 服务器仅运行了 30 分钟mysqltuner
。建议在查看mysqltuner
统计数据之前,让 MySQL 至少运行 24 小时。
MySQL 的最大内存使用量为 9.8 G,即可用 RAM 的 75%,因此 MySQL 本身在限制范围内。您需要查看是否有其他应用程序消耗了大量内存。
或者,您可以尝试减少可用连接数(max_connections = 300),因为您当前的使用率只有 39%(请注意,这只是 30 分钟后),但这些连接分配了超过一半的可能内存使用量(每个连接 18.9 M,300 个连接总计 5670 M)。