*** 在 mariadb 10.3.11 中检测到缓冲区溢出 ***

*** 在 mariadb 10.3.11 中检测到缓冲区溢出 ***

每次 MySQL 的内存使用量达到整个系统内存的上限时,我们都会收到断言失败的消息。

每次断言失败都会在代码位置“/home/buildbot/buildbot/build/mariadb-10.3.11/storage”中报告不同的文件,但每次我们都会看到相同的检测到缓冲区溢出mysqld 日志中的语句。

我们只看到“检测到缓冲区溢出”,就像是一种模式。我已附加堆栈跟踪以及 my.cnf 文件。

我们的服务器硬件配置如下,Ubuntu 16.04.5 LTS 内存:72G 核心:36 我们的场景:写入 50k 作为批量插入,我们插入大约超过 100G 的数据。

我们有几个嫌疑人,但仍然无法确定问题出在哪里。1. 我们正在使用“重复键更新”写入大约 100 条插入 - 是否可以将所有索引作为页面(旧/年轻页面)放回到内存中并且它没有被清除 2. 我试图插入相同的 50k 条插入,大约有 200 万条记录,我发现内存峰值大约为 2%(基于数据大小),每次我运行它时,它都会被加起来。

重要提示:当我们看到这个问题时,表数据也被破坏了。这是我们非常担心的问题。

我的cnf

[mysqld]

!includedir /etc/mysql/conf.d/

server-id       = 1
pid-file        = /data/db/datadir/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
bind-address    = 0.0.0.0

binlog_format   = ROW
binlog_row_image= minimal

log-bin
log-basename    = rs-bin
expire_logs_days = 1

datadir         = /data/db/datadir/
tmpdir          = /data/db/mysql_tmp
log-error       = /var/log/mysql/mysqld.log

#user           = mysql
###basedir      = /usr

#innodb_force_recovery = 1

symbolic-links=0

log_slave_updates
sync_binlog=0
log_bin_trust_function_creators=1
slow_query_log=ON

transaction-isolation=READ-UNCOMMITTED

max_connections                 = 300
max_allowed_packet              = 100M
net_buffer_length               = 4M
net_read_timeout                = 300
net_write_timeout               = 360
query_cache_type                = 0
sort_buffer_size                = 200M
key_buffer_size                 = 512M
tmp_table_size                  = 1073741824
max_sort_length                 = 2048
max_heap_table_size             = 1G
explicit_defaults_for_timestamp = 1

aria-pagecache-buffer-size      = 3G
#aria-block-size                = 4096

innodb_data_file_path           = ibdata1:12M:autoextend
innodb_buffer_pool_size         = 40G
innodb_buffer_pool_instances    = 40
innodb_log_file_size            = 1G
innodb_log_buffer_size          = 128M
innodb_log_compressed_pages     = off
innodb_flush_method             = O_DIRECT
innodb_flush_log_at_trx_commit  = 0
innodb_file_per_table           = ON
innodb_thread_sleep_delay       = 10000
innodb_thread_concurrency       = 0
innodb_write_io_threads         = 64 #2x of cores
innodb_read_io_threads          = 64 #2x of cores
innodb_log_files_in_group       = 2
innodb_lock_wait_timeout        = 50
innodb_io_capacity              = 10000# benchmark and change
innodb_io_capacity_max          = 20000
innodb_sort_buffer_size         = 10M  #perthread buffer
innodb_spin_wait_delay          = 3  #half the standard value
innodb_old_blocks_pct           = 20
innodb_old_blocks_time          = 1000
innodb_page_cleaners            = 64
innodb_print_all_deadlocks      = ON
innodb_purge_threads            = 1
innodb_flush_neighbors          = 0 #good for ssds
innodb_flush_sync               = true
innodb_use_native_aio           = true
innodb_random_read_ahead        = true
innodb_checksum_algorithm       = crc32

#need to tune this based on performance of DML queries, max of 5000 https://dev.mysql.com/doc/refman/8.0/en/innodb-improved-purge-scheduling.html
innodb_purge_batch_size         = 2000

#Parameters synced from RDS
back_log                        = 900
###binlog_annotate_row_events   = OFF
binlog_checksum                 = NONE
general_log                     = OFF
general_log_file                = /var/log/mysql-general.log
group_concat_max_len            = 1024
host_cache_size                 = 1114
innodb_lock_schedule_algorithm  = fcfs
innodb_max_dirty_pages_pct_lwm  = 0.001

innodb_strict_mode              = OFF
innodb_use_atomic_writes        = OFF
read_buffer_size                = 10485760
read_rnd_buffer_size            = 52428800
relay_log_recovery              = ON
open_files_limit                = 65535

#innodb_purge_threads           = 1
#innodb_spin_wait_delay         = 6
#sql_mode                       = 'NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
###thread_cache_size            = 0
###thread_pool_max_threads      = 1000
###thread_pool_size             = 16
#optimizer_switch               = 'orderby_uses_equalities=off,condition_pushdown_for_derived=off,split_materialized=off'


# A setting smaller than the default is generally suitable for most workloads. A value that is much higher than necessary may impact performance. Only consider increasing the value if you have spare I/O capacity under a typical workload. Conversely, if a write-intensive workload saturates your I/O capacity, decrease the value, especially in the case of a large buffer pool.

innodb_lru_scan_depth           = 2048

###innodb_monitor_enable        = 'all'
#innodb_page_size               = 4096 #same as amazon i3 blocksize for optimized ssd writes
#innodb_thread_sleep_delay_max  = 20000


[mysqld_safe]
log-error=/var/log/mysql/mysqld.log
pid-file=/data/db/datadir/mysqld.pid

堆栈跟踪:

2019-05-06 07:35:08 0x7f171e37a700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.3.11/storage/innobase/btr/btr0sea.cc line 1415
InnoDB: Failing assertion: index->id == btr_page_get_index_id(page)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/
InnoDB: about forcing recovery.
190506  7:35:08 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

Server version: 10.3.11-MariaDB-1:10.3.11+maria~xenial-log
key_buffer_size=536870912
read_buffer_size=10485760
max_used_connections=42
max_threads=302
thread_count=46
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 65473079 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f17100009a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f171e379b78 thread_stack 0x49000
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f2282f887e5]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f228302a15c]
/lib/x86_64-linux-gnu/libc.so.6(+0x117160)[0x7f2283028160]
/lib/x86_64-linux-gnu/libc.so.6(+0x1190a7)[0x7f228302a0a7]
/usr/sbin/mysqld(my_addr_resolve+0xde)[0x5646f87a6c6e]
/usr/sbin/mysqld(my_print_stacktrace+0x1e2)[0x5646f878b092]
/usr/sbin/mysqld(handle_fatal_signal+0x347)[0x5646f8239687]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f2283977390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f2282f46428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f2282f4802a]
/usr/sbin/mysqld(+0x49980a)[0x5646f7f8080a]
/usr/sbin/mysqld(+0xb1b2a0)[0x5646f86022a0]
/usr/sbin/mysqld(+0xb1baee)[0x5646f8602aee]
/usr/sbin/mysqld(+0xb111ef)[0x5646f85f81ef]
/usr/sbin/mysqld(+0xa76fe2)[0x5646f855dfe2]
/usr/sbin/mysqld(+0xa790d1)[0x5646f85600d1]
/usr/sbin/mysqld(+0xa733e7)[0x5646f855a3e7]
/usr/sbin/mysqld(+0xa761ac)[0x5646f855d1ac]
/usr/sbin/mysqld(+0xa7699d)[0x5646f855d99d]
/usr/sbin/mysqld(+0xa30ef9)[0x5646f8517ef9]
/usr/sbin/mysqld(+0xabec24)[0x5646f85a5c24]
/usr/sbin/mysqld(+0xaa0b26)[0x5646f8587b26]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f228396d6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f228301841d]
======= Memory map: ========
5646f7ae7000-5646f8d49000 r-xp 00000000 103:03 51127                     /usr/sbin/mysqld
5646f8f49000-5646f9039000 r--p 01262000 103:03 51127                     /usr/sbin/mysqld
5646f9039000-5646f90f2000 rw-p 01352000 103:03 51127                     /usr/sbin/mysqld
5646f90f2000-5646f9989000 rw-p 00000000 00:00 0 
5646fa098000-5647440b3000 rw-p 00000000 00:00 0                          [heap]
7f0fe0000000-7f0fe3f39000 rw-p 00000000 00:00 0 
7f0fe3f39000-7f0fe4000000 ---p 00000000 00:00 0 
7f0fe8000000-7f0fe8021000 rw-p 00000000 00:00 0 ...

答案1

抱歉更新延迟,这是由于连接未正确关闭造成的。

[编辑] 介绍一下与 MySQL 连接的应用程序的背景(我没有在我的问题中添加它)。

我们正在使用 Apache Airflow,它每秒都会与每个数据库建立一次连接。

同时,我们也遇到了这个问题,我们看到了下面的日志,

2019-05-20 13:07:14 3071 [Warning] Aborted connection 3071 to db: 'unconnected' user: 'root' host: 'ip-112-12-114-119.us-gov-west-1.compute.internal-1' (Got an error reading communication packets)

正如我们提到的那样,Warning我们忽略它并继续调试buffer overflow issue

由于没有找到继续前进的方法,我们想到根据日志解决现有问题并逐一解决。

重要提示:我们发现我们没有关闭 Airflow 中的连接,我们自己在 Airflow 之外建立了连接以测试数据库可用性,但我们忘记关闭它。然后我们进行了此代码更改并部署和测试了大约 48-72 小时,我们没有发现此警告日志以及缓冲区溢出问题。

相关内容