MySQL 服务器偶尔会崩溃

MySQL 服务器偶尔会崩溃

我遇到了一个令人困惑的问题,mysqld 偶尔会处于非常尴尬的状态,即 CPU 使用率非常高(几百个线程都在争夺 CPU 时间)并且内存快速累积(更不用说泄漏)。一旦这种情况开始发生,服务器在执行查询时就会变得非常慢,通常慢到足以触发等待它的应用程序超时。

最终,它会积累足够的内存,开始触发 Linux 内核 OOM-killer,此时整个系统已经因为不断的交换而变得非常慢,几乎无法使用,需要硬重置。为用户
设置更严格的内存分配ulimitmysql已经解决了这一特定症状,但潜在问题仍然存在。

一旦 MySQL 服务器处于此状态,关闭命令后需要几分钟才能真正停止,这也使得重新启动它令人紧张。尽管我仍然更喜欢尝试彻底关闭它,以避免破坏复制。

一旦服务器处于这种状态,慢查询日志就毫无用处了每一个查询会被记录为慢查询,因此很难找出任何实际的违规者。

将 mysql 服务器升级到版本5.6.43-1debian8已将问题从每隔几天发生一次减少到每隔几周发生一次。

SHOW FULL PROCESS LIST;在正在发生的情况中我尚未能够获得。

运行该程序的机器有 12 核/24 线程,64 GiB 内存(其中大部分内存可用于 MySQL)和 32 GiB 的 HDD 交换。

这是我my.cnf之后提出的所有建议(不确定这些更改是否已经解决了问题,我将随着时间的推移不断判断)。

#
# The MySQL database server configuration file.
#
# You can copy this to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
# 
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

# This will be passed to all mysql clients
# It has been reported that passwords should be enclosed with ticks/quotes
# escpecially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.
[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
# default-character-set=latin1

# Here is entries for some specific programs
# The following values assume you have at least 32M ram

# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
log_error       = /var/log/mysql/mysql_error.log
socket          = /var/run/mysqld/mysqld.sock
nice            = 0

[mysqld]
#
# * Basic Settings
#
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
#basedir                = /usr/local
#ledir          = /usr/local/bin
#datadir                = /var/lib/mysql
#datadir                = /ssd/mysql
datadir         = /mnt/vg0_lv0/mysql/
tmpdir          = /tmp
#language       = /usr/local/share/mysql/english
#open-files-limit = 32000
sql_mode        = NO_ENGINE_SUBSTITUTION

character-set-server=latin1
collation-server=latin1_swedish_ci

net_read_timeout = 120
skip-external-locking
skip-name-resolve
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address            = ::
#
# * Fine Tuning
#
# mysqld might use { key_buffer + (read_buffer_size + read_rnd_buffer_size + sort_buffer_size + join_buffer_size) * max_threads } bytes of memory
#max_threads            = 2048 # probably bound to max_connections
key_buffer              = 1536M # global limit
key_buffer_size         = 1536M # not sure which spelling is correct, use both

# as said above, these limits are per connection and NOT global
read_buffer_size        = 16M
read_rnd_buffer_size    = 16M
sort_buffer_size        = 16M
myisam_sort_buffer_size = 16M
join_buffer_size        = 20M

max_allowed_packet      = 128M
preload_buffer_size     = 384K
tmp_table_size          = 192M
max_heap_table_size     = 192M
thread_stack            = 256K
thread_cache_size       = 24

# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover         = BACKUP
max_connections        = 500
table_open_cache       = 3200
table_open_cache_instances = 2 # approx. half of cores used routinely
#table_definition_cache # automatically set to min(400+(table_open_cache/2),2000)
#thread_concurrency     = 48
#
# * Query Cache Configuration
#
# Defaults: 1M, 16M
#query_cache_limit      = 12M
#query_cache_size        = 2048M
#query_cache_type       = 1
#query_cache_min_res_unit       = 32K
# Query Cache disabled due to lock contention <https://www.percona.com/blog/2012/09/05/write-contentions-on-the-query-cache/>
query_cache_size        = 0
query_cache_type        = 0

# query_prealloc_size   = 64M

wait_timeout            = 3600
interactive_timeout     = 28800

#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
#general_log_file        = /var/log/mysql/mysql.log
#general_log             = 1
log_error               = /var/log/mysql/mysql_error.log
#
# Error logging goes to syslog due to /etc/mysql/conf.d/mysqld_safe_syslog.cnf.
#
# Here you can see queries with especially long duration
slow_query_log          = 1
slow_query_log_file     = /var/log/mysql/mysql-slow.log
long_query_time         = 10
#log-queries-not-using-indexes = 1 # logs all queries that filter without an index, not just bad JOINs (spammy)
#log_slow_queries       = /var/log/mysql/mysql-slow.log # old, invalid
#long_query_time = 2
#log-queries-not-using-indexes = 1
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
#       other settings you may need to change.
server-id               = 3

log_bin                 = /mnt/vg0_lv0/mysql_binlog/mysql-bin.log
expire_logs_days        = 31
max_binlog_size         = 1073741824

#log_bin                        = /var/log/mysql/mysql-bin.log
#expire_logs_days       = 31
#max_binlog_size         = 8192M
#binlog_do_db           = include_database_name
#binlog_ignore_db       = include_database_name
#slave-net-timeout = 30
#master-retry-count = 99999
#slave-skip-errors=1062,1053
#slave-skip-errors=1062,1053,1050,1146,1051,1396,1060,1054

#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
innodb_buffer_pool_size         = 6144M
innodb_buffer_pool_instances    = 6 # only >1 if each instance gets at least 1GiB, i_buffer_pool_size will be divided by this
innodb_log_file_size            = 768M
innodb_file_per_table           = on
innodb_ft_cache_size            = 32M
innodb_ft_total_cache_size      = 640M
innodb_additional_mem_pool      = 32M
innodb_thread_concurrency       = 0 # 0=autodetect
innodb_read_io_threads          = 64 # 64=max
innodb_write_io_threads         = 64
innodb_log_buffer_size          = 8M

# * Security Features
#
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
#
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
#
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem



[mysqldump]
quick
quote-names
max_allowed_packet      = 32M

[mysql]
#no-auto-rehash # faster start of mysql but no tab completition

[isamchk]
key_buffer              = 16M

#
# * IMPORTANT: Additional settings that can override those from this file!
#   The files must end with '.cnf', otherwise they'll be ignored.
#
!includedir /etc/mysql/conf.d/

根据此处的建议(已添加的评论已剪辑)得出的 mysqltuner 报告:

 >>  MySQLTuner 1.7.13 - Major Hayden <[email protected]>
 >>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
 >>  Run with '--help' for additional options and output filtering

[--] Skipped version check for MySQLTuner script
[OK] Logged in using credentials passed on the command line
[OK] Currently running supported MySQL version 5.6.44-log
[OK] Operating on 64-bit architecture

-------- Log file Recommendations ------------------------------------------------------------------
[--] Log file: /var/log/mysql/mysql_error.log(2M)
[OK] Log file /var/log/mysql/mysql_error.log exists
[OK] Log file /var/log/mysql/mysql_error.log is readable.
[OK] Log file /var/log/mysql/mysql_error.log is not empty
[OK] Log file /var/log/mysql/mysql_error.log is smaller than 32 Mb
[!!] /var/log/mysql/mysql_error.log contains 2285 warning(s).
    - Almost all of these are replication warnings about RAND()
[!!] /var/log/mysql/mysql_error.log contains 27 error(s).
    - Almost all of these are "Logging to '/var/log/mysql/mysql_error.log'." and similar.
[--] 18 start(s) detected in /var/log/mysql/mysql_error.log
[--] 1) 2019-05-09 20:27:44 34546 [Note] /usr/sbin/mysqld: ready for connections.
[--] 2) 2019-05-09 20:23:44 8281 [Note] /usr/sbin/mysqld: ready for connections.
[--] 3) 2019-05-09 20:22:38 39057 [Note] /usr/sbin/mysqld: ready for connections.
[--] 4) 2019-05-09 20:01:56 43432 [Note] /usr/sbin/mysqld: ready for connections.
[--] 5) 2019-05-09 19:55:54 22899 [Note] /usr/sbin/mysqld: ready for connections.
[--] 6) 2019-05-09 19:48:51 33825 [Note] /usr/sbin/mysqld: ready for connections.
[--] 7) 2019-05-09 19:26:38 20046 [Note] /usr/sbin/mysqld: ready for connections.
[--] 8) 2019-05-01 14:28:48 1915 [Note] /usr/sbin/mysqld: ready for connections.
[--] 9) 2019-04-15 19:50:40 9309 [Note] /usr/sbin/mysqld: ready for connections.
[--] 10) 2019-04-15 19:40:28 43854 [Note] /usr/sbin/mysqld: ready for connections.
[--] 16 shutdown(s) detected in /var/log/mysql/mysql_error.log
[--] 1) 2019-05-09 20:27:20 8281 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 2) 2019-05-09 20:23:20 39057 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 3) 2019-05-09 20:22:12 43432 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 4) 2019-05-09 20:01:32 22899 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 5) 2019-05-09 19:55:29 33825 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 6) 2019-05-09 19:48:21 20046 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 7) 2019-05-09 19:26:06 1915 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 8) 2019-05-01 14:28:01 9309 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 9) 2019-04-15 19:50:16 43854 [Note] /usr/sbin/mysqld: Shutdown complete
[--] 10) 2019-04-15 19:40:02 7435 [Note] /usr/sbin/mysqld: Shutdown complete

-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +ARCHIVE +BLACKHOLE +CSV -FEDERATED +InnoDB +MEMORY +MRG_MYISAM +MyISAM +PERFORMANCE_SCHEMA 
[--] Data in InnoDB tables: 25.4G (Tables: 9193)
[--] Data in MyISAM tables: 6.1G (Tables: 4007)
[OK] Total fragmented tables: 0

-------- Analysis Performance Metrics --------------------------------------------------------------
[--] innodb_stats_on_metadata: OFF
[OK] No stat updates during querying INFORMATION_SCHEMA.

-------- Security Recommendations ------------------------------------------------------------------
[OK] There are no anonymous accounts for any database users
[OK] All database users have passwords assigned
[!!] User '*****@%' has user name as password.
    - A few more like these, removed for privacy.
[!!] User '*****@%' does not specify hostname restrictions.
    - Several dozen more like these, removed for privacy. Restrictions are implemented via iptables.
[--] There are 618 basic passwords in the list.
[!!] User '*****@localhost' is using weak password: 123456 in a lower, upper or capitalize derivative version.
[!!] User '*****@%' is using weak password: test in a lower, upper or capitalize derivative version.

-------- CVE Security Recommendations --------------------------------------------------------------
[OK] NO SECURITY CVE FOUND FOR YOUR VERSION

-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 2d 21h 36m 2s (112M q [450.481 qps], 1M conn, TX: 225G, RX: 21G)
[--] Reads / Writes: 95% / 5%
[--] Binary logging is enabled (GTID MODE: OFF)
[--] Physical Memory     : 63.0G
[--] Max MySQL memory    : 41.5G
[--] Other process memory: 4.9G
[--] Total buffers: 7.7G global + 68.2M per thread (500 max threads)
[--] P_S Max memory usage: 487M
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 11.6G (18.42% of installed RAM)
[OK] Maximum possible memory usage: 41.5G (65.92% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (64K/112M)
[OK] Highest usage of available connections: 10% (51/500)
[OK] Aborted connections: 0.16%  (2506/1587046)
[OK] Query cache is disabled by default due to mutex contention on multiprocessor machines.
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 4M sorts)
[!!] Joins performed without indexes: 1703245
[OK] Temporary tables created on disk: 6% (998K on disk / 15M total)
[OK] Thread cache hit rate: 99% (64 created / 1M connections)
[!!] Table cache hit rate: 0% (3K open / 405K opened)
[OK] Open file limit used: 4% (1K/24K)
[OK] Table locks acquired immediately: 99% (142M immediate / 142M locks)
[OK] Binlog cache memory access: 99.99% (4902582 Memory / 4902885 Total)

-------- Performance schema ------------------------------------------------------------------------
[--] Memory used by P_S: 487.9M
[--] Sys schema isn't installed.

-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is disabled.

-------- MyISAM Metrics ----------------------------------------------------------------------------
[!!] Key buffer used: 22.7% (365M used / 1B cache)
[OK] Key buffer size / total MyISAM indexes: 1.5G/1.4G
[OK] Read Key buffer hit rate: 97.7% (1B cached / 37M reads)
[!!] Write Key buffer hit rate: 15.9% (454K cached / 72K writes)

-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[OK] InnoDB File per table is activated
[!!] InnoDB buffer pool / data size: 6.0G/25.4G
[OK] Ratio InnoDB log file size / InnoDB Buffer pool size: 768.0M * 2/6.0G should be equal 25%
[OK] InnoDB buffer pool instances: 6
[--] InnoDB Buffer Pool Chunk Size not used or defined in your version
[OK] InnoDB Read buffer efficiency: 100.00% (180155088749 hits/ 180158103035 total)
[!!] InnoDB Write Log efficiency: 37.14% (922821 hits/ 2484433 total)
[OK] InnoDB log waits: 0.00% (0 waits / 1561612 writes)

-------- AriaDB Metrics ----------------------------------------------------------------------------
[--] AriaDB is disabled.

-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.

-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.

-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.

-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] This server is acting as master for 1 server(s).
[--] Binlog format: STATEMENT
[--] XA support enabled: ON
[--] Semi synchronous replication Master: Not Activated
[--] Semi synchronous replication Slave: Not Activated
[--] No replication setup for this server or replication not started.

-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
    Control warning line(s) into /var/log/mysql/mysql_error.log file
    Control error line(s) into /var/log/mysql/mysql_error.log file
    Set up a Secure Password for user@host ( SET PASSWORD FOR 'user'@'SpecificDNSorIp' = PASSWORD('secure_password'); )
    Restrict Host for user@% to user@SpecificDNSorIp
    4 user(s) used basic or weak password.
    Adjust your join queries to always utilize indexes
    Increase table_open_cache gradually to avoid file descriptor limits
    Read this before increasing table_open_cache over 64: [snip - bit.ly not allowed]
    Read this before increasing for MariaDB https://mariadb.com/kb/en/library/optimizing-table_open_cache/
    This is MyISAM only table_cache scalability problem, InnoDB not affected.
    See more details here: https://bugs.mysql.com/bug.php?id=49177
    This bug already fixed in MySQL 5.7.9 and newer MySQL versions.
    Beware that open_files_limit (24000) variable 
    should be greater than table_open_cache (3200)
    Consider installing Sys schema from https://github.com/mysql/mysql-sys
Variables to adjust:
    join_buffer_size (> 20.0M, or always use indexes with JOINs)
    table_open_cache (> 3200)
    innodb_buffer_pool_size (>= 25.4G) if possible.

我可以采取什么步骤来尝试调试此问题虽然目前还没有发生? 我可以采取什么步骤来尝试解决该问题?

答案1

每秒速率 = RPS - 针对 my.cnf [mysqld] 部分的建议

key_cache_age_threshold=7200  # from 300 seconds to reduce key_reads RPS of 150
max_connections=128  # from 500 because max_used_connections < 60
innodb_io_capacity=1000  # from 200 to allow higher IOPS
read_rnd_buffer_size=256K  # from 16M to reduce handler_read_rnd_next RPS of 36978
table_definition_cache=10000  # from 2000 to reduce opened_table_definitions RPHr of 6,340

答案2

数百个线程都在争夺 CPU 时间

减少允许或尝试的连接数。

允许:max_connections从 512 更改为 40。这可能会避免出现问题。

已尝试:客户端是什么?它会产生大量线程吗?如果是,请减少尝试次数。

为什么?一旦惊群到达 MySQL,它所做的就是让延迟飙升,而不会提高吞吐量。事实上,吞吐量可能会下降向下当 CPU 过载时。可以将其视为“争夺 CPU”。

Mysqltuner 可以获取一些信息,但若要获取更多信息,请提供所要求的信息这里

而且,由于高 CPU 通常意味着查询效率低下(这可能会导致数百个线程),让我们修复其中一些问题。参见

相关内容