增加 MySQL 8 可以处理的连接数(Debian 9 服务器)并自动重启事件调度程序

增加 MySQL 8 可以处理的连接数(Debian 9 服务器)并自动重启事件调度程序

我有一台 Debian 9 服务器(12 个 CPU,80GB RAM),运行 Percona MySQL 8.0 服务器,每秒大约有 1100 个客户端,负载不是很高,大约在 0.3 到 2.30 左右

显示引擎 innodb 状态的最新数字:183.82 次插入/秒、169.69 次更新/秒、5.79 次删除/秒、2179444.29 次读取/秒

它大多数时候运行良好,但最近有时它突然拒绝新的连接(当一个非常繁忙的表被锁定很长时间或某些查询花费的时间太长时)。

日志显示:

[错误] [MY-000000] [connection_h] 错误日志节流阀:1912‘无法创建线程来处理新连接’错误被抑制

[错误] [MY-010249] [服务器] 无法创建线程来处理新连接(errno = 11)

[错误] [MY-010053] [服务器] Event_scheduler::execute_top: 无法创建事件工作线程 (errno=11)。停止事件调度程序


所以,基本上我有两个问题:

1. 如何调整服务器以接受更多连接(我尝试了很多方法,我将在下面列出),确切地说,它不能处理超过 4821 个连接

2. 每当发生这种情况时,事件调度程序就会停止,我必须手动再次启用它,这真的很糟糕,因为我有一些 MEMORY 表已满,它们处理大量流量并每 5 秒转储到其他表


到目前为止,我尝试设置软限制和硬限制/etc/security/limits.conf对于 *、mysql 和 root 用户:

*         hard    nofile      400000
*         soft    nofile      400000
*         hard    nproc       400000
*         soft    nproc       400000
mysql     hard    nofile      400000
mysql     soft    nofile      400000
root      hard    nofile      400000
root      soft    nofile      400000

我也做了很多调整/etc/sysctl.conf下面列出一些:

kernel.pid_max = 262144
vm.max_map_count = 262144
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 2000 65500
net.ipv4.tcp_max_syn_backlog = 32768
fs.file-max = 450000
net.core.netdev_max_backlog = 450000
net.core.somaxconn = 32768

我还对 systemd conf 文件进行了更改:/lib/systemd/system/mysql.service- 环境:

LimitNOFILE=220000
TasksMax=32768

mysqld.conf

bind-address                    = 0.0.0.0

# GENERAL #
user                           = mysql
default-storage-engine         = InnoDB
socket                         = /var/run/mysqld/mysqld.sock
pid-file                       = /var/lib/mysql/mysql.pid

# SAFETY #
max-allowed-packet             = 16M
max-connect-errors             = 1000000
skip-name-resolve
sysdate-is-now                 = 1
innodb                         = FORCE

wait-timeout                   = 600

# DATA STORAGE #
datadir                        = /var/lib/mysql/

# BINARY LOGGING #
sync-binlog                    = 0

# CACHES AND LIMITS #
tmp-table-size                 = 32M
max-heap-table-size            = 32M
max-connections                = 20000
thread-cache-size              = 300
open-files-limit               = 65535
table-definition-cache         = 4096
table-open-cache               = 4096

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 512M
innodb-flush-log-at-trx-commit = 0
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 64G
innodb-fast-shutdown           = 0
innodb-buffer-pool-dump-pct    = 75
innodb-buffer-pool-dump-at-shutdown = 1
innodb-buffer-pool-load-at-startup  = 1
innodb-io-capacity             = 400
innodb-io-capacity-max         = 2000

# LOGGING #
log-error                      = /var/log/mysql/mysql-error.log
log-queries-not-using-indexes  = 0
slow-query-log                 = 1
slow-query-log-file            = /var/log/mysql/mysql-slow.log
long-query-time                = 5

event_scheduler = 1
general_log_file               = /var/log/mysql/general.log
general_log                    = 0
local-infile                   = 1

我不知道我还能做什么,或者我还能在哪里找到,试图寻找问题,但找不到任何有用的方法,对于事件调度程序来说,情况更糟,几乎没有任何关于它的信息(试着在谷歌上搜索MY-010053或者无法创建事件工作线程几乎没有结果)

服务器不会变得无响应或缓慢,如果我故意锁定表,我会在大约 10 秒内开始收到错误,而当我解锁时它会立即恢复

正如 Wilson Hauck 所要求的:

ulimit -a

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 326193
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 400000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 326193
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

iostat -xm 5 3

Linux 4.9.0-8-amd64 (zelda)     03/07/2019      _x86_64_        (12 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.24    0.00    1.56    0.76    0.00   86.44

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.41    44.17    2.41   67.61     0.05     1.61    48.55     0.37    5.23    4.74    5.25   1.37   9.59

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.77    0.00    1.91    0.82    0.00   87.51

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    52.20    0.00   81.60     0.00     1.85    46.51     0.45    5.51    0.00    5.51   1.14   9.28

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.83    0.00    1.41    0.77    0.00   91.99

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00   132.00    0.00  161.40     0.00     8.12   103.00     0.88    5.47    0.00    5.47   0.66  10.72

显示完整进程列表没什么特别的:

4087783 event_scheduler localhost       Daemon  1   Waiting for next activation     0   0
275050068   xgh 179.191.66.174:51143    xgh Sleep   3           0   0
275050130   xgh 179.191.66.174:51144    xgh Query   0   starting    show full processlist   0   0
300324788   xgh 179.191.66.174:40708    xgh Sleep   12595           5   61
304505269   xgh 179.191.66.174:51680    xgh Sleep   72          0   0
304505986   xgh 179.191.66.174:51706    xgh Sleep   72          0   0
305818676   xgh 172.30.5.2:57288    xgh Query   0   Sending data    SELECT *
 FROM (`noticias`.`noticia`)
 WHERE `texto` =  'Carlos Ghosn deixa prisão após mais de 100 dias detido em Tóquio'    0   0
305818680   xgh 172.30.5.2:57296    xgh Sleep   57          0   0
305818682   xgh 172.30.5.2:57302    xgh Sleep   57          0   0
305818689   xgh 172.30.5.2:57316    xgh Sleep   57          0   0
305818692   xgh 172.30.5.2:57324    xgh Sleep   57          0   0
305842475   xgh 172.30.5.2:49326    xgh Sleep   1           94  550698
305842479   xgh 172.30.5.2:49334    xgh Sleep   1           0   0
305842481   xgh 172.30.5.2:49340    xgh Sleep   1           0   0
305842486   xgh 172.30.5.2:49350    xgh Sleep   1           0   0
305842489   xgh 172.30.5.2:49358    xgh Sleep   1           0   0
305842492   xgh 172.30.5.2:49364    xgh Sleep   1           0   0
305842496   xgh 172.30.5.2:49372    xgh Sleep   1           0   0
305842498   xgh 172.30.5.2:49376    xgh Sleep   1           0   0
305842501   xgh 172.30.5.2:49382    xgh Sleep   1           0   0
305842505   xgh 172.30.5.2:49392    xgh Sleep   1           0   0
305842508   xgh 172.30.5.2:49398    xgh Sleep   1           0   0

显示全球状态:https://pastebin.com/cg1v5bHD

显示全局变量:https://pastebin.com/8mkTzpr0

答案1

请提供SHOW CREATE TABLE noticia。如果texto是类型TEXT,则运行查询的那个查询需要PROCESSLIST表扫描。如果是VARCHAR,那么索引可能会有所帮助。

您声称有很多连接,但我只看到一个在做任何事情。所以SHOW FULL PROCESSLIST再次希望得到几个查询。

答案2

每秒速率 = RPS - 针对 my.cnf [mysqld] 部分的建议

innodb_lru_scan_depth=100  # from 1024 to conserve 90% of CPU cycles used for function
innodb_flushing_avg_loops=5  # from 30 to reduce delay and reduce innodb_buffer_pool_pages_dirty of 90,000 +
innodb_io_capacity=1900  # from 400 to allow higher IOPS to data devices
read_rnd_buffer_size=128K  # from 256K to reduce handler_read_rnd_next RPS of ~ 600,000 RPS

使用 SHOW GLOBAL STATUS LIKE '%dirty%' 监控 innodb_buffer_pool_pages_dirty;

需要审查的时间消费者,A)com_rollback_to_savepoint AVG 每 62 秒 1 次,B)handler_rollback AVG 每 52 秒 1 次,C)handler_savepoint_rollback AVG 每 31 秒 1 次。

免责声明:我是 mysqlservertuning.com 的作者,如我的个人资料“网络资料”中所述。祝您的实例平均每秒有 367 个连接。

相关内容