我有一台 Debian 9 服务器(12 个 CPU,80GB RAM),运行 Percona MySQL 8.0 服务器,每秒大约有 1100 个客户端,负载不是很高,大约在 0.3 到 2.30 左右
显示引擎 innodb 状态的最新数字:183.82 次插入/秒、169.69 次更新/秒、5.79 次删除/秒、2179444.29 次读取/秒
它大多数时候运行良好,但最近有时它突然拒绝新的连接(当一个非常繁忙的表被锁定很长时间或某些查询花费的时间太长时)。
日志显示:
[错误] [MY-000000] [connection_h] 错误日志节流阀:1912‘无法创建线程来处理新连接’错误被抑制
[错误] [MY-010249] [服务器] 无法创建线程来处理新连接(errno = 11)
[错误] [MY-010053] [服务器] Event_scheduler::execute_top: 无法创建事件工作线程 (errno=11)。停止事件调度程序
所以,基本上我有两个问题:
1. 如何调整服务器以接受更多连接(我尝试了很多方法,我将在下面列出),确切地说,它不能处理超过 4821 个连接
2. 每当发生这种情况时,事件调度程序就会停止,我必须手动再次启用它,这真的很糟糕,因为我有一些 MEMORY 表已满,它们处理大量流量并每 5 秒转储到其他表
到目前为止,我尝试设置软限制和硬限制/etc/security/limits.conf对于 *、mysql 和 root 用户:
* hard nofile 400000
* soft nofile 400000
* hard nproc 400000
* soft nproc 400000
mysql hard nofile 400000
mysql soft nofile 400000
root hard nofile 400000
root soft nofile 400000
我也做了很多调整/etc/sysctl.conf下面列出一些:
kernel.pid_max = 262144
vm.max_map_count = 262144
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 2000 65500
net.ipv4.tcp_max_syn_backlog = 32768
fs.file-max = 450000
net.core.netdev_max_backlog = 450000
net.core.somaxconn = 32768
我还对 systemd conf 文件进行了更改:/lib/systemd/system/mysql.service- 环境:
LimitNOFILE=220000
TasksMax=32768
mysqld.conf
bind-address = 0.0.0.0
# GENERAL #
user = mysql
default-storage-engine = InnoDB
socket = /var/run/mysqld/mysqld.sock
pid-file = /var/lib/mysql/mysql.pid
# SAFETY #
max-allowed-packet = 16M
max-connect-errors = 1000000
skip-name-resolve
sysdate-is-now = 1
innodb = FORCE
wait-timeout = 600
# DATA STORAGE #
datadir = /var/lib/mysql/
# BINARY LOGGING #
sync-binlog = 0
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 32M
max-connections = 20000
thread-cache-size = 300
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 4096
# INNODB #
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 512M
innodb-flush-log-at-trx-commit = 0
innodb-file-per-table = 1
innodb-buffer-pool-size = 64G
innodb-fast-shutdown = 0
innodb-buffer-pool-dump-pct = 75
innodb-buffer-pool-dump-at-shutdown = 1
innodb-buffer-pool-load-at-startup = 1
innodb-io-capacity = 400
innodb-io-capacity-max = 2000
# LOGGING #
log-error = /var/log/mysql/mysql-error.log
log-queries-not-using-indexes = 0
slow-query-log = 1
slow-query-log-file = /var/log/mysql/mysql-slow.log
long-query-time = 5
event_scheduler = 1
general_log_file = /var/log/mysql/general.log
general_log = 0
local-infile = 1
我不知道我还能做什么,或者我还能在哪里找到,试图寻找问题,但找不到任何有用的方法,对于事件调度程序来说,情况更糟,几乎没有任何关于它的信息(试着在谷歌上搜索MY-010053或者无法创建事件工作线程几乎没有结果)
服务器不会变得无响应或缓慢,如果我故意锁定表,我会在大约 10 秒内开始收到错误,而当我解锁时它会立即恢复
正如 Wilson Hauck 所要求的:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 326193
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 400000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 326193
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
iostat -xm 5 3
Linux 4.9.0-8-amd64 (zelda) 03/07/2019 _x86_64_ (12 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
11.24 0.00 1.56 0.76 0.00 86.44
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.41 44.17 2.41 67.61 0.05 1.61 48.55 0.37 5.23 4.74 5.25 1.37 9.59
avg-cpu: %user %nice %system %iowait %steal %idle
9.77 0.00 1.91 0.82 0.00 87.51
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 52.20 0.00 81.60 0.00 1.85 46.51 0.45 5.51 0.00 5.51 1.14 9.28
avg-cpu: %user %nice %system %iowait %steal %idle
5.83 0.00 1.41 0.77 0.00 91.99
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 132.00 0.00 161.40 0.00 8.12 103.00 0.88 5.47 0.00 5.47 0.66 10.72
显示完整进程列表没什么特别的:
4087783 event_scheduler localhost Daemon 1 Waiting for next activation 0 0
275050068 xgh 179.191.66.174:51143 xgh Sleep 3 0 0
275050130 xgh 179.191.66.174:51144 xgh Query 0 starting show full processlist 0 0
300324788 xgh 179.191.66.174:40708 xgh Sleep 12595 5 61
304505269 xgh 179.191.66.174:51680 xgh Sleep 72 0 0
304505986 xgh 179.191.66.174:51706 xgh Sleep 72 0 0
305818676 xgh 172.30.5.2:57288 xgh Query 0 Sending data SELECT *
FROM (`noticias`.`noticia`)
WHERE `texto` = 'Carlos Ghosn deixa prisão após mais de 100 dias detido em Tóquio' 0 0
305818680 xgh 172.30.5.2:57296 xgh Sleep 57 0 0
305818682 xgh 172.30.5.2:57302 xgh Sleep 57 0 0
305818689 xgh 172.30.5.2:57316 xgh Sleep 57 0 0
305818692 xgh 172.30.5.2:57324 xgh Sleep 57 0 0
305842475 xgh 172.30.5.2:49326 xgh Sleep 1 94 550698
305842479 xgh 172.30.5.2:49334 xgh Sleep 1 0 0
305842481 xgh 172.30.5.2:49340 xgh Sleep 1 0 0
305842486 xgh 172.30.5.2:49350 xgh Sleep 1 0 0
305842489 xgh 172.30.5.2:49358 xgh Sleep 1 0 0
305842492 xgh 172.30.5.2:49364 xgh Sleep 1 0 0
305842496 xgh 172.30.5.2:49372 xgh Sleep 1 0 0
305842498 xgh 172.30.5.2:49376 xgh Sleep 1 0 0
305842501 xgh 172.30.5.2:49382 xgh Sleep 1 0 0
305842505 xgh 172.30.5.2:49392 xgh Sleep 1 0 0
305842508 xgh 172.30.5.2:49398 xgh Sleep 1 0 0
答案1
请提供SHOW CREATE TABLE noticia
。如果texto
是类型TEXT
,则运行查询的那个查询需要PROCESSLIST
表扫描。如果是VARCHAR
,那么索引可能会有所帮助。
您声称有很多连接,但我只看到一个在做任何事情。所以SHOW FULL PROCESSLIST
再次希望得到几个查询。
答案2
每秒速率 = RPS - 针对 my.cnf [mysqld] 部分的建议
innodb_lru_scan_depth=100 # from 1024 to conserve 90% of CPU cycles used for function
innodb_flushing_avg_loops=5 # from 30 to reduce delay and reduce innodb_buffer_pool_pages_dirty of 90,000 +
innodb_io_capacity=1900 # from 400 to allow higher IOPS to data devices
read_rnd_buffer_size=128K # from 256K to reduce handler_read_rnd_next RPS of ~ 600,000 RPS
使用 SHOW GLOBAL STATUS LIKE '%dirty%' 监控 innodb_buffer_pool_pages_dirty;
需要审查的时间消费者,A)com_rollback_to_savepoint AVG 每 62 秒 1 次,B)handler_rollback AVG 每 52 秒 1 次,C)handler_savepoint_rollback AVG 每 31 秒 1 次。
免责声明:我是 mysqlservertuning.com 的作者,如我的个人资料“网络资料”中所述。祝您的实例平均每秒有 367 个连接。