调试 Mariadb 随机崩溃

调试 Mariadb 随机崩溃

这些是journalctl -u mariadb -f崩溃发生时过去几天的日志:

Mar 27 14:25:35 new-server.novalocal mariadbd[30060]: 2023-03-27 14:25:35 40923 [Warning] Access denied for user 'root'@'182.160.117.92' (using password: YES)
Mar 27 14:25:36 new-server.novalocal mariadbd[30060]: 2023-03-27 14:25:36 40924 [Warning] Access denied for user 'root'@'182.160.117.92' (using password: YES)
Mar 27 14:25:37 new-server.novalocal mariadbd[30060]: 2023-03-27 14:25:37 40925 [Warning] Access denied for user 'root'@'182.160.117.92' (using password: YES)
Mar 27 14:25:38 new-server.novalocal mariadbd[30060]: 2023-03-27 14:25:38 40926 [Warning] Access denied for user 'root'@'182.160.117.92' (using password: YES)
Mar 27 14:56:50 new-server.novalocal mariadbd[30060]: 2023-03-27 14:56:50 41477 [Warning] Hostname 'zg-1220c-69.stretchoid.com' does not resolve to '192.241.234.8'.
Mar 27 14:56:50 new-server.novalocal mariadbd[30060]: 2023-03-27 14:56:50 41477 [Note] Hostname 'zg-1220c-69.stretchoid.com' has the following IP addresses:
Mar 27 14:56:50 new-server.novalocal mariadbd[30060]: 2023-03-27 14:56:50 41477 [Note]  - 127.0.0.1
Mar 27 14:56:50 new-server.novalocal mariadbd[30060]: 2023-03-27 14:56:50 41477 [Warning] Aborted connection 41477 to db: 'unconnected' user: 'unauthenticated' host: '192.241.234.8' (This connection closed normally without aut
Mar 27 15:14:38 new-server.novalocal systemd[1]: mariadb.service: main process exited, code=killed, status=9/KILL
Mar 27 15:14:38 new-server.novalocal systemd[1]: Unit mariadb.service entered failed state.
Mar 27 15:14:38 new-server.novalocal systemd[1]: mariadb.service failed.
Mar 27 15:14:43 new-server.novalocal systemd[1]: mariadb.service holdoff time over, scheduling restart.
Mar 27 15:14:43 new-server.novalocal systemd[1]: Stopped MariaDB 10.6.4 database server.
Mar 27 15:14:43 new-server.novalocal systemd[1]: Starting MariaDB 10.6.4 database server...
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Note] Using unique option prefix 'character-set-client' is error-prone and can break in the future. Please use the full name 'character-set-client-ha
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Warning] /usr/sbin/mariadbd: ignoring option '--character-set-client-handshake' due to invalid value 'utf8mb4'
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Note] Using unique option prefix 'key_buffer' is error-prone and can break in the future. Please use the full name 'key_buffer_size' instead.
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Note] /usr/sbin/mariadbd (server 10.6.4-MariaDB) starting as process 18423 ...
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Note] InnoDB: Number of pools: 1
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Note] InnoDB: Using Linux native AIO
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Note] InnoDB: Initializing buffer pool, total size = 8589934592, chunk size = 134217728
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Note] InnoDB: Completed initialization of buffer pool
Mar 27 15:14:44 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:44 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=549981281636,549981636145
Mar 27 15:14:51 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:51 0 [Note] InnoDB: Starting final batch to recover 3777 pages from redo log.
Mar 27 15:14:55 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:55 0 [Note] InnoDB: 128 rollback segments are active.
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:56 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:56 0 [Note] InnoDB: Creating shared tablespace for temporary tables
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:56 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:56 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:56 0 [Note] InnoDB: 10.6.4 started; log sequence number 550168329917; transaction id 2374116794
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:56 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:56 0 [Note] Plugin 'FEEDBACK' is disabled.
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:56 0 [Note] Server socket created on IP: '0.0.0.0'.
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:56 0 [Note] Server socket created on IP: '::'.
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: 2023-03-27 15:14:56 0 [Note] /usr/sbin/mariadbd: ready for connections.
Mar 27 15:14:56 new-server.novalocal mariadbd[18423]: Version: '10.6.4-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
Mar 27 15:14:56 new-server.novalocal systemd[1]: Started MariaDB 10.6.4 database server.
Mar 27 15:15:04 new-server.novalocal mariadbd[18423]: 2023-03-27 15:15:04 12 [Warning] IP address '45.84.208.80' could not be resolved: Name or service not known
Mar 27 15:25:12 new-server.novalocal mariadbd[18423]: 2023-03-27 15:25:12 0 [Note] InnoDB: Buffer pool(s) load completed at 230327 15:25:12
Mar 27 15:25:34 new-server.novalocal mariadbd[18423]: 2023-03-27 15:25:34 403 [ERROR] mariadbd: Table './abac/abac_logger' is marked as crashed and should be repaired
Mar 27 15:25:34 new-server.novalocal mariadbd[18423]: 2023-03-27 15:25:34 403 [Warning] Checking table:   './abac/abac_logger'
Mar 27 16:38:30 new-server.novalocal mariadbd[18423]: 2023-03-27 16:38:30 3625 [Warning] Aborted connection 3625 to db: 'unconnected' user: 'unauthenticated' host: 'nat.en.net.ua' (This connection closed normally without authe
Mar 27 16:38:30 new-server.novalocal mariadbd[18423]: 2023-03-27 16:38:30 3626 [Warning] Access denied for user 'root'@'nat.en.net.ua' (using password: NO)
Mar 27 16:38:30 new-server.novalocal mariadbd[18423]: 2023-03-27 16:38:30 3627 [Warning] Access denied for user 'root'@'nat.en.net.ua' (using password: YES)
Mar 27 16:38:31 new-server.novalocal mariadbd[18423]: 2023-03-27 16:38:31 3629 [Warning] Access denied for user 'root'@'nat.en.net.ua' (using password: YES)
Mar 27 16:38:31 new-server.novalocal mariadbd[18423]: 2023-03-27 16:38:31 3630 [Warning] Access denied for user 'root'@'nat.en.net.ua' (using password: YES)
Mar 27 16:38:31 new-server.novalocal mariadbd[18423]: 2023-03-27 16:38:31 3631 [Warning] Access denied for user 'root'@'nat.en.net.ua' (using password: YES)
Mar 27 16:38:31 new-server.novalocal mariadbd[18423]: 2023-03-27 16:38:31 3632 [Warning] Access denied for user 'root'@'nat.en.net.ua' (using password: YES)
Mar 27 16:38:31 new-server.novalocal mariadbd[18423]: 2023-03-27 16:38:31 3633 [Warning] Access denied for user 'root'@'nat.en.net.ua' (using password: YES)

附加信息:

  • 服务器版本:10.6.4-MariaDB MariaDB服务器
  • CentOS Linux 版本 7.9.2009(核心)
  • 16GB RAM,以及约 9GB Innodb 大小(很快将增加两者)

关于导致崩溃的原因有任何反馈吗?

更新 1:感谢 @AlexD 的评论,找到了原因

dmesg -T | egrep -i 'killed process'
[Sun Mar 26 00:25:56 2023] Killed process 22107 (mariadbd), UID 27, total-vm:11617328kB, anon-rss:8841236kB, file-rss:0kB, shmem-rss:0kB
[Mon Mar 27 15:05:47 2023] Killed process 30060 (mariadbd), UID 27, total-vm:11726556kB, anon-rss:8919200kB, file-rss:0kB, shmem-rss:0kB

并且:

dmesg -T | less
[Fri Feb 24 11:55:34 2023] Out of memory: Kill process 28666 (mariadbd) score 630 or sacrifice child
[Fri Feb 24 11:55:34 2023] Killed process 28666 (mariadbd), UID 27, total-vm:12708960kB, anon-rss:10227836kB, file-rss:0kB, shmem-rss:0kB
[Fri Feb 24 12:46:34 2023] php-fpm[22177]: segfault at 7fa2bb08ca31 ip 0000559801a1e551 sp 00007fff0fca6610 error 4 in php-fpm[5598017bd000+3a6000]
[Fri Feb 24 13:43:02 2023] php-fpm[13587]: segfault at 7fa29c026e31 ip 0000559801a1e551 sp 00007fff0fca6aa0 error 4 in php-fpm[5598017bd000+3a6000]
[Sat Feb 25 00:51:50 2023] sh (11581): drop_caches: 3
[Sun Feb 26 00:51:48 2023] sh (10278): drop_caches: 3
[Mon Feb 27 00:51:48 2023] sh (1543): drop_caches: 3
[Tue Feb 28 00:51:46 2023] sh (29036): drop_caches: 3
[Tue Feb 28 08:56:36 2023] mariadbd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Tue Feb 28 08:56:36 2023] mariadbd cpuset=/ mems_allowed=0
[Tue Feb 28 08:56:36 2023] CPU: 2 PID: 9536 Comm: mariadbd Kdump: loaded Not tainted 3.10.0-1160.36.2.el7.x86_64 #1
[Tue Feb 28 08:56:36 2023] Hardware name: RDO OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014
[Tue Feb 28 08:56:36 2023] Call Trace:
[Tue Feb 28 08:56:36 2023]  [<ffffffff86183559>] dump_stack+0x19/0x1b
[Tue Feb 28 08:56:36 2023]  [<ffffffff8617e5f8>] dump_header+0x90/0x229
[Tue Feb 28 08:56:36 2023]  [<ffffffff85d0f18b>] ? cred_has_capability+0x6b/0x120
[Tue Feb 28 08:56:36 2023]  [<ffffffff85bc252d>] oom_kill_process+0x2cd/0x490
[Tue Feb 28 08:56:36 2023]  [<ffffffff85d0f25c>] ? selinux_capable+0x1c/0x40
[Tue Feb 28 08:56:36 2023]  [<ffffffff85bc2c1a>] out_of_memory+0x31a/0x500
[Tue Feb 28 08:56:36 2023]  [<ffffffff85bc9834>] __alloc_pages_nodemask+0xad4/0xbe0
[Tue Feb 28 08:56:36 2023]  [<ffffffff85c19388>] alloc_pages_current+0x98/0x110
[Tue Feb 28 08:56:36 2023]  [<ffffffff85bbdfe7>] __page_cache_alloc+0x97/0xb0
[Tue Feb 28 08:56:36 2023]  [<ffffffff85bc0f80>] filemap_fault+0x270/0x420
[Tue Feb 28 08:56:36 2023]  [<ffffffffc046391e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
[Tue Feb 28 08:56:36 2023]  [<ffffffffc0463b1c>] xfs_filemap_fault+0x2c/0x30 [xfs]
[Tue Feb 28 08:56:36 2023]  [<ffffffff85bee78a>] __do_fault.isra.61+0x8a/0x100
[Tue Feb 28 08:56:36 2023]  [<ffffffff85beed3c>] do_read_fault.isra.63+0x4c/0x1b0
[Tue Feb 28 08:56:36 2023]  [<ffffffff85bf6580>] handle_mm_fault+0xa20/0xfb0
[Tue Feb 28 08:56:36 2023]  [<ffffffff86190653>] __do_page_fault+0x213/0x500
[Tue Feb 28 08:56:36 2023]  [<ffffffff86190a26>] trace_do_page_fault+0x56/0x150
[Tue Feb 28 08:56:36 2023]  [<ffffffff8618ffa2>] do_async_page_fault+0x22/0xf0
[Tue Feb 28 08:56:36 2023]  [<ffffffff8618c7a8>] async_page_fault+0x28/0x30
[Tue Feb 28 08:56:36 2023] Mem-Info:
[Tue Feb 28 08:56:36 2023] active_anon:3700632 inactive_anon:232131 isolated_anon:0
 active_file:2112 inactive_file:2402 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:7681 slab_unreclaimable:19877
 mapped:57615 shmem:259768 pagetables:20522 bounce:0
 free:34106 free_pcp:271 free_cma:0
[Tue Feb 28 08:56:36 2023] Node 0 DMA free:15908kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Tue Feb 28 08:56:36 2023] lowmem_reserve[]: 0 2812 15866 15866

答案1

回答我自己的问题并通过评论找到解决方案:

@AlexD: SIGKILL indicates that it is most probably killed by OOM killer.

基于此,如果您对 OOM Killer 进行更多研究,您会发现可以通过运行以下命令找到已终止的系统进程:

dmesg -T | egrep -i 'killed process'

还可以使用以下方法浏览完整日志:

dmesg -T | less

这样就找到了原因,目前的最终解决方案是增加服务器 RAM,因为根据 Innodb 的大小,操作系统没有足够的剩余 RAM。

相关内容