Finding the process that exhausted the memory and CPU limits on Ubuntu


I am running a DigitalOcean droplet with Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-89-generic x86_64). My website went down at about 6:48 AM GMT on Friday, December 1. In /var/log/apache2/access.log, the last entries before the site went down are:

45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/331/et-core-unified-tb-508-tb-516-deferred-331.min.css?ver=1701308913 HTTP/1.1" 200 7247 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/25/et-core-unified-25.min.css?ver=1701337684 HTTP/1.1" 200 5295 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/25/et-core-unified-tb-13-tb-74-deferred-25.min.css?ver=1701337684 HTTP/1.1" 200 6148 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/24899/et-core-unified-24899.min.css?ver=1701308552 HTTP/1.1" 200 5295 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/331/et-core-unified-331.min.css?ver=1701308912 HTTP/1.1" 200 5295 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
45.135.232.36 - - [01/Dec/2023:06:47:54 +0000] "GET /wp-content/et-cache/23/et-core-unified-tb-13-tb-74-deferred-23.min.css?ver=1701336351 HTTP/1.1" 200 6144 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"

I was hoping to find evidence of the cause in /var/log/apache2/error.log. To my surprise, all I found was this:

[Fri Dec 01 00:00:01.001724 2023] [mpm_prefork:notice] [pid 851] AH00163: Apache/2.4.52 (Ubuntu) OpenSSL/3.0.2 configured -- resuming normal operations
[Fri Dec 01 00:00:01.001885 2023] [core:notice] [pid 851] AH00094: Command line: '/usr/sbin/apache2'
[Fri Dec 01 12:02:26.984774 2023] [mpm_prefork:notice] [pid 853] AH00163: Apache/2.4.52 (Ubuntu) OpenSSL/3.0.2 configured -- resuming normal operations
[Fri Dec 01 12:02:27.071107 2023] [core:notice] [pid 853] AH00094: Command line: '/usr/sbin/apache2'

Nothing was logged around 6:48 AM GMT/UTC on Friday, December 1. My server uses GMT/UTC, as do all the logs I copied above.

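I wonder whether the problem is not something that gets written to the error log at all, but rather that RAM and CPU were exhausted, hit their limits, and brought the server down. In particular, I know that RAM reached 100%:

[Screenshot: memory usage graph]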

CPU, load, disk I/O and bandwidth also showed abnormal spikes at the same time (about 6:48 to 6:50 AM GMT/UTC on Friday, December 1), which is when the website went down:

[Four screenshots of the monitoring graphs]

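I know that something happened during the two minutes between 6:48 and 6:50 AM GMT/UTC on Friday, December 1; the server's resource consumption proves it. But the Apache logs in /var/log/apache2/ contain no information about what happened at that time. What I want to do is find out which processes consumed the highest percentage of resources around 6:48 AM GMT on Friday, December 1, since I know that RAM reached 100% of its capacity.

I am looking for something similar to what we can do on Windows by pressing Ctrl + Alt + Del and then opening the Processes tab in Task Manager. Clicking the Memory column makes Windows sort the processes by RAM consumption, from highest to lowest. For example, on my own computer I can see that Google Chrome, taken on its own, consumes more RAM than any other application:

[Screenshot: Windows Task Manager sorted by memory]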

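The problem, of course, is that I cannot go back in time and watch RAM consumption sorted by process on the Ubuntu server to see what happened between 6:48 and 6:50 AM GMT/UTC on Friday, December 1. So I need to recover that information from logs, but /var/log/apache2/ does not seem to offer anything useful. In fact, it has no error entries at all for that moment. Is there a way to monitor RAM and/or CPU consumption per process on Ubuntu? Alternatively, how can I find in the Apache logs what may have brought my website down?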
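
For the forward-looking half of the question (keeping a per-process record so that the next incident can be examined afterwards), here is a minimal Python sketch. It assumes a Linux /proc filesystem. The function names, and the idea of running the script every minute from cron and appending its output to a file, are illustrative assumptions rather than an existing tool.

```python
import os
import time


def parse_status(text):
    """Return (name, rss_kb) from the text of /proc/<pid>/status."""
    name, rss_kb = "?", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])  # the kernel reports this value in kB
    return name, rss_kb  # kernel threads have no VmRSS line, so they stay at 0


def snapshot(proc_root="/proc"):
    """Return a list of (rss_kb, pid, name) for every readable process."""
    rows = []
    if not os.path.isdir(proc_root):
        return rows  # not a Linux system, nothing to read
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # the process exited while we were reading
        rows.append((rss_kb, int(entry), name))
    return rows


def top_consumers(rows, n=10):
    """Sort by resident memory, highest first, and keep the first n rows."""
    return sorted(rows, reverse=True)[:n]


def format_report(rows, when=None):
    """Render rows as timestamped lines that can be appended to a log file."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(when))
    return "\n".join(
        f"{stamp} UTC pid={pid} rss_kb={rss_kb} name={name}"
        for rss_kb, pid, name in rows
    )


if __name__ == "__main__":
    print(format_report(top_consumers(snapshot())))
```

Run once, this prints the ten processes with the largest resident memory at that instant. Scheduled regularly with its output appended to a file, it would leave a history that can be searched by timestamp after an outage.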

Update 1: By running cat syslog | grep "Dec  1 06:48" from /var/log/, I found what happened at 06:48 GMT/UTC on December 1:

Dec  1 06:48:11 jaimemontoya kernel: [67708.613307] [UFW BLOCK] IN=eth0 OUT= MAC=92:df:ec:15:65:70:fe:00:00:00:01:01:08:00 SRC=89.248.163.205 DST=143.198.36.10 LEN=40 TOS=0x00 PREC=0x00 TTL=239 ID=35961 PROTO=TCP SPT=59246 DPT=3644 WINDOW=1024 RES=0x00 SYN URGP=0
Dec  1 06:48:17 jaimemontoya kernel: [67713.912771] systemd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Dec  1 06:48:17 jaimemontoya kernel: [67713.912798] CPU: 0 PID: 1 Comm: systemd Not tainted 5.15.0-89-generic #99-Ubuntu
Dec  1 06:48:17 jaimemontoya kernel: [67713.912806] Hardware name: DigitalOcean Droplet/Droplet, BIOS 20171212 12/12/2017
Dec  1 06:48:17 jaimemontoya kernel: [67713.912818] Call Trace:
Dec  1 06:48:17 jaimemontoya kernel: [67713.912822]  <TASK>
Dec  1 06:48:17 jaimemontoya kernel: [67713.912876]  show_stack+0x52/0x5c
...........
Dec  1 06:48:17 jaimemontoya kernel: [67713.913069]  asm_exc_page_fault+0x27/0x30
Dec  1 06:48:17 jaimemontoya kernel: [67713.913075] RIP: 0033:0x7f2280df1040
Dec  1 06:48:17 jaimemontoya kernel: [67713.913091] Code: Unable to access opcode bytes at RIP 0x7f2280df1016.
Dec  1 06:48:17 jaimemontoya kernel: [67713.913092] RSP: 002b:00007ffe4d14fa18 EFLAGS: 00010202
Dec  1 06:48:17 jaimemontoya kernel: [67713.913097] RAX: 0000000000000000 RBX: 000055a43072c670 RCX: 000055a4308ed840
Dec  1 06:48:17 jaimemontoya kernel: [67713.913099] RDX: 0000000000000000 RSI: 000055a42ff1dfe1 RDI: 000055a43072c670
Dec  1 06:48:17 jaimemontoya kernel: [67713.913100] RBP: 0000000000000000 R08: 0000000000000000 R09: dd020cb76a33553c
Dec  1 06:48:17 jaimemontoya kernel: [67713.913101] R10: 83e2e07d8b22b2e2 R11: 00007ffe4d1ed090 R12: 000055a42ff1dfe1
Dec  1 06:48:17 jaimemontoya kernel: [67713.913102] R13: 0000000000000000 R14: 0000000000061824 R15: 0000000000000000
Dec  1 06:48:17 jaimemontoya kernel: [67713.913109]  </TASK>
Dec  1 06:48:17 jaimemontoya kernel: [67713.913113] Mem-Info:
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118] active_anon:19496 inactive_anon:176522 isolated_anon:0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  active_file:25 inactive_file:294 isolated_file:33
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  unevictable:7678 dirty:0 writeback:0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  slab_reclaimable:5762 slab_unreclaimable:11722
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  mapped:24147 shmem:24761 pagetables:3898 bounce:0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  kernel_misc_reclaimable:0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913118]  free:12123 free_pcp:224 free_cma:0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913123] Node 0 active_anon:77984kB inactive_anon:706088kB active_file:100kB inactive_file:1176kB unevictable:30712kB isolated(anon):0kB isolated(file):132kB mapped:96588kB dirty:0kB writeback:0kB shmem:99044kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:3152kB pagetables:15592kB all_unreclaimable? no
Dec  1 06:48:17 jaimemontoya kernel: [67713.913133] Node 0 DMA free:4344kB min:732kB low:912kB high:1092kB reserved_highatomic:0KB active_anon:4kB inactive_anon:10456kB active_file:12kB inactive_file:32kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913144] lowmem_reserve[]: 0 903 903 903 903
Dec  1 06:48:17 jaimemontoya kernel: [67713.913147] Node 0 DMA32 free:44148kB min:44320kB low:55400kB high:66480kB reserved_highatomic:0KB active_anon:77980kB inactive_anon:695632kB active_file:88kB inactive_file:1148kB unevictable:30712kB writepending:0kB present:1032032kB managed:965124kB mlocked:27640kB bounce:0kB free_pcp:896kB local_pcp:896kB free_cma:0kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913152] lowmem_reserve[]: 0 0 0 0 0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913155] Node 0 DMA: 14*4kB (M) 8*8kB (M) 4*16kB (UM) 8*32kB (U) 5*64kB (U) 4*128kB (UM) 2*256kB (U) 3*512kB (UM) 1*1024kB (M) 0*2048kB 0*4096kB = 4344kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913166] Node 0 DMA32: 209*4kB (UE) 254*8kB (ME) 136*16kB (UME) 152*32kB (UME) 75*64kB (ME) 38*128kB (ME) 16*256kB (UME) 14*512kB (ME) 13*1024kB (UM) 0*2048kB 0*4096kB = 44148kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913186] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913191] 27388 total pagecache pages
Dec  1 06:48:17 jaimemontoya kernel: [67713.913192] 0 pages in swap cache
Dec  1 06:48:17 jaimemontoya kernel: [67713.913195] Swap cache stats: add 0, delete 0, find 0/0
Dec  1 06:48:17 jaimemontoya kernel: [67713.913196] Free swap  = 0kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913197] Total swap = 0kB
Dec  1 06:48:17 jaimemontoya kernel: [67713.913197] 262006 pages RAM
Dec  1 06:48:17 jaimemontoya kernel: [67713.913198] 0 pages HighMem/MovableOnly
Dec  1 06:48:17 jaimemontoya kernel: [67713.913198] 16885 pages reserved
Dec  1 06:48:17 jaimemontoya kernel: [67713.913199] 0 pages hwpoisoned
Dec  1 06:48:17 jaimemontoya kernel: [67713.913200] Tasks state (memory values in pages):
Dec  1 06:48:17 jaimemontoya kernel: [67713.913200] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Dec  1 06:48:17 jaimemontoya kernel: [67713.913205] [    339]     0   339    34126      874   290816        0          -250 systemd-journal
...........
Dec  1 06:48:38 jaimemontoya kernel: [67735.153143] [  19981]     0 19981      411       26    32768        0             0 sshd
Dec  1 06:48:38 jaimemontoya kernel: [67735.153146] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=init.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/apache2.service,task=apache2,pid=14813,uid=1000
Dec  1 06:48:38 jaimemontoya kernel: [67735.153172] Out of memory: Killed process 14813 (apache2) total-vm:358052kB, anon-rss:38456kB, file-rss:3128kB, shmem-rss:76672kB, UID:1000 pgtables:440kB oom_score_adj:0
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Control process exited, code=killed, status=15/TERM
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 851 (apache2) with signal SIGKILL.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 13197 (apache2) with signal SIGKILL.
...........
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 19950 (apache2) with signal SIGKILL.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 19968 (apache2) with signal SIGKILL.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Failed with result 'oom-kill'.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Unit process 13197 (apache2) remains running after unit stopped.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Unit process ...........
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Unit process 19968 (apache2) remains running after unit stopped.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Consumed 18min 58.770s CPU time.
Dec  1 06:48:39 jaimemontoya systemd[1]: Started MySQL Community Server.
Dec  1 06:48:58 jaimemontoya kernel: [67755.616034] [UFW BLOCK] IN=eth0 OUT= MAC=92:df:ec:15:65:70:fe:00:00:00:01:01:08:00 SRC=167.94.146.79 DST=143.198.36.10 LEN=44 TOS=0x00 PREC=0x00 TTL=40 ID=53055 PROTO=TCP SPT=63693 DPT=2762 WINDOW=1024 RES=0x00 SYN URGP=0
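
The "Tasks state" table in this dump is the closest thing to a Task Manager view of that moment: it lists every process with its resident memory (the rss column) in pages. Below is a rough Python sketch that ranks process names by total rss. It assumes 4 kB pages, which matches the Mem-Info lines above (active_anon:19496 pages is reported as 77984kB), and the function names are made up for illustration. Since rss includes shared memory, summing it across many apache2 workers overstates their real footprint.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, consistent with the Mem-Info lines

# One row of the kernel's "Tasks state" table:
# [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
TASK_ROW = re.compile(
    r"kernel: \[\s*[\d.]+\] \[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+"
    r"(?P<swapents>\d+)\s+(?P<oom_score_adj>-?\d+)\s+(?P<name>.+?)\s*$"
)


def parse_dumps(lines):
    """Group task rows by OOM dump; each dump is a list of (name, rss_pages)."""
    dumps = []
    for line in lines:
        if "Tasks state (memory values in pages)" in line:
            dumps.append([])  # a new oom-killer report starts here
            continue
        match = TASK_ROW.search(line)
        if match and dumps:
            dumps[-1].append((match.group("name"), int(match.group("rss"))))
    return dumps


def rank_by_rss(rows):
    """Return (name, process_count, total_rss_kb), largest total first."""
    totals = {}
    for name, rss_pages in rows:
        count, kb = totals.get(name, (0, 0))
        totals[name] = (count + 1, kb + rss_pages * PAGE_KB)
    ranked = [(name, count, kb) for name, (count, kb) in totals.items()]
    return sorted(ranked, key=lambda row: row[2], reverse=True)
```

Feeding the lines of /var/log/syslog to parse_dumps and passing each resulting dump to rank_by_rss would show which program names held the most memory when the OOM killer ran. Dumps are kept separate so that the same process is not counted twice when the kernel prints more than one report.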

Update 2: After reviewing the logs from Update 1, I found the root cause:

Dec  1 06:48:38 jaimemontoya kernel: [67735.153172] Out of memory: Killed process 14813 (apache2) total-vm:358052kB, anon-rss:38456kB, file-rss:3128kB, shmem-rss:76672kB, UID:1000 pgtables:440kB oom_score_adj:0
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Control process exited, code=killed, status=15/TERM
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: A process of this unit has been killed by the OOM killer.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 851 (apache2) with signal SIGKILL.
Dec  1 06:48:38 jaimemontoya systemd[1]: apache2.service: Killing process 13197 (apache2) with signal SIGKILL.

Now I need to find out which process was responsible for this Out of memory: Killed process 14813 (apache2) event.
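
One caveat when reading that line: the process the OOM killer picks is the one it scores as the best victim, which is not necessarily the one that pushed the system over the edge. As a small sketch (the names are hypothetical), the kill line can be broken into its parts and compared with the "262006 pages RAM" figure from the dump, again assuming 4 kB pages:

```python
import re

KILL_LINE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon>\d+)kB, "
    r"file-rss:(?P<file>\d+)kB, shmem-rss:(?P<shmem>\d+)kB"
)


def parse_kill(line):
    """Return a dict describing the killed process, or None if no match."""
    match = KILL_LINE.search(line)
    if match is None:
        return None
    info = {"pid": int(match.group("pid")), "name": match.group("name")}
    for key in ("total_vm", "anon", "file", "shmem"):
        info[key + "_kb"] = int(match.group(key))
    # Resident memory is the sum of the three rss components on the line.
    info["rss_kb"] = info["anon_kb"] + info["file_kb"] + info["shmem_kb"]
    return info


def share_of_ram(rss_kb, ram_pages, page_kb=4):
    """Fraction of physical RAM that rss_kb represents (4 kB pages assumed)."""
    return rss_kb / (ram_pages * page_kb)
```

For the line above, the three rss components add up to 118256 kB, most of it shared memory (shmem-rss:76672kB). That is only a fraction of the roughly 1 GB of RAM, which suggests the pressure came from many processes together rather than from PID 14813 alone.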
