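We have a server running Linux 5.15, and we have confirmed several times that when a process is killed by the OOM killer, the whole system becomes unreachable over the network, for both inbound and outbound traffic. Here is the syslog trace from the most recent incident: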
Mar 8 05:16:01 ip-10-110-10-133 kernel: [203986.004138] amazon-cloudwat invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004146] CPU: 3 PID: 1627 Comm: amazon-cloudwat Not tainted 5.15.0-1031-aws #35~20.04.1-Ubuntu
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004149] Hardware name: Amazon EC2 r6i.2xlarge/, BIOS 1.0 10/16/2017
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004150] Call Trace:
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004152] <TASK>
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004155] dump_stack_lvl+0x4a/0x63
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004162] dump_stack+0x10/0x16
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004164] dump_header+0x53/0x225
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004168] oom_kill_process.cold+0xb/0x10
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004171] out_of_memory+0x1dc/0x530
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004174] __alloc_pages_slowpath.constprop.0+0xd32/0xe30
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004179] ? __alloc_pages_slowpath.constprop.0+0xdb6/0xe30
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004182] __alloc_pages+0x2cc/0x310
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004185] alloc_pages+0x90/0x120
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004187] __page_cache_alloc+0x87/0xc0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004191] pagecache_get_page+0x150/0x530
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004193] ? page_cache_ra_unbounded+0x16a/0x220
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004196] filemap_fault+0x527/0xb60
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004198] ? filemap_map_pages+0x138/0x640
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004201] __do_fault+0x3d/0x120
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004205] do_fault+0x1f9/0x420
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004207] __handle_mm_fault+0x62c/0x840
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004210] handle_mm_fault+0xd8/0x2c0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004211] do_user_addr_fault+0x1c2/0x660
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004215] exc_page_fault+0x77/0x170
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004218] asm_exc_page_fault+0x27/0x30
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004221] RIP: 0033:0x44c1a0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004235] Code: Unable to access opcode bytes at RIP 0x44c176.
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004235] RSP: 002b:000000c001705de8 EFLAGS: 00010246
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004238] RAX: 000000c001705f78 RBX: 000000c001705e8c RCX: 0000000000000000
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004240] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000000
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004241] RBP: 000000c001705fb8 R08: 0000000000000001 R09: 000000c00063bb30
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004242] R10: 000000c001705f00 R11: 000000c000b715c0 R12: 000000c000d73080
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004243] R13: ffffffffffffffff R14: 000000c000bec820 R15: 0000000000000000
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004246] </TASK>
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004247] Mem-Info:
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004248] active_anon:202 inactive_anon:15952689 isolated_anon:0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004248] active_file:146 inactive_file:0 isolated_file:0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004248] unevictable:6279 dirty:3 writeback:0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004248] slab_reclaimable:9465 slab_unreclaimable:13692
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004248] mapped:68073 shmem:256 pagetables:33680 bounce:0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004248] kernel_misc_reclaimable:0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004248] free:91381 free_pcp:1688 free_cma:0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004253] Node 0 active_anon:808kB inactive_anon:63810756kB active_file:584kB inactive_file:0kB unevictable:25116kB isolated(anon):0kB isolated(file):0kB mapped:272292kB dirty:12kB writeback:0kB shmem:1024kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:5536kB pagetables:134720kB all_unreclaimable? no
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004257] Node 0 DMA free:11264kB min:16kB low:28kB high:40kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004262] lowmem_reserve[]: 0 2991 63273 63273 63273
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004266] Node 0 DMA32 free:244188kB min:3188kB low:6244kB high:9300kB reserved_highatomic:0KB active_anon:0kB inactive_anon:2804896kB active_file:0kB inactive_file:544kB unevictable:0kB writepending:0kB present:3129252kB managed:3063716kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:112kB free_cma:0kB
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004270] lowmem_reserve[]: 0 0 60281 60281 60281
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004273] Node 0 Normal free:110072kB min:115576kB low:177292kB high:239008kB reserved_highatomic:2048KB active_anon:808kB inactive_anon:61005860kB active_file:1364kB inactive_file:388kB unevictable:25116kB writepending:12kB present:62898176kB managed:61728412kB mlocked:18340kB bounce:0kB free_pcp:6084kB local_pcp:928kB free_cma:0kB
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004278] lowmem_reserve[]: 0 0 0 0 0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004281] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004292] Node 0 DMA32: 188*4kB (UME) 127*8kB (UME) 135*16kB (UME) 69*32kB (UME) 33*64kB (UME) 12*128kB (UME) 8*256kB (UME) 2*512kB (UE) 2*1024kB (UM) 2*2048kB (ME) 55*4096kB (M) = 244280kB
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004306] Node 0 Normal: 9199*4kB (UME) 5601*8kB (UME) 1449*16kB (UMEH) 193*32kB (UMEH) 7*64kB (MH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 111412kB
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004318] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004320] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004321] 4359 total pagecache pages
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004322] 0 pages in swap cache
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004323] Swap cache stats: add 0, delete 0, find 0/0
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004324] Free swap = 0kB
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004325] Total swap = 0kB
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004326] 16510855 pages RAM
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004326] 0 pages HighMem/MovableOnly
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004327] 308983 pages reserved
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004328] 0 pages hwpoisoned
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004328] Tasks state (memory values in pages):
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004329] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004337] [ 211] 0 211 106563 1109 811008 0 -250 systemd-journal
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004340] [ 248] 0 248 2247 979 61440 0 -1000 systemd-udevd
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004343] [ 340] 0 340 53652 4488 94208 0 -1000 multipathd
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004346] [ 360] 0 360 701 29 45056 0 0 falcond
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004348] [ 361] 0 361 696911 147512 1540096 0 0 falcon-sensor-b
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004350] [ 370] 0 370 2841 375 45056 0 -1000 auditd
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004352] [ 441] 100 441 6680 1009 77824 0 0 systemd-network
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004355] [ 446] 101 446 6001 1658 86016 0 0 systemd-resolve
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004357] [ 534] 0 534 60348 354 102400 0 0 accounts-daemon
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004359] [ 535] 0 535 637 165 40960 0 0 acpid
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004361] [ 536] 0 536 200300 13159 385024 0 0 amazon-cloudwat
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004363] [ 540] 103 540 1920 952 53248 0 -900 dbus-daemon
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004365] [ 568] 0 568 20476 612 61440 0 0 irqbalance
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004367] [ 570] 113 570 3256 342 53248 0 0 chronyd
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004369] [ 577] 0 577 7494 2846 98304 0 0 networkd-dispat
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004370] [ 580] 113 580 1210 439 53248 0 0 chronyd
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004372] [ 603] 0 603 2168 585 57344 0 0 cron
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004375] [ 609] 0 609 59107 779 94208 0 0 polkitd
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004377] [ 617] 104 617 56125 866 86016 0 0 rsyslogd
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004379] [ 621] 0 621 12231 6333 126976 0 0 salt-minion
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004381] [ 628] 0 628 4307 1002 69632 0 0 systemd-logind
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004382] [ 632] 0 632 98669 829 131072 0 0 udisksd
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004384] [ 636] 0 636 951 499 49152 0 0 atd
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004386] [ 677] 0 677 1840 447 53248 0 0 agetty
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004388] [ 680] 0 680 4561 533 61440 0 0 wrapper
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004389] [ 700] 0 700 3047 932 61440 0 -1000 sshd
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004391] [ 735] 0 735 60152 1054 102400 0 0 ModemManager
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004393] [ 736] 0 736 1459 385 49152 0 0 agetty
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004395] [ 737] 0 737 27031 2719 110592 0 0 unattended-upgr
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004397] [ 857] 0 857 1461116 107789 1564672 0 0 java
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004399] [ 928] 0 928 248205 12742 299008 0 0 salt-minion
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004401] [ 1041] 0 1041 31500 6490 143360 0 0 salt-minion
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004403] [ 1082] 0 1082 9519 580 69632 0 0 master
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004405] [ 1084] 112 1084 9670 566 65536 0 0 qmgr
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004407] [ 3599] 112 3599 10536 724 69632 0 0 tlsmgr
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004409] [ 97274] 0 97274 307044 307 163840 0 0 newrelic-infra-
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004411] [ 97282] 0 97282 440719 4508 294912 0 0 newrelic-infra
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004413] [ 251487] 112 251487 9585 121 65536 0 0 pickup
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004416] [ 257287] 0 257287 2553 624 57344 0 0 cron
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004418] [ 257288] 0 257288 2553 624 57344 0 0 cron
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004419] [ 257289] 0 257289 2553 623 57344 0 0 cron
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004421] [ 257290] 0 257290 2553 623 57344 0 0 cron
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004423] [ 257292] 3001 257292 2189 118 49152 0 0 bash
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004425] [ 257293] 3001 257293 2189 115 57344 0 0 bash
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004427] [ 257294] 3001 257294 2189 119 49152 0 0 bash
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004429] [ 257296] 3001 257296 2189 118 57344 0 0 bash
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004431] [ 257307] 3001 257307 656 29 40960 0 0 run-one
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004433] [ 257308] 3001 257308 656 29 40960 0 0 run-one
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004434] [ 257309] 3001 257309 656 29 40960 0 0 run-one
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004436] [ 257310] 3001 257310 656 29 40960 0 0 run-one
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004438] [ 257344] 3001 257344 1859 24 53248 0 0 flock
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004440] [ 257346] 3001 257346 7911996 7731555 62894080 0 0 python
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004442] [ 257347] 3001 257347 1859 24 61440 0 0 flock
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004443] [ 257348] 3001 257348 1859 24 49152 0 0 flock
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004445] [ 257349] 3001 257349 338380 158223 2191360 0 0 python
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004447] [ 257351] 3001 257351 365119 184973 2453504 0 0 python
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004449] [ 257355] 3001 257355 1859 25 53248 0 0 flock
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004451] [ 257356] 3001 257356 7883801 7703167 62664704 0 0 python
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004452] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/cron.service,task=python,pid=257346,uid=3001
Mar 8 05:16:02 ip-10-110-10-133 kernel: [203986.004468] Out of memory: Killed process 257346 (python) total-vm:31647984kB, anon-rss:30926220kB, file-rss:0kB, shmem-rss:0kB, UID:3001 pgtables:61420kB oom_score_adj:0
Mar 8 05:17:01 ip-10-110-10-133 CRON[258623]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 8 05:19:31 ip-10-110-10-133 postfix/smtp[258801]: D826A400AE: to=<[email protected]>, relay=email-smtp.us-east-1.amazonaws.com[52.20.44.183]:587, delay=0.39, delays=0.01/0.02/0.13/0.23, dsn=2.0.0, status=sent (250 Ok 01000186bfa906f8-4a6fed17-2ceb-423d-8476-e161c40962b0-000000)
Mar 8 05:19:31 ip-10-110-10-133 postfix/smtp[258801]: D826A400AE: to=<[email protected]>, relay=email-smtp.us-east-1.amazonaws.com[52.20.44.183]:587, delay=0.39, delays=0.01/0.02/0.13/0.23, dsn=2.0.0, status=sent (250 Ok 01000186bfa906f8-4a6fed17-2ceb-423d-8476-e161c40962b0-000000)
Mar 8 05:19:31 ip-10-110-10-133 postfix/smtp[258801]: D826A400AE: to=<[email protected]>, relay=email-smtp.us-east-1.amazonaws.com[52.20.44.183]:587, delay=0.39, delays=0.01/0.02/0.13/0.23, dsn=2.0.0, status=sent (250 Ok 01000186bfa906f8-4a6fed17-2ceb-423d-8476-e161c40962b0-000000)
Mar 8 05:23:03 ip-10-110-10-133 newrelic-infra-service[97282]: time="2023-03-08T05:23:03Z" level=warning msg="commands poll failed" component=CommandChannelService error="command request submission failed: Get \"https://infrastructure-command-api.newrelic.com/agent_commands/v1/commands\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Mar 8 05:27:24 ip-10-110-10-133 newrelic-infra-service[97282]: time="2023-03-08T05:27:24Z" level=warning msg="error occurred while updating the system fingerprint" component=Agent error="unable to fetch AWS metadata: Get \"http://169.254.169.254/latest/dynamic/instance-identity/document\": dial tcp 169.254.169.254:80: i/o timeout"
Mar 8 05:29:29 ip-10-110-10-133 newrelic-infra-service[97282]: time="2023-03-08T05:27:24Z" level=warning msg="commands poll failed" component=CommandChannelService error="command request submission failed: Get \"https://infrastructure-command-api.newrelic.com/agent_commands/v1/commands\": dial tcp 162.247.242.49:443: i/o timeout (Client.Timeout exceeded while awaiting headers)"
Mar 8 05:34:38 ip-10-110-10-133 systemd[1]: Starting Ubuntu Advantage Timer for running repeated jobs...
Mar 8 06:01:58 ip-10-110-10-133 systemd[1]: collector.service: Main process exited, code=exited, status=1/FAILURE
Mar 8 06:07:08 ip-10-110-10-133 systemd[1]: collector.service: Failed with result 'exit-code'.
Mar 8 06:09:58 ip-10-110-10-133 systemd-networkd[441]: ens5: Could not set DHCPv4 address: Connection timed out
Mar 8 06:12:04 ip-10-110-10-133 systemd-networkd[441]: ens5: Failed
After the OOM, many services start failing, and it does not look like just a DNS problem. Why does this happen? Shouldn't the killed process be the only one affected by the OOM?
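Also puzzling: after the incident, smtpd somehow still logs a success message when relaying an email. I'm not sure whether that is a red herring, but every other service reports network errors after the OOM. A reboot, of course, fixes everything.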
Answer 1
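I do not believe that the OOM killer terminating some Python processes causes IPv4 problems, especially when the victim is something probably unrelated, like a Python script started from cron.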
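An alternative explanation is that these are symptoms of a different failure. Why would SMTP messages still be relayed successfully several minutes after the OOM?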
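Under this much memory pressure, your system's performance was probably terrible. Performance so bad that IP connectivity fails and services cannot do useful work is rare, but it does happen. Look at whatever performance metrics you have. On a box with this many CPUs, the load average should not be too crazy. Memory pressure stall information (PSI) is very valuable for showing whether any time is being spent stalled on memory. The process list suggests you are running AWS CloudWatch or New Relic, so use their application or host metrics as well.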
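As a rough illustration of the PSI suggestion, here is a minimal sketch that reads the kernel's `/proc/pressure/memory` file. It assumes the kernel was built with PSI support and has not disabled it at boot. The helper names `parse_psi` and `read_memory_pressure` and the sample numbers are made up for this example and are not taken from the incident.

```python
from pathlib import Path


def parse_psi(text):
    """Parse the text of a /proc/pressure/* file into nested dicts.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read current memory pressure; raises OSError if PSI is unavailable."""
    return parse_psi(Path(path).read_text())


# Illustrative sample text (made-up numbers, not from the incident above).
SAMPLE = (
    "some avg10=42.50 avg60=30.10 avg300=12.00 total=123456789\n"
    "full avg10=35.25 avg60=25.00 avg300=9.75 total=98765432\n"
)

psi = parse_psi(SAMPLE)
print(psi["full"]["avg10"])
```

The `some` line is the share of time at least one task was stalled on memory, and `full` is the share of time all non-idle tasks were stalled at once. A high `full` average around the time of the incident would support the theory that the machine was too busy reclaiming memory to service the network.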
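On the other hand, forcibly killing tasks is bad for system stability and application correctness. Even if the system usually recovers, the consequences of killing the wrong application, daemon, or agent can be severe. The Linux virtual memory system tries hard to avoid OOM kills; they are one of its last resorts.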
If the workload on this box is reasonable, look at capacity planning: there may simply not be enough memory.
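To put numbers on the capacity question, the sketch below converts the page counts from the OOM report in the question into GiB. The page counts are copied from the log. The 4 KiB page size is an assumption, although it matches the kB totals the kernel prints next to the page counts, and the helper `pages_to_gib` is hypothetical.

```python
PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages, consistent with the log's kB totals


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


# Values copied from the OOM report ("memory values in pages").
total_ram_pages = 16510855
reserved_pages = 308983
python_rss_pages = {
    257346: 7731555,  # the process that was killed
    257349: 158223,
    257351: 184973,
    257356: 7703167,
}

usable_pages = total_ram_pages - reserved_pages
python_pages = sum(python_rss_pages.values())
share = python_pages / usable_pages

print(f"usable RAM:   {pages_to_gib(usable_pages):.1f} GiB")
print(f"python RSS:   {pages_to_gib(python_pages):.1f} GiB")
print(f"python share: {share:.1%}")
```

By this arithmetic, the four cron-launched python processes held roughly 60 GiB of the roughly 62 GiB of usable RAM, about 97%, with no swap configured. That leaves very little room for the page cache and everything else on the box.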