Linux server loses network connectivity after an OOM event

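We have a server running Linux 5.15, and we have verified several times that when a process is killed by the OOM killer, the entire system becomes unreachable over the network, for both inbound and outbound traffic. This is the latest syslog trace of the event: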

Mar  8 05:16:01 ip-10-110-10-133 kernel: [203986.004138] amazon-cloudwat invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004146] CPU: 3 PID: 1627 Comm: amazon-cloudwat Not tainted 5.15.0-1031-aws #35~20.04.1-Ubuntu
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004149] Hardware name: Amazon EC2 r6i.2xlarge/, BIOS 1.0 10/16/2017
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004150] Call Trace:
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004152]  <TASK>
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004155]  dump_stack_lvl+0x4a/0x63
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004162]  dump_stack+0x10/0x16
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004164]  dump_header+0x53/0x225
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004168]  oom_kill_process.cold+0xb/0x10
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004171]  out_of_memory+0x1dc/0x530
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004174]  __alloc_pages_slowpath.constprop.0+0xd32/0xe30
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004179]  ? __alloc_pages_slowpath.constprop.0+0xdb6/0xe30
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004182]  __alloc_pages+0x2cc/0x310
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004185]  alloc_pages+0x90/0x120
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004187]  __page_cache_alloc+0x87/0xc0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004191]  pagecache_get_page+0x150/0x530
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004193]  ? page_cache_ra_unbounded+0x16a/0x220
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004196]  filemap_fault+0x527/0xb60
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004198]  ? filemap_map_pages+0x138/0x640
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004201]  __do_fault+0x3d/0x120
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004205]  do_fault+0x1f9/0x420
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004207]  __handle_mm_fault+0x62c/0x840
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004210]  handle_mm_fault+0xd8/0x2c0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004211]  do_user_addr_fault+0x1c2/0x660
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004215]  exc_page_fault+0x77/0x170
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004218]  asm_exc_page_fault+0x27/0x30
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004221] RIP: 0033:0x44c1a0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004235] Code: Unable to access opcode bytes at RIP 0x44c176.
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004235] RSP: 002b:000000c001705de8 EFLAGS: 00010246
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004238] RAX: 000000c001705f78 RBX: 000000c001705e8c RCX: 0000000000000000
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004240] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000000
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004241] RBP: 000000c001705fb8 R08: 0000000000000001 R09: 000000c00063bb30
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004242] R10: 000000c001705f00 R11: 000000c000b715c0 R12: 000000c000d73080
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004243] R13: ffffffffffffffff R14: 000000c000bec820 R15: 0000000000000000
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004246]  </TASK>
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004247] Mem-Info:
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004248] active_anon:202 inactive_anon:15952689 isolated_anon:0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004248]  active_file:146 inactive_file:0 isolated_file:0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004248]  unevictable:6279 dirty:3 writeback:0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004248]  slab_reclaimable:9465 slab_unreclaimable:13692
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004248]  mapped:68073 shmem:256 pagetables:33680 bounce:0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004248]  kernel_misc_reclaimable:0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004248]  free:91381 free_pcp:1688 free_cma:0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004253] Node 0 active_anon:808kB inactive_anon:63810756kB active_file:584kB inactive_file:0kB unevictable:25116kB isolated(anon):0kB isolated(file):0kB mapped:272292kB dirty:12kB writeback:0kB shmem:1024kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:5536kB pagetables:134720kB all_unreclaimable? no
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004257] Node 0 DMA free:11264kB min:16kB low:28kB high:40kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004262] lowmem_reserve[]: 0 2991 63273 63273 63273
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004266] Node 0 DMA32 free:244188kB min:3188kB low:6244kB high:9300kB reserved_highatomic:0KB active_anon:0kB inactive_anon:2804896kB active_file:0kB inactive_file:544kB unevictable:0kB writepending:0kB present:3129252kB managed:3063716kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:112kB free_cma:0kB
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004270] lowmem_reserve[]: 0 0 60281 60281 60281
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004273] Node 0 Normal free:110072kB min:115576kB low:177292kB high:239008kB reserved_highatomic:2048KB active_anon:808kB inactive_anon:61005860kB active_file:1364kB inactive_file:388kB unevictable:25116kB writepending:12kB present:62898176kB managed:61728412kB mlocked:18340kB bounce:0kB free_pcp:6084kB local_pcp:928kB free_cma:0kB
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004278] lowmem_reserve[]: 0 0 0 0 0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004281] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004292] Node 0 DMA32: 188*4kB (UME) 127*8kB (UME) 135*16kB (UME) 69*32kB (UME) 33*64kB (UME) 12*128kB (UME) 8*256kB (UME) 2*512kB (UE) 2*1024kB (UM) 2*2048kB (ME) 55*4096kB (M) = 244280kB
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004306] Node 0 Normal: 9199*4kB (UME) 5601*8kB (UME) 1449*16kB (UMEH) 193*32kB (UMEH) 7*64kB (MH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 111412kB
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004318] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004320] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004321] 4359 total pagecache pages
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004322] 0 pages in swap cache
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004323] Swap cache stats: add 0, delete 0, find 0/0
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004324] Free swap  = 0kB
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004325] Total swap = 0kB
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004326] 16510855 pages RAM
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004326] 0 pages HighMem/MovableOnly
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004327] 308983 pages reserved
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004328] 0 pages hwpoisoned
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004328] Tasks state (memory values in pages):
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004329] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004337] [    211]     0   211   106563     1109   811008        0          -250 systemd-journal
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004340] [    248]     0   248     2247      979    61440        0         -1000 systemd-udevd
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004343] [    340]     0   340    53652     4488    94208        0         -1000 multipathd
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004346] [    360]     0   360      701       29    45056        0             0 falcond
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004348] [    361]     0   361   696911   147512  1540096        0             0 falcon-sensor-b
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004350] [    370]     0   370     2841      375    45056        0         -1000 auditd
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004352] [    441]   100   441     6680     1009    77824        0             0 systemd-network
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004355] [    446]   101   446     6001     1658    86016        0             0 systemd-resolve
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004357] [    534]     0   534    60348      354   102400        0             0 accounts-daemon
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004359] [    535]     0   535      637      165    40960        0             0 acpid
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004361] [    536]     0   536   200300    13159   385024        0             0 amazon-cloudwat
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004363] [    540]   103   540     1920      952    53248        0          -900 dbus-daemon
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004365] [    568]     0   568    20476      612    61440        0             0 irqbalance
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004367] [    570]   113   570     3256      342    53248        0             0 chronyd
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004369] [    577]     0   577     7494     2846    98304        0             0 networkd-dispat
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004370] [    580]   113   580     1210      439    53248        0             0 chronyd
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004372] [    603]     0   603     2168      585    57344        0             0 cron
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004375] [    609]     0   609    59107      779    94208        0             0 polkitd
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004377] [    617]   104   617    56125      866    86016        0             0 rsyslogd
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004379] [    621]     0   621    12231     6333   126976        0             0 salt-minion
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004381] [    628]     0   628     4307     1002    69632        0             0 systemd-logind
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004382] [    632]     0   632    98669      829   131072        0             0 udisksd
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004384] [    636]     0   636      951      499    49152        0             0 atd
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004386] [    677]     0   677     1840      447    53248        0             0 agetty
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004388] [    680]     0   680     4561      533    61440        0             0 wrapper
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004389] [    700]     0   700     3047      932    61440        0         -1000 sshd
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004391] [    735]     0   735    60152     1054   102400        0             0 ModemManager
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004393] [    736]     0   736     1459      385    49152        0             0 agetty
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004395] [    737]     0   737    27031     2719   110592        0             0 unattended-upgr
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004397] [    857]     0   857  1461116   107789  1564672        0             0 java
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004399] [    928]     0   928   248205    12742   299008        0             0 salt-minion
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004401] [   1041]     0  1041    31500     6490   143360        0             0 salt-minion
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004403] [   1082]     0  1082     9519      580    69632        0             0 master
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004405] [   1084]   112  1084     9670      566    65536        0             0 qmgr
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004407] [   3599]   112  3599    10536      724    69632        0             0 tlsmgr
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004409] [  97274]     0 97274   307044      307   163840        0             0 newrelic-infra-
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004411] [  97282]     0 97282   440719     4508   294912        0             0 newrelic-infra
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004413] [ 251487]   112 251487     9585      121    65536        0             0 pickup
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004416] [ 257287]     0 257287     2553      624    57344        0             0 cron
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004418] [ 257288]     0 257288     2553      624    57344        0             0 cron
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004419] [ 257289]     0 257289     2553      623    57344        0             0 cron
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004421] [ 257290]     0 257290     2553      623    57344        0             0 cron
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004423] [ 257292]  3001 257292     2189      118    49152        0             0 bash
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004425] [ 257293]  3001 257293     2189      115    57344        0             0 bash
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004427] [ 257294]  3001 257294     2189      119    49152        0             0 bash
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004429] [ 257296]  3001 257296     2189      118    57344        0             0 bash
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004431] [ 257307]  3001 257307      656       29    40960        0             0 run-one
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004433] [ 257308]  3001 257308      656       29    40960        0             0 run-one
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004434] [ 257309]  3001 257309      656       29    40960        0             0 run-one
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004436] [ 257310]  3001 257310      656       29    40960        0             0 run-one
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004438] [ 257344]  3001 257344     1859       24    53248        0             0 flock
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004440] [ 257346]  3001 257346  7911996  7731555 62894080        0             0 python
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004442] [ 257347]  3001 257347     1859       24    61440        0             0 flock
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004443] [ 257348]  3001 257348     1859       24    49152        0             0 flock
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004445] [ 257349]  3001 257349   338380   158223  2191360        0             0 python
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004447] [ 257351]  3001 257351   365119   184973  2453504        0             0 python
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004449] [ 257355]  3001 257355     1859       25    53248        0             0 flock
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004451] [ 257356]  3001 257356  7883801  7703167 62664704        0             0 python
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004452] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/cron.service,task=python,pid=257346,uid=3001
Mar  8 05:16:02 ip-10-110-10-133 kernel: [203986.004468] Out of memory: Killed process 257346 (python) total-vm:31647984kB, anon-rss:30926220kB, file-rss:0kB, shmem-rss:0kB, UID:3001 pgtables:61420kB oom_score_adj:0
Mar  8 05:17:01 ip-10-110-10-133 CRON[258623]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar  8 05:19:31 ip-10-110-10-133 postfix/smtp[258801]: D826A400AE: to=<[email protected]>, relay=email-smtp.us-east-1.amazonaws.com[52.20.44.183]:587, delay=0.39, delays=0.01/0.02/0.13/0.23, dsn=2.0.0, status=sent (250 Ok 01000186bfa906f8-4a6fed17-2ceb-423d-8476-e161c40962b0-000000)
Mar  8 05:19:31 ip-10-110-10-133 postfix/smtp[258801]: D826A400AE: to=<[email protected]>, relay=email-smtp.us-east-1.amazonaws.com[52.20.44.183]:587, delay=0.39, delays=0.01/0.02/0.13/0.23, dsn=2.0.0, status=sent (250 Ok 01000186bfa906f8-4a6fed17-2ceb-423d-8476-e161c40962b0-000000)
Mar  8 05:19:31 ip-10-110-10-133 postfix/smtp[258801]: D826A400AE: to=<[email protected]>, relay=email-smtp.us-east-1.amazonaws.com[52.20.44.183]:587, delay=0.39, delays=0.01/0.02/0.13/0.23, dsn=2.0.0, status=sent (250 Ok 01000186bfa906f8-4a6fed17-2ceb-423d-8476-e161c40962b0-000000)
Mar  8 05:23:03 ip-10-110-10-133 newrelic-infra-service[97282]: time="2023-03-08T05:23:03Z" level=warning msg="commands poll failed" component=CommandChannelService error="command request submission failed: Get \"https://infrastructure-command-api.newrelic.com/agent_commands/v1/commands\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Mar  8 05:27:24 ip-10-110-10-133 newrelic-infra-service[97282]: time="2023-03-08T05:27:24Z" level=warning msg="error occurred while updating the system fingerprint" component=Agent error="unable to fetch AWS metadata: Get \"http://169.254.169.254/latest/dynamic/instance-identity/document\": dial tcp 169.254.169.254:80: i/o timeout"
Mar  8 05:29:29 ip-10-110-10-133 newrelic-infra-service[97282]: time="2023-03-08T05:27:24Z" level=warning msg="commands poll failed" component=CommandChannelService error="command request submission failed: Get \"https://infrastructure-command-api.newrelic.com/agent_commands/v1/commands\": dial tcp 162.247.242.49:443: i/o timeout (Client.Timeout exceeded while awaiting headers)"
Mar  8 05:34:38 ip-10-110-10-133 systemd[1]: Starting Ubuntu Advantage Timer for running repeated jobs...
Mar  8 06:01:58 ip-10-110-10-133 systemd[1]: collector.service: Main process exited, code=exited, status=1/FAILURE
Mar  8 06:07:08 ip-10-110-10-133 systemd[1]: collector.service: Failed with result 'exit-code'.
Mar  8 06:09:58 ip-10-110-10-133 systemd-networkd[441]: ens5: Could not set DHCPv4 address: Connection timed out
Mar  8 06:12:04 ip-10-110-10-133 systemd-networkd[441]: ens5: Failed

After the OOM, many services start failing, and it does not look like just a DNS problem. Why does this happen? Shouldn't the killed process be the only one affected by the OOM?

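Also, puzzlingly, smtpd somehow logs a success message for relaying an email after the event. I'm not sure whether that is a red herring, but every other service reports network errors after the OOM. A reboot fixes everything, of course.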

Answer 1

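I don't believe that the OOM killer terminating some Python process would cause IPv4 problems, especially a process that is probably unrelated to networking, such as a Python script started from cron.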

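An alternative explanation is that these are symptoms of separate faults. Why would SMTP messages still be relayed successfully several minutes after the OOM?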

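Under this much memory pressure, your system's performance may be terrible. It is very rare for it to be so terrible that IP connectivity fails and services cannot do useful work, but it does happen. Look at whatever performance metrics are available. For a machine with this many CPUs, the load average should not be too crazy. Memory pressure stall information (PSI) is very valuable, because it shows whether any time is being spent stalled on memory. The running processes suggest you are using AWS CloudWatch or New Relic, so use those application or host metrics as well.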
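As a rough illustration of the pressure stall check, here is a minimal Python sketch that reads /proc/pressure/memory. It assumes the kernel was built with PSI support and that PSI has not been disabled at boot; the function names parse_psi and read_memory_pressure are made up for this example and are not part of any library.

```python
from pathlib import Path


def parse_psi(text):
    """Parse the contents of a /proc/pressure/* file.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns e.g. {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_memory_pressure(path="/proc/pressure/memory"):
    """Return parsed memory PSI, or None if the file cannot be read."""
    try:
        return parse_psi(Path(path).read_text())
    except OSError:
        # PSI not compiled in, disabled at boot, or not a Linux host.
        return None


if __name__ == "__main__":
    psi = read_memory_pressure()
    if psi is None:
        print("PSI is not available on this system")
    else:
        for kind, values in psi.items():
            # avg10/avg60/avg300 are percentages of wall-clock time stalled.
            print(kind, values)
```

Running something like this from cron and logging the output would show whether the "some" and "full" averages climb in the minutes before the OOM kill, which is the kind of evidence that separates memory stalls from an unrelated network fault.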

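On the other hand, forcibly killing tasks is bad for system stability and application correctness. Even if the system usually recovers, the consequences of killing the wrong application, daemon, or agent can be serious. The Linux virtual memory system tries hard to avoid OOM kills; they are one of its last resorts.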

If the workload on this box is reasonable, look at capacity planning: it may simply not have enough memory.
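To put numbers on the capacity question, the page counts in the OOM report can be converted to GiB. The sketch below uses values copied from the log in the question and assumes 4 KiB pages, which matches the kB figures on the "Killed process" line; the helper names are only for illustration.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages, consistent with the kB values in the log


def pages_to_gib(pages, page_kib=PAGE_KIB):
    """Convert a page count from the OOM report to GiB."""
    return pages * page_kib / (1024 * 1024)


def rss_share(rss_pages, total_pages):
    """Fraction of total RAM pages held as RSS by the given processes."""
    return sum(rss_pages) / total_pages


# "pages RAM" line of the report.
TOTAL_RAM_PAGES = 16510855

# rss column of the four python processes in the task table (pid -> pages).
PYTHON_RSS_PAGES = {
    257346: 7731555,
    257349: 158223,
    257351: 184973,
    257356: 7703167,
}

if __name__ == "__main__":
    print(f"total RAM: {pages_to_gib(TOTAL_RAM_PAGES):.1f} GiB")
    for pid, pages in PYTHON_RSS_PAGES.items():
        print(f"python pid {pid}: {pages_to_gib(pages):.1f} GiB")
    share = rss_share(PYTHON_RSS_PAGES.values(), TOTAL_RAM_PAGES)
    print(f"python share of RAM: {share:.0%}")
```

By this arithmetic, the four cron-started python processes hold roughly 60 GiB of about 63 GiB of RAM, with no swap configured, so either the jobs need to use less memory or the instance needs more of it.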
