Linux oom-killer kills processes with too much memory available

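We are running 64-bit Ubuntu with 32GB of physical memory, split into 3 zones (DMA: 16MB, DMA32: 4GB and Normal: 30GB). According to the dmesg log printed below, our system ran out of free memory in the Normal zone. The log shows free memory there (63692kB) dropping below the low watermark (76708kB), so heavy swapping started.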

[Fri Feb  8 15:42:54 2019] Node 0 Normal free:63692kB min:61368kB low:76708kB high:92052kB active_anon:13819020kB inactive_anon:1390260kB active_file:324kB inactive_file:736kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29882588kB mlocked:0kB dirty:4kB writeback:0kB mapped:4708kB shmem:4860kB slab_reclaimable:75272kB slab_unreclaimable:35816kB kernel_stack:7312kB pagetables:35924kB unstable:0kB bounce:0kB free_pcp:652kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:908 all_unreclaimable? no
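The watermark comparison can be checked mechanically. The following is a minimal Python sketch, assuming the per-zone line keeps the `name:valuekB` layout shown above; the helper names `zone_watermarks` and `watermark_state` are hypothetical and only serve this illustration.

```python
import re

# Shortened copy of the "Node 0 Normal" line from the log above.
ZONE_LINE = (
    "Node 0 Normal free:63692kB min:61368kB low:76708kB high:92052kB "
    "active_anon:13819020kB inactive_anon:1390260kB"
)

def zone_watermarks(line):
    """Extract free/min/low/high (in kB) from one per-zone dmesg line."""
    fields = dict(re.findall(r"(\w+):(\d+)kB", line))
    return {key: int(fields[key]) for key in ("free", "min", "low", "high")}

def watermark_state(w):
    """Describe where 'free' sits relative to the three watermarks."""
    if w["free"] < w["min"]:
        return "below min"
    if w["free"] < w["low"]:
        return "between min and low"
    if w["free"] < w["high"]:
        return "between low and high"
    return "above high"

if __name__ == "__main__":
    marks = zone_watermarks(ZONE_LINE)
    print(marks, watermark_state(marks))
```

For the Normal zone this reports free memory between the min and low watermarks, which is the range in which the kernel reclaims pages in the background.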

Unfortunately, our 2GB of swap space was full as well:

[Fri Feb  8 15:42:54 2019] Free swap  = 0kB
[Fri Feb  8 15:42:54 2019] Total swap = 2097148kB

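We could add more swap space or reduce memory overcommitment on the system. But before doing that, we want to understand one thing: we summed all rss pages that the kernel reports in the dmesg log. The total is 4083166 pages, which at 4kB per page is about 16GB of memory. We expected a figure much closer to the 32GB that is installed.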
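The rss sum described above can be reproduced with a short script. This is a sketch in Python, assuming the process table keeps the column order printed by this kernel (pid, uid, tgid, total_vm, rss, nr_ptes, nr_pmds, swapents, oom_score_adj, name) and a page size of 4 KiB; `sum_column` and `pages_to_gib` are hypothetical helper names. Only three rows are embedded here, so feed it the full log to get the complete total.

```python
import re

PAGE_SIZE = 4096  # bytes; assumption: 4 KiB pages, the x86-64 default

# Three rows copied from the process table in the dmesg log.
SAMPLE = """\
[Fri Feb  8 15:42:54 2019] [  856]     0   856   273330     4976     117       6     4094          -500 dockerd
[Fri Feb  8 15:42:54 2019] [13871]  1000 13871  1586068   133739     811      10   220384             0 java
[Fri Feb  8 15:42:54 2019] [25320]  2000 25320  7879838  3931096    8163      26   184030             0 java
"""

# One process-table row; the timestamp bracket never matches "[ digits ]".
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+"
    r"(?P<nr_pmds>\d+)\s+(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+(?P<name>\S+)"
)

def sum_column(log_text, column):
    """Sum one numeric column (in pages) over all process-table rows."""
    total = 0
    for line in log_text.splitlines():
        match = ROW.search(line)
        if match:
            total += int(match.group(column))
    return total

def pages_to_gib(pages):
    """Convert a page count to GiB under the PAGE_SIZE assumption."""
    return pages * PAGE_SIZE / 2**30

if __name__ == "__main__":
    rss_pages = sum_column(SAMPLE, "rss")
    print(f"rss in sample: {rss_pages} pages = {pages_to_gib(rss_pages):.2f} GiB")
    # Total over the full table, as stated in the question.
    print(f"reported total: {pages_to_gib(4083166):.2f} GiB")
```

The same function works for the `swapents` column, which helps when comparing per-process swap usage against the 2GB of swap that is exhausted.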

Who is using the rest of the memory, and how can we track that down?

Here is the full dmesg log of the oom-killer report:

[Fri Feb  8 15:42:54 2019] [main]-pipeline invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[Fri Feb  8 15:42:54 2019] [main]-pipeline cpuset=3a8d1e7785d259036358f790d6bcd25682f2296f7f5aae4007122a62345b283d mems_allowed=0
[Fri Feb  8 15:42:54 2019] CPU: 4 PID: 13956 Comm: [main]-pipeline Tainted: G             L  4.4.0-87-generic #110-Ubuntu
[Fri Feb  8 15:42:54 2019] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/17/2015
[Fri Feb  8 15:42:54 2019]  0000000000000286 d31b6617973a4be6 ffff88028d4bf9f8 ffffffff813f9903
[Fri Feb  8 15:42:54 2019]  ffff88028d4bfbb0 ffff880819b24600 ffff88028d4bfa68 ffffffff8120b75e
[Fri Feb  8 15:42:54 2019]  ffffffff8113f98a ffff88028d4bfa98 ffffffff811a722d ffff88081bf15460
[Fri Feb  8 15:42:54 2019] Call Trace:
[Fri Feb  8 15:42:54 2019]  [<ffffffff813f9903>] dump_stack+0x63/0x90
[Fri Feb  8 15:42:54 2019]  [<ffffffff8120b75e>] dump_header+0x5a/0x1c5
[Fri Feb  8 15:42:54 2019]  [<ffffffff8113f98a>] ? __delayacct_freepages_end+0x2a/0x30
[Fri Feb  8 15:42:54 2019]  [<ffffffff811a722d>] ? do_try_to_free_pages+0x2ed/0x410
[Fri Feb  8 15:42:54 2019]  [<ffffffff81192ce2>] oom_kill_process+0x202/0x3c0
[Fri Feb  8 15:42:54 2019]  [<ffffffff81193109>] out_of_memory+0x219/0x460
[Fri Feb  8 15:42:54 2019]  [<ffffffff811990f8>] __alloc_pages_slowpath.constprop.88+0x938/0xad0
[Fri Feb  8 15:42:54 2019]  [<ffffffff81199516>] __alloc_pages_nodemask+0x286/0x2a0
[Fri Feb  8 15:42:54 2019]  [<ffffffff811e305c>] alloc_pages_current+0x8c/0x110
[Fri Feb  8 15:42:54 2019]  [<ffffffff8118f2ab>] __page_cache_alloc+0xab/0xc0
[Fri Feb  8 15:42:54 2019]  [<ffffffff811917ba>] filemap_fault+0x14a/0x3f0
[Fri Feb  8 15:42:54 2019]  [<ffffffff812a3736>] ext4_filemap_fault+0x36/0x50
[Fri Feb  8 15:42:54 2019]  [<ffffffff811be7d0>] __do_fault+0x50/0xe0
[Fri Feb  8 15:42:54 2019]  [<ffffffff811c22f2>] handle_mm_fault+0xfa2/0x1820
[Fri Feb  8 15:42:54 2019]  [<ffffffff8106b577>] __do_page_fault+0x197/0x400
[Fri Feb  8 15:42:54 2019]  [<ffffffff8106b802>] do_page_fault+0x22/0x30
[Fri Feb  8 15:42:54 2019]  [<ffffffff81844038>] page_fault+0x28/0x30
[Fri Feb  8 15:42:54 2019] Mem-Info:
[Fri Feb  8 15:42:54 2019] active_anon:3623939 inactive_anon:519706 isolated_anon:0
                            active_file:98 inactive_file:206 isolated_file:0
                            unevictable:0 dirty:1 writeback:0 unstable:0
                            slab_reclaimable:20794 slab_unreclaimable:9765
                            mapped:1178 shmem:3190 pagetables:10160 bounce:0
                            free:50583 free_pcp:183 free_cma:0
[Fri Feb  8 15:42:54 2019] Node 0 DMA free:15860kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Fri Feb  8 15:42:54 2019] lowmem_reserve[]: 0 2940 32122 32122 32122
[Fri Feb  8 15:42:54 2019] Node 0 DMA32 free:122780kB min:6180kB low:7724kB high:9268kB active_anon:676736kB inactive_anon:688564kB active_file:68kB inactive_file:88kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129280kB managed:3048596kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:7900kB slab_reclaimable:7904kB slab_unreclaimable:3228kB kernel_stack:688kB pagetables:4716kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2140 all_unreclaimable? yes
[Fri Feb  8 15:42:54 2019] lowmem_reserve[]: 0 0 29182 29182 29182
[Fri Feb  8 15:42:54 2019] Node 0 Normal free:63692kB min:61368kB low:76708kB high:92052kB active_anon:13819020kB inactive_anon:1390260kB active_file:324kB inactive_file:736kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29882588kB mlocked:0kB dirty:4kB writeback:0kB mapped:4708kB shmem:4860kB slab_reclaimable:75272kB slab_unreclaimable:35816kB kernel_stack:7312kB pagetables:35924kB unstable:0kB bounce:0kB free_pcp:652kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:908 all_unreclaimable? no
[Fri Feb  8 15:42:54 2019] lowmem_reserve[]: 0 0 0 0 0
[Fri Feb  8 15:42:54 2019] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15860kB
[Fri Feb  8 15:42:54 2019] Node 0 DMA32: 6560*4kB (UME) 1162*8kB (UME) 832*16kB (ME) 1143*32kB (UME) 587*64kB (UME) 4*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 123504kB
[Fri Feb  8 15:42:54 2019] Node 0 Normal: 14276*4kB (UME) 609*8kB (MH) 3*16kB (H) 1*32kB (H) 0*64kB 0*128kB 0*256kB 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 63592kB
[Fri Feb  8 15:42:54 2019] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Fri Feb  8 15:42:54 2019] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Fri Feb  8 15:42:54 2019] 58196 total pagecache pages
[Fri Feb  8 15:42:54 2019] 54643 pages in swap cache
[Fri Feb  8 15:42:54 2019] Swap cache stats: add 18366740, delete 18312097, find 6820279/8173012
[Fri Feb  8 15:42:54 2019] Free swap  = 0kB
[Fri Feb  8 15:42:54 2019] Total swap = 2097148kB
[Fri Feb  8 15:42:54 2019] 8388494 pages RAM
[Fri Feb  8 15:42:54 2019] 0 pages HighMem/MovableOnly
[Fri Feb  8 15:42:54 2019] 151721 pages reserved
[Fri Feb  8 15:42:54 2019] 0 pages cma reserved
[Fri Feb  8 15:42:54 2019] 0 pages hwpoisoned
[Fri Feb  8 15:42:54 2019] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[Fri Feb  8 15:42:54 2019] [  306]     0   306    10968     1627      25       4      533             0 systemd-journal
[Fri Feb  8 15:42:54 2019] [  344]     0   344    25742       36      17       4      534             0 lvmetad
[Fri Feb  8 15:42:54 2019] [  722]     0   722     7318       30      19       3       31             0 cron
[Fri Feb  8 15:42:54 2019] [  727]   106   727    10752      129      25       4       39          -900 dbus-daemon
[Fri Feb  8 15:42:54 2019] [  780]     0   780    69900      465      38       3       70             0 accounts-daemon
[Fri Feb  8 15:42:54 2019] [  782]     0   782     7163       42      19       3       46             0 systemd-logind
[Fri Feb  8 15:42:54 2019] [  787]     0   787    76201      555     104       3    28939             0 vmtoolsd
[Fri Feb  8 15:42:54 2019] [  843]     0   843     4901       51      14       3       35             0 irqbalance
[Fri Feb  8 15:42:54 2019] [  856]     0   856   273330     4976     117       6     4094          -500 dockerd
[Fri Feb  8 15:42:54 2019] [ 1106]     0  1106     4058       22      13       3       14             0 agetty
[Fri Feb  8 15:42:54 2019] [19618]   108 19618    27567      124      24       4       93             0 ntpd
[Fri Feb  8 15:42:54 2019] [22824]   104 22824    83051      279      30       4       93             0 rsyslogd
[Fri Feb  8 15:42:54 2019] [ 6763]     0  6763    11007      108      21       3        0         -1000 systemd-udevd
[Fri Feb  8 15:42:54 2019] [ 4730]     0  4730   214969     2443      50       6      196          -500 docker-containe
[Fri Feb  8 15:42:54 2019] [13392]     0 13392     5105       14      14       3       29             0 daemon
[Fri Feb  8 15:42:54 2019] [13393]     0 13393   183580     1886      43       5     1251             0 prometheus-node
[Fri Feb  8 15:42:54 2019] [13786]     0 13786   103424        0      26       5      187          -500 docker-containe
[Fri Feb  8 15:42:54 2019] [13805]  1000 13805      276        0       4       2       13             0 tini
[Fri Feb  8 15:42:54 2019] [13871]  1000 13871  1586068   133739     811      10   220384             0 java
[Fri Feb  8 15:42:54 2019] [ 4506]     0  4506    16377       18      36       3      160         -1000 sshd
[Fri Feb  8 15:42:54 2019] [25171]     0 25171    45485        0      18       5      144          -500 docker-proxy
[Fri Feb  8 15:42:54 2019] [25183]     0 25183    31150        0      18       5      659          -500 docker-proxy
[Fri Feb  8 15:42:54 2019] [25190]     0 25190    87040      713      25       5       82          -500 docker-containe
[Fri Feb  8 15:42:54 2019] [25208]  2000 25208     4915        1      14       3       81             0 artifactory_sta
[Fri Feb  8 15:42:54 2019] [25291]  2000 25291     4948        1      15       3      113             0 artifactory.sh
[Fri Feb  8 15:42:54 2019] [25320]  2000 25320  7879838  3931096    8163      26   184030             0 java
[Fri Feb  8 15:42:54 2019] [27475]     0 27475    23199       68      49       3      166             0 sshd
[Fri Feb  8 15:42:54 2019] [27478]  1013 27478    11312        0      27       3      212             0 systemd
[Fri Feb  8 15:42:54 2019] [27483]  1013 27483    15322      105      33       3      382             0 (sd-pam)
[Fri Feb  8 15:42:54 2019] [27502]  1013 27502    23199       51      47       3      181             0 sshd
[Fri Feb  8 15:42:54 2019] [27503]  1013 27503     6055      316      17       3      492             0 bash
[Fri Feb  8 15:42:54 2019] [29901]     0 29901    23199      233      50       3        0             0 sshd
[Fri Feb  8 15:42:54 2019] [29903]  1006 29903    11312      173      26       3        0             0 systemd
[Fri Feb  8 15:42:54 2019] [29905]  1006 29905    15322      209      33       3      278             0 (sd-pam)
[Fri Feb  8 15:42:54 2019] [29925]  1006 29925    23199      231      47       3        0             0 sshd
[Fri Feb  8 15:42:54 2019] [29926]  1006 29926     6571      811      19       3        0             0 bash
[Fri Feb  8 15:42:54 2019] [30400]  1006 30400   152391     1805      46       5        0             0 docker
[Fri Feb  8 15:42:54 2019] [30411]     0 30411   101375      670      25       6      511          -500 docker-containe
[Fri Feb  8 15:42:54 2019] [30428]     0 30428     4967      139      14       3        0             0 bash
[Fri Feb  8 15:42:54 2019] Out of memory: Kill process 25320 (java) score 470 or sacrifice child
[Fri Feb  8 15:42:54 2019] Killed process 25320 (java) total-vm:31519352kB, anon-rss:15724384kB, file-rss:0kB
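One way to narrow down the question is to add up the page counters in the Mem-Info block and compare them with the usable RAM the kernel reports. The sketch below is Python with the values copied from the log; which counters to add is an assumption on my part (LRU anon and file pages, slab, page tables, free), and `unaccounted_pages` is a hypothetical helper name. Memory that the kernel or a driver allocates directly does not show up in these counters, so the result is only a rough lower bound on what is missing.

```python
PAGE_SIZE = 4096  # bytes; assumption: 4 KiB pages

# Page counts copied from the Mem-Info block of the log above.
MEM_INFO = {
    "active_anon": 3623939,
    "inactive_anon": 519706,
    "active_file": 98,
    "inactive_file": 206,
    "slab_reclaimable": 20794,
    "slab_unreclaimable": 9765,
    "pagetables": 10160,
    "free": 50583,
}
PAGES_RAM = 8388494       # "pages RAM"
PAGES_RESERVED = 151721   # "pages reserved"

def unaccounted_pages(counters, pages_ram, pages_reserved):
    """Usable pages minus the counters that are assumed not to overlap."""
    usable = pages_ram - pages_reserved
    return usable - sum(counters.values())

if __name__ == "__main__":
    missing = unaccounted_pages(MEM_INFO, PAGES_RAM, PAGES_RESERVED)
    gib = missing * PAGE_SIZE / 2**30
    print(f"unaccounted: {missing} pages (about {gib:.1f} GiB)")
```

With these numbers roughly 15 GiB of usable RAM appears in none of the listed counters, which is in line with the gap between the rss sum and the installed 32GB.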
