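We are running 64-bit Ubuntu with 32GB of physical memory, divided into three zones (DMA: 16MB, DMA32: about 3GB and Normal: about 29GB, going by the `present:` values in the log). According to the dmesg log printed below, the system ran out of free memory in the Normal zone. The log shows free memory sitting just above the min watermark and below the low watermark (free:63692kB, min:61368kB, low:76708kB), hence the heavy swapping.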
[Fri Feb 8 15:42:54 2019] Node 0 Normal free:63692kB min:61368kB low:76708kB high:92052kB active_anon:13819020kB inactive_anon:1390260kB active_file:324kB inactive_file:736kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29882588kB mlocked:0kB dirty:4kB writeback:0kB mapped:4708kB shmem:4860kB slab_reclaimable:75272kB slab_unreclaimable:35816kB kernel_stack:7312kB pagetables:35924kB unstable:0kB bounce:0kB free_pcp:652kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:908 all_unreclaimable? no
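To make the watermark comparison concrete, here is a minimal Python sketch that pulls `free`, `min`, `low` and `high` out of a zone line and reports where `free` sits. The helper names `parse_watermarks` and `classify` are made up for illustration, and the comparison is deliberately simplified: the kernel's real check also takes `lowmem_reserve` and the allocation order into account.

```python
import re

# Zone line abbreviated from the dmesg output above (only the fields we need).
ZONE_LINE = "Node 0 Normal free:63692kB min:61368kB low:76708kB high:92052kB"

def parse_watermarks(line):
    """Return the free/min/low/high values (in kB) found in a zone line."""
    pairs = re.findall(r"\b(free|min|low|high):(\d+)kB", line)
    return {name: int(value) for name, value in pairs}

def classify(wm):
    """Simplified placement of 'free' relative to the three watermarks."""
    if wm["free"] < wm["min"]:
        return "below min"
    if wm["free"] < wm["low"]:
        return "between min and low"
    if wm["free"] < wm["high"]:
        return "between low and high"
    return "above high"

if __name__ == "__main__":
    wm = parse_watermarks(ZONE_LINE)
    print(wm, "->", classify(wm))
```

The regular expression requires a colon directly after the field name, so counters such as `free_pcp:` and `free_cma:` in the full line are not picked up by mistake.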
Unfortunately, our 2GB of swap space is also full:
[Fri Feb 8 15:42:54 2019] Free swap = 0kB
[Fri Feb 8 15:42:54 2019] Total swap = 2097148kB
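We could add more swap space or reduce memory overcommitment on the system. Before doing that, though, we want to understand one thing: we summed the rss pages of every process the kernel lists in the dmesg log. The total comes to 4083166 pages, which is roughly 16GB of memory at 4kB per page. We expected about twice that (32GB) to be accounted for.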
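For anyone who wants to reproduce that total, here is a rough Python sketch of the calculation. It assumes 4 KiB pages (the x86-64 default) and the column order shown in the process table header of the log; `SAMPLE` holds only three rows copied from that table, so feed it the whole log to get the full figure. The function names are ours, not part of any tool.

```python
import re

PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages (x86-64 default)

# Columns follow the table header in the log:
# [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+(?P<nr_pmds>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<oom_score_adj>-?\d+)\s+(?P<name>.+)$"
)

# Three rows copied from the process table in the log, for illustration only.
SAMPLE = """\
[Fri Feb 8 15:42:54 2019] [13805] 1000 13805 276 0 4 2 13 0 tini
[Fri Feb 8 15:42:54 2019] [13871] 1000 13871 1586068 133739 811 10 220384 0 java
[Fri Feb 8 15:42:54 2019] [25320] 2000 25320 7879838 3931096 8163 26 184030 0 java
"""

def sum_rss_pages(log_text):
    """Add up the rss column (in pages) of every process row in the text."""
    total = 0
    for line in log_text.splitlines():
        match = ROW.search(line)
        if match:
            total += int(match.group("rss"))
    return total

def pages_to_gib(pages):
    """Convert a page count to GiB using the assumed page size."""
    return pages * PAGE_SIZE_KIB / (1024 * 1024)

if __name__ == "__main__":
    print("sample rss pages:", sum_rss_pages(SAMPLE))
    print("reported total in GiB: %.2f" % pages_to_gib(4083166))
```

Lines that are not process rows (the timestamp prefix, the table header, the zone summaries) do not match the pattern and are skipped.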
Who is using the rest of the memory, and how can we trace it?
Here is the full dmesg log from the oom-killer report:
[Fri Feb 8 15:42:54 2019] [main]-pipeline invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[Fri Feb 8 15:42:54 2019] [main]-pipeline cpuset=3a8d1e7785d259036358f790d6bcd25682f2296f7f5aae4007122a62345b283d mems_allowed=0
[Fri Feb 8 15:42:54 2019] CPU: 4 PID: 13956 Comm: [main]-pipeline Tainted: G L 4.4.0-87-generic #110-Ubuntu
[Fri Feb 8 15:42:54 2019] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/17/2015
[Fri Feb 8 15:42:54 2019] 0000000000000286 d31b6617973a4be6 ffff88028d4bf9f8 ffffffff813f9903
[Fri Feb 8 15:42:54 2019] ffff88028d4bfbb0 ffff880819b24600 ffff88028d4bfa68 ffffffff8120b75e
[Fri Feb 8 15:42:54 2019] ffffffff8113f98a ffff88028d4bfa98 ffffffff811a722d ffff88081bf15460
[Fri Feb 8 15:42:54 2019] Call Trace:
[Fri Feb 8 15:42:54 2019] [<ffffffff813f9903>] dump_stack+0x63/0x90
[Fri Feb 8 15:42:54 2019] [<ffffffff8120b75e>] dump_header+0x5a/0x1c5
[Fri Feb 8 15:42:54 2019] [<ffffffff8113f98a>] ? __delayacct_freepages_end+0x2a/0x30
[Fri Feb 8 15:42:54 2019] [<ffffffff811a722d>] ? do_try_to_free_pages+0x2ed/0x410
[Fri Feb 8 15:42:54 2019] [<ffffffff81192ce2>] oom_kill_process+0x202/0x3c0
[Fri Feb 8 15:42:54 2019] [<ffffffff81193109>] out_of_memory+0x219/0x460
[Fri Feb 8 15:42:54 2019] [<ffffffff811990f8>] __alloc_pages_slowpath.constprop.88+0x938/0xad0
[Fri Feb 8 15:42:54 2019] [<ffffffff81199516>] __alloc_pages_nodemask+0x286/0x2a0
[Fri Feb 8 15:42:54 2019] [<ffffffff811e305c>] alloc_pages_current+0x8c/0x110
[Fri Feb 8 15:42:54 2019] [<ffffffff8118f2ab>] __page_cache_alloc+0xab/0xc0
[Fri Feb 8 15:42:54 2019] [<ffffffff811917ba>] filemap_fault+0x14a/0x3f0
[Fri Feb 8 15:42:54 2019] [<ffffffff812a3736>] ext4_filemap_fault+0x36/0x50
[Fri Feb 8 15:42:54 2019] [<ffffffff811be7d0>] __do_fault+0x50/0xe0
[Fri Feb 8 15:42:54 2019] [<ffffffff811c22f2>] handle_mm_fault+0xfa2/0x1820
[Fri Feb 8 15:42:54 2019] [<ffffffff8106b577>] __do_page_fault+0x197/0x400
[Fri Feb 8 15:42:54 2019] [<ffffffff8106b802>] do_page_fault+0x22/0x30
[Fri Feb 8 15:42:54 2019] [<ffffffff81844038>] page_fault+0x28/0x30
[Fri Feb 8 15:42:54 2019] Mem-Info:
[Fri Feb 8 15:42:54 2019] active_anon:3623939 inactive_anon:519706 isolated_anon:0
active_file:98 inactive_file:206 isolated_file:0
unevictable:0 dirty:1 writeback:0 unstable:0
slab_reclaimable:20794 slab_unreclaimable:9765
mapped:1178 shmem:3190 pagetables:10160 bounce:0
free:50583 free_pcp:183 free_cma:0
[Fri Feb 8 15:42:54 2019] Node 0 DMA free:15860kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Fri Feb 8 15:42:54 2019] lowmem_reserve[]: 0 2940 32122 32122 32122
[Fri Feb 8 15:42:54 2019] Node 0 DMA32 free:122780kB min:6180kB low:7724kB high:9268kB active_anon:676736kB inactive_anon:688564kB active_file:68kB inactive_file:88kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129280kB managed:3048596kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:7900kB slab_reclaimable:7904kB slab_unreclaimable:3228kB kernel_stack:688kB pagetables:4716kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2140 all_unreclaimable? yes
[Fri Feb 8 15:42:54 2019] lowmem_reserve[]: 0 0 29182 29182 29182
[Fri Feb 8 15:42:54 2019] Node 0 Normal free:63692kB min:61368kB low:76708kB high:92052kB active_anon:13819020kB inactive_anon:1390260kB active_file:324kB inactive_file:736kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29882588kB mlocked:0kB dirty:4kB writeback:0kB mapped:4708kB shmem:4860kB slab_reclaimable:75272kB slab_unreclaimable:35816kB kernel_stack:7312kB pagetables:35924kB unstable:0kB bounce:0kB free_pcp:652kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:908 all_unreclaimable? no
[Fri Feb 8 15:42:54 2019] lowmem_reserve[]: 0 0 0 0 0
[Fri Feb 8 15:42:54 2019] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15860kB
[Fri Feb 8 15:42:54 2019] Node 0 DMA32: 6560*4kB (UME) 1162*8kB (UME) 832*16kB (ME) 1143*32kB (UME) 587*64kB (UME) 4*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 123504kB
[Fri Feb 8 15:42:54 2019] Node 0 Normal: 14276*4kB (UME) 609*8kB (MH) 3*16kB (H) 1*32kB (H) 0*64kB 0*128kB 0*256kB 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 63592kB
[Fri Feb 8 15:42:54 2019] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Fri Feb 8 15:42:54 2019] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Fri Feb 8 15:42:54 2019] 58196 total pagecache pages
[Fri Feb 8 15:42:54 2019] 54643 pages in swap cache
[Fri Feb 8 15:42:54 2019] Swap cache stats: add 18366740, delete 18312097, find 6820279/8173012
[Fri Feb 8 15:42:54 2019] Free swap = 0kB
[Fri Feb 8 15:42:54 2019] Total swap = 2097148kB
[Fri Feb 8 15:42:54 2019] 8388494 pages RAM
[Fri Feb 8 15:42:54 2019] 0 pages HighMem/MovableOnly
[Fri Feb 8 15:42:54 2019] 151721 pages reserved
[Fri Feb 8 15:42:54 2019] 0 pages cma reserved
[Fri Feb 8 15:42:54 2019] 0 pages hwpoisoned
[Fri Feb 8 15:42:54 2019] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[Fri Feb 8 15:42:54 2019] [ 306] 0 306 10968 1627 25 4 533 0 systemd-journal
[Fri Feb 8 15:42:54 2019] [ 344] 0 344 25742 36 17 4 534 0 lvmetad
[Fri Feb 8 15:42:54 2019] [ 722] 0 722 7318 30 19 3 31 0 cron
[Fri Feb 8 15:42:54 2019] [ 727] 106 727 10752 129 25 4 39 -900 dbus-daemon
[Fri Feb 8 15:42:54 2019] [ 780] 0 780 69900 465 38 3 70 0 accounts-daemon
[Fri Feb 8 15:42:54 2019] [ 782] 0 782 7163 42 19 3 46 0 systemd-logind
[Fri Feb 8 15:42:54 2019] [ 787] 0 787 76201 555 104 3 28939 0 vmtoolsd
[Fri Feb 8 15:42:54 2019] [ 843] 0 843 4901 51 14 3 35 0 irqbalance
[Fri Feb 8 15:42:54 2019] [ 856] 0 856 273330 4976 117 6 4094 -500 dockerd
[Fri Feb 8 15:42:54 2019] [ 1106] 0 1106 4058 22 13 3 14 0 agetty
[Fri Feb 8 15:42:54 2019] [19618] 108 19618 27567 124 24 4 93 0 ntpd
[Fri Feb 8 15:42:54 2019] [22824] 104 22824 83051 279 30 4 93 0 rsyslogd
[Fri Feb 8 15:42:54 2019] [ 6763] 0 6763 11007 108 21 3 0 -1000 systemd-udevd
[Fri Feb 8 15:42:54 2019] [ 4730] 0 4730 214969 2443 50 6 196 -500 docker-containe
[Fri Feb 8 15:42:54 2019] [13392] 0 13392 5105 14 14 3 29 0 daemon
[Fri Feb 8 15:42:54 2019] [13393] 0 13393 183580 1886 43 5 1251 0 prometheus-node
[Fri Feb 8 15:42:54 2019] [13786] 0 13786 103424 0 26 5 187 -500 docker-containe
[Fri Feb 8 15:42:54 2019] [13805] 1000 13805 276 0 4 2 13 0 tini
[Fri Feb 8 15:42:54 2019] [13871] 1000 13871 1586068 133739 811 10 220384 0 java
[Fri Feb 8 15:42:54 2019] [ 4506] 0 4506 16377 18 36 3 160 -1000 sshd
[Fri Feb 8 15:42:54 2019] [25171] 0 25171 45485 0 18 5 144 -500 docker-proxy
[Fri Feb 8 15:42:54 2019] [25183] 0 25183 31150 0 18 5 659 -500 docker-proxy
[Fri Feb 8 15:42:54 2019] [25190] 0 25190 87040 713 25 5 82 -500 docker-containe
[Fri Feb 8 15:42:54 2019] [25208] 2000 25208 4915 1 14 3 81 0 artifactory_sta
[Fri Feb 8 15:42:54 2019] [25291] 2000 25291 4948 1 15 3 113 0 artifactory.sh
[Fri Feb 8 15:42:54 2019] [25320] 2000 25320 7879838 3931096 8163 26 184030 0 java
[Fri Feb 8 15:42:54 2019] [27475] 0 27475 23199 68 49 3 166 0 sshd
[Fri Feb 8 15:42:54 2019] [27478] 1013 27478 11312 0 27 3 212 0 systemd
[Fri Feb 8 15:42:54 2019] [27483] 1013 27483 15322 105 33 3 382 0 (sd-pam)
[Fri Feb 8 15:42:54 2019] [27502] 1013 27502 23199 51 47 3 181 0 sshd
[Fri Feb 8 15:42:54 2019] [27503] 1013 27503 6055 316 17 3 492 0 bash
[Fri Feb 8 15:42:54 2019] [29901] 0 29901 23199 233 50 3 0 0 sshd
[Fri Feb 8 15:42:54 2019] [29903] 1006 29903 11312 173 26 3 0 0 systemd
[Fri Feb 8 15:42:54 2019] [29905] 1006 29905 15322 209 33 3 278 0 (sd-pam)
[Fri Feb 8 15:42:54 2019] [29925] 1006 29925 23199 231 47 3 0 0 sshd
[Fri Feb 8 15:42:54 2019] [29926] 1006 29926 6571 811 19 3 0 0 bash
[Fri Feb 8 15:42:54 2019] [30400] 1006 30400 152391 1805 46 5 0 0 docker
[Fri Feb 8 15:42:54 2019] [30411] 0 30411 101375 670 25 6 511 -500 docker-containe
[Fri Feb 8 15:42:54 2019] [30428] 0 30428 4967 139 14 3 0 0 bash
[Fri Feb 8 15:42:54 2019] Out of memory: Kill process 25320 (java) score 470 or sacrifice child
[Fri Feb 8 15:42:54 2019] Killed process 25320 (java) total-vm:31519352kB, anon-rss:15724384kB, file-rss:0kB
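As a cross-check on the per-process numbers, the system-wide counters in the Mem-Info block can be tallied against the `pages RAM` figure. The sketch below is only a coarse approximation: it assumes 4 KiB pages, treats the listed counters as non-overlapping, and leaves out `mapped` and `shmem` because those pages are already counted on the anon/file lists. All values are copied from the log above.

```python
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages (x86-64 default)

# Page counters copied from the Mem-Info block of the log.
MEM_INFO_PAGES = {
    "active_anon": 3623939,
    "inactive_anon": 519706,
    "active_file": 98,
    "inactive_file": 206,
    "unevictable": 0,
    "slab_reclaimable": 20794,
    "slab_unreclaimable": 9765,
    "pagetables": 10160,
    "free": 50583,
}

TOTAL_RAM_PAGES = 8388494  # "pages RAM"
RESERVED_PAGES = 151721    # "pages reserved"

def to_gib(pages):
    """Convert a page count to GiB using the assumed page size."""
    return pages * PAGE_SIZE_KIB / (1024 * 1024)

def tally():
    """Return (accounted, unaccounted) page counts for the figures above."""
    accounted = sum(MEM_INFO_PAGES.values())
    unaccounted = TOTAL_RAM_PAGES - RESERVED_PAGES - accounted
    return accounted, unaccounted

if __name__ == "__main__":
    accounted, unaccounted = tally()
    print("accounted:   %d pages (%.2f GiB)" % (accounted, to_gib(accounted)))
    print("unaccounted: %d pages (%.2f GiB)" % (unaccounted, to_gib(unaccounted)))
```

Under these assumptions the gap shows up in the kernel's own counters as well, not only in the per-process rss sum, so the missing memory is not simply something the process table fails to list.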