一、日志:
[Wed aug 22 01:11:31 2018] java invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[Wed aug 22 01:11:31 2018] java cpuset=/ mems_allowed=0-1
[Wed aug 22 01:11:31 2018] CPU: 5 PID: 4726 Comm: java Tainted: G I 4.13.0-32-generic #35~16.04.1-Ubuntu
[Wed aug 22 01:11:31 2018] Hardware name: HP ProLiant DL380 G7, BIOS P67 05/14/2010
[Wed aug 22 01:11:31 2018] Call Trace:
[Wed aug 22 01:11:31 2018] dump_stack+0x63/0x8b
[Wed aug 22 01:11:31 2018] dump_header+0x97/0x225
[Wed aug 22 01:11:31 2018] ? security_capable_noaudit+0x45/0x60
[Wed aug 22 01:11:31 2018] oom_kill_process+0x219/0x420
[Wed aug 22 01:11:31 2018] out_of_memory+0x11d/0x4b0
[Wed aug 22 01:11:31 2018] __alloc_pages_slowpath+0xd2e/0xe10
[Wed aug 22 01:11:31 2018] ? apic_timer_interrupt+0x101/0x1c0
[Wed aug 22 01:11:31 2018] ? apic_timer_interrupt+0xd7/0x1c0
[Wed aug 22 01:11:31 2018] __alloc_pages_nodemask+0x260/0x280
[Wed aug 22 01:11:31 2018] alloc_pages_vma+0x88/0x1e0
[Wed aug 22 01:11:31 2018] __handle_mm_fault+0xc7b/0x1070
[Wed aug 22 01:11:31 2018] handle_mm_fault+0xcc/0x1c0
[Wed aug 22 01:11:31 2018] __do_page_fault+0x258/0x4f0
[Wed aug 22 01:11:31 2018] do_page_fault+0x22/0x30
[Wed aug 22 01:11:31 2018] ? page_fault+0x36/0x60
[Wed aug 22 01:11:31 2018] page_fault+0x4c/0x60
[Wed aug 22 01:11:31 2018] RIP: 0033:0x7f2bc96dd62e
[Wed aug 22 01:11:31 2018] RSP: 002b:00007f2b4f8f7450 EFLAGS: 00010246
[Wed aug 22 01:11:31 2018] RAX: 0000000000000000 RBX: 00000006d7fe91d0 RCX: 00000000003e81aa
[Wed aug 22 01:11:31 2018] RDX: 0000000005bbe6da RSI: 00000006d7fe91e0 RDI: 00000006ecfa2000
[Wed aug 22 01:11:31 2018] RBP: 00000000daffd23a R08: 000000000b77cdc0 R09: 0000000005bbe6da
[Wed aug 22 01:11:31 2018] R10: 00000000016ef9b8 R11: 000000074e1015d8 R12: 0000000000000000
[Wed aug 22 01:11:31 2018] R13: 0000000005bbe6da R14: 0000000647817e70 R15: 00007f2bd8162800
[Wed aug 22 01:11:31 2018] Mem-Info:
[Wed aug 22 01:11:31 2018] active_anon:5272319 inactive_anon:753357 isolated_anon:0
active_file:201 inactive_file:275 isolated_file:0
unevictable:1202 dirty:17 writeback:0 unstable:0
slab_reclaimable:17005 slab_unreclaimable:39032
mapped:720524 shmem:783543 pagetables:15557 bounce:0
free:35825 free_pcp:11 free_cma:0
[Wed aug 22 01:11:31 2018] Node 0 active_anon:11964008kB inactive_anon:90592kB active_file:32kB inactive_file:28kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:40904kB dirty:0kB writeback:0kB shmem:120956kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed aug 22 01:11:31 2018] Node 1 active_anon:9125268kB inactive_anon:2922836kB active_file:772kB inactive_file:1072kB unevictable:4808kB isolated(anon):0kB isolated(file):0kB mapped:2841192kB dirty:68kB writeback:0kB shmem:3013216kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed aug 22 01:11:31 2018] Node 0 DMA free:15892kB min:56kB low:68kB high:80kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 3494 12003 12003 12003
[Wed aug 22 01:11:31 2018] Node 0 DMA32 free:47848kB min:12928kB low:16460kB high:19992kB active_anon:3483584kB inactive_anon:12368kB active_file:16kB inactive_file:0kB unevictable:0kB writepending:0kB present:3643584kB managed:3578016kB mlocked:0kB kernel_stack:496kB pagetables:3816kB bounce:0kB free_pcp:4kB local_pcp:4kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 0 8509 8509 8509
[Wed aug 22 01:11:31 2018] Node 0 Normal free:32604kB min:31892kB low:40604kB high:49316kB active_anon:8480424kB inactive_anon:78224kB active_file:16kB inactive_file:28kB unevictable:0kB writepending:0kB present:8912896kB managed:8713220kB mlocked:0kB kernel_stack:4168kB pagetables:12244kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 0 0 0 0
[Wed aug 22 01:11:31 2018] Node 1 Normal free:46956kB min:45228kB low:57584kB high:69940kB active_anon:9124728kB inactive_anon:2922836kB active_file:772kB inactive_file:1072kB unevictable:4808kB writepending:68kB present:12582908kB managed:12361464kB mlocked:4808kB kernel_stack:8456kB pagetables:46168kB bounce:0kB free_pcp:112kB local_pcp:0kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 0 0 0 0
[Wed aug 22 01:11:31 2018] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[Wed aug 22 01:11:31 2018] Node 0 DMA32: 515*4kB (UMEH) 240*8kB (UMEH) 177*16kB (UMEH) 205*32kB (UMEH) 115*64kB (UMEH) 72*128kB (UEH) 34*256kB (UE) 16*512kB (UME) 1*1024kB (U) 0*2048kB 0*4096kB = 47868kB
[Wed aug 22 01:11:31 2018] Node 0 Normal: 1502*4kB (UMEH) 1712*8kB (UMEH) 359*16kB (UMEH) 122*32kB (UMEH) 36*64kB (UMEH) 7*128kB (M) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32808kB
[Wed aug 22 01:11:31 2018] Node 1 Normal: 649*4kB (UME) 654*8kB (UMEH) 1047*16kB (UMEH) 473*32kB (UMEH) 86*64kB (UMH) 2*128kB (H) 2*256kB (H) 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 46500kB
[Wed aug 22 01:11:31 2018] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed aug 22 01:11:31 2018] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed aug 22 01:11:31 2018] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed aug 22 01:11:31 2018] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed aug 22 01:11:31 2018] 785008 total pagecache pages
[Wed aug 22 01:11:31 2018] 0 pages in swap cache
[Wed aug 22 01:11:31 2018] Swap cache stats: add 0, delete 0, find 0/0
[Wed aug 22 01:11:31 2018] Free swap = 0kB
[Wed aug 22 01:11:31 2018] Total swap = 0kB
[Wed aug 22 01:11:31 2018] 6288845 pages RAM
[Wed aug 22 01:11:31 2018] 0 pages HighMem/MovableOnly
[Wed aug 22 01:11:31 2018] 121693 pages reserved
[Wed aug 22 01:11:31 2018] 0 pages cma reserved
[Wed aug 22 01:11:31 2018] 0 pages hwpoisoned
[Wed aug 22 01:11:31 2018] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[Wed aug 22 01:11:31 2018] [ 407] 0 407 10891 3826 26 3 0 0 systemd-journal
[Wed aug 22 01:11:31 2018] [ 421] 0 421 25742 66 18 3 0 0 lvmetad
[Wed aug 22 01:11:31 2018] [ 872] 0 872 1099 20 8 3 0 0 acpid
[Wed aug 22 01:11:31 2018] [ 875] 103 875 10777 266 25 3 0 -900 dbus-daemon
[Wed aug 22 01:11:31 2018] [ 889] 0 889 6511 51 18 3 0 0 atd
[Wed aug 22 01:11:31 2018] [ 897] 101 897 65157 448 30 3 0 0 rsyslogd
[Wed aug 22 01:11:31 2018] [ 898] 0 898 7558 71 20 3 0 0 cron
[Wed aug 22 01:11:31 2018] [ 901] 0 901 7554 507 19 3 0 0 systemd-logind
[Wed aug 22 01:11:31 2018] [ 914] 0 914 69788 682 37 3 0 0 accounts-daemon
[Wed aug 22 01:11:31 2018] [ 943] 0 943 7469 52 20 3 0 0 cgmanager
[Wed aug 22 01:11:31 2018] [ 1094] 0 1094 4898 85 13 3 0 0 irqbalance
[Wed aug 22 01:11:31 2018] [ 1210] 0 1210 69277 202 40 4 0 0 polkitd
[Wed aug 22 01:11:31 2018] [ 1292] 0 1292 2631 29 9 3 0 0 hp-asrd
[Wed aug 22 01:11:31 2018] [ 1294] 0 1294 2665 35 10 3 0 0 hp-asrd
[Wed aug 22 01:11:31 2018] [ 1331] 0 1331 16377 180 35 3 0 -1000 sshd
[Wed aug 22 01:11:31 2018] [ 1389] 0 1389 4290 32 13 3 0 0 agetty
[Wed aug 22 01:11:31 2018] [ 1411] 0 1411 31618 1017 65 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1415] 0 1415 8167 50 20 3 0 0 rotatelogs
[Wed aug 22 01:11:31 2018] [ 1416] 0 1416 8167 55 21 3 0 0 rotatelogs
[Wed aug 22 01:11:31 2018] [ 1417] 111 1417 31651 1009 57 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1418] 111 1418 31656 1035 58 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1419] 111 1419 31651 1009 57 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1420] 111 1420 31651 1009 57 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1421] 111 1421 31651 1035 58 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 2211] 108 2211 12495 96 27 3 0 0 dnsmasq
[Wed aug 22 01:11:31 2018] [ 2212] 0 2212 12488 93 27 3 0 0 dnsmasq
[Wed aug 22 01:11:31 2018] [16086] 1004 16086 5161 1486 14 5 0 0 gnatsd
[Wed aug 22 01:11:31 2018] [26914] 111 26914 31656 1006 57 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [26036] 0 26036 11165 148 21 3 0 -1000 systemd-udevd
[Wed aug 22 01:11:31 2018] [29056] 100 29056 7345 48 19 3 0 0 uuidd
[Wed aug 22 01:11:31 2018] [29767] 0 29767 30229 203 45 3 0 -900 virtlogd
[Wed aug 22 01:11:31 2018] [29830] 0 29830 30230 204 43 3 0 -900 virtlockd
[Wed aug 22 01:11:31 2018] [11118] 1002 11118 11319 214 26 3 0 0 systemd
[Wed aug 22 01:11:31 2018] [11119] 1002 11119 15857 495 33 3 0 0 (sd-pam)
[Wed aug 22 01:11:31 2018] [26385] 999 26385 11497 282 19 3 0 0 redis-server
[Wed aug 22 01:11:31 2018] [ 751] 110 751 27508 167 25 3 0 0 ntpd
[Wed aug 22 01:11:31 2018] [ 8160] 0 8160 202275 2186 161 3 0 0 libvirtd
[Wed aug 22 01:11:31 2018] [13344] 105 13344 78205 3099 74 3 0 -900 postgres
[Wed aug 22 01:11:31 2018] [13417] 105 13417 1043259 675565 1390 7 0 -900 postgres
[Wed aug 22 01:11:31 2018] [13747] 105 13747 78230 889 68 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [13749] 105 13749 78205 828 68 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [13752] 105 13752 78205 1550 66 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [13756] 105 13756 78308 754 68 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [13757] 105 13757 41938 496 62 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [13758] 105 13758 78280 660 67 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [17793] 105 17793 1043259 541 63 5 0 0 postgres
[Wed aug 22 01:11:31 2018] [17794] 105 17794 1043259 2549 68 6 0 0 postgres
[Wed aug 22 01:11:31 2018] [17796] 105 17796 1043259 505 62 5 0 0 postgres
[Wed aug 22 01:11:31 2018] [17800] 105 17800 1043388 736 70 6 0 0 postgres
[Wed aug 22 01:11:31 2018] [17801] 105 17801 40652 490 60 4 0 0 postgres
[Wed aug 22 01:11:31 2018] [ 6550] 0 6550 3764 49 13 3 0 0 xinetd
[Wed aug 22 01:11:31 2018] [ 6755] 0 6755 16352 117 24 3 0 0 master
[Wed aug 22 01:11:31 2018] [ 6757] 119 6757 16910 206 25 3 0 0 qmgr
[Wed aug 22 01:11:31 2018] [28294] 113 28294 23238 214 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28305] 113 28305 23238 457 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28306] 113 28306 23238 299 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28307] 113 28307 23238 235 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28308] 113 28308 23238 240 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28309] 113 28309 23238 221 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [17160] 1001 17160 6631678 3848634 7889 27 0 0 java
[Wed aug 22 01:11:31 2018] [19768] 0 19768 77523 39704 141 4 0 0 apache2
[Wed aug 22 01:11:31 2018] [21477] 33 21477 461614 2304 114 6 0 0 apache2
[Wed aug 22 01:11:31 2018] [ 2617] 33 2617 346974 2057 107 5 0 0 apache2
[Wed aug 22 01:11:31 2018] [ 4691] 1001 4691 4086425 1370850 2998 19 0 0 java
[Wed aug 22 01:11:31 2018] [ 5911] 0 5911 24241 716 52 3 0 0 sshd
[Wed aug 22 01:11:31 2018] [ 5914] 1000 5914 11319 547 27 3 0 0 systemd
[Wed aug 22 01:11:31 2018] [ 5915] 1000 5915 15981 647 33 3 0 0 (sd-pam)
[Wed aug 22 01:11:31 2018] [ 5974] 1000 5974 24241 596 51 3 0 0 sshd
[Wed aug 22 01:11:31 2018] [ 5975] 1000 5975 5973 940 17 3 0 0 bash
[Wed aug 22 01:11:31 2018] [ 5992] 0 5992 15653 582 36 3 0 0 sudo
[Wed aug 22 01:11:31 2018] [ 6041] 0 6041 5947 895 17 3 0 0 bash
[Wed aug 22 01:11:31 2018] [ 6104] 0 6104 15918 935 36 3 0 0 mc
[Wed aug 22 01:11:31 2018] [ 6106] 0 6106 5947 910 15 3 0 0 bash
[Wed aug 22 01:11:31 2018] [14474] 33 14474 281509 2009 102 5 0 0 apache2
[Wed aug 22 01:11:31 2018] [ 9477] 0 9477 4274 1204 13 3 0 0 atop
[Wed aug 22 01:11:31 2018] [12755] 119 12755 16869 162 26 3 0 0 pickup
[Wed aug 22 01:11:31 2018] [15069] 113 15069 1126 145 7 3 0 0 sh
[Wed aug 22 01:11:31 2018] [15070] 113 15070 5550 458 16 3 0 0 check.sh
[Wed aug 22 01:11:31 2018] [15073] 113 15073 3862 391 12 3 0 0 grep
[Wed aug 22 01:11:31 2018] [15074] 113 15074 3868 89 13 3 0 0 grep
[Wed aug 22 01:11:31 2018] [15075] 113 15075 1126 143 7 3 0 0 sh
[Wed aug 22 01:11:31 2018] [15076] 113 15076 5550 444 16 3 0 0 check1.sh
[Wed aug 22 01:11:31 2018] [15079] 113 15079 3862 73 12 3 0 0 grep
[Wed aug 22 01:11:31 2018] [15080] 113 15080 3868 72 12 3 0 0 grep
[Wed aug 22 01:11:31 2018] [15081] 113 15081 1126 149 8 3 0 0 sh
[Wed aug 22 01:11:31 2018] [15082] 113 15082 5550 456 17 3 0 0 check4.sh
[Wed aug 22 01:11:31 2018] [15085] 113 15085 3862 71 14 3 0 0 grep
[Wed aug 22 01:11:31 2018] [15086] 113 15086 3868 51 13 3 0 0 grep
[Wed aug 22 01:11:31 2018] Out of memory: Kill process 17160 (java) score 625 or sacrifice child
[Wed aug 22 01:11:31 2018] Killed process 17160 (java) total-vm:26526712kB, anon-rss:15394536kB, file-rss:0kB, shmem-rss:0kB
我读过一些关于读取 OOM killer 日志的答案,但我的数字似乎不合逻辑。首先,一些答案是关于在 32 位系统上内存页面注册表空间不足。事实并非如此,我们有 64 位系统。
我不太明白的是:
- 测量单位。RSS 列应该在 中
pages
。但是如果我将其相加,我会得到13705822。乘以 4096 后将是 56139046912 (52Gb),但我们只有 24Gb 的 RAM 并且没有交换。如果改为在 中9963350。乘以 4096 后将是 40809881600 (38Gb),但我们只有 24Gb 的 RAM 并且没有交换。如果改为在 中kB
,则将是 14034761728 (13Gb)kB
,则将是 10202470400 (9,5Gb),并且有很多内存是空闲的?(抱歉,我算错了列,但问题是一样的) - 日志显示有
785008 total pagecache pages
,但出于某种原因系统没有驱逐它们?它不能?或者根本就没试过?这可能与已知的 postgres 共享缓冲区显示为页面缓存的问题有关 - 添加 [大概] 13Gb 的 RSS (kB) + 3Gb 的页面缓存可得到 16Gb,但这仍然比总共 24Gb 少得多。
Node 0 Normal free:32604kB
意味着Node 1 Normal free:46956kB
有一点内存可用。远多于 1 页,但只占总数的 0.3%,这可能低于某个阈值?还是应用程序一次请求更大的区域?- 内存消耗第二多的进程 4691 不再存在于系统中,但另一个具有较长运行时间的进程仍在系统中,但使用不同的 pid。尽管它的运行时间大于从 oom killer 日志到现在的差异,但不知何故它却没有出现在日志中。这个进程是否可能只是在运行时更改了 pid?
其他信息:操作系统 Ubuntu 16.04、24Gb RAM、无交换。