Java / JBoss 5.1.0 triggers the OOM killer

I am running a CentOS centos-release-6-0.el6.centos.5.x86_64 (kernel 2.6.32-71.29.1.el6.x86_64) box with 32 GB of RAM and 6 vCPUs; it is a virtual machine. Java is the Java(TM) SE Runtime Environment (build 1.6.0_27-b07).

Every once in a while, the OOM killer kills my JBoss, even though it is configured not to use more than 13 GB of RAM. The JBoss parameters are as follows:

java -Dprogram.name=run.sh -server -Xms12288m -Xmx12288m -XX:MaxPermSize=1024m -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Djava.endorsed.dirs=/usr/java/jboss-as/lib/endorsed -classpath /usr/java/jboss-as/bin/run.jar org.jboss.Main -b 0.0.0.0 --configuration=default

When this happens, the following lines are written to /var/log/messages:

Feb 13 12:21:02 prod-app kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Feb 13 12:21:02 prod-app kernel: java cpuset=/ mems_allowed=0
Feb 13 12:21:02 prod-app kernel: Pid: 11903, comm: java Not tainted 2.6.32-71.29.1.el6.x86_64 #1
Feb 13 12:21:02 prod-app kernel: Call Trace:
Feb 13 12:21:02 prod-app kernel: [<ffffffff810c2e01>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Feb 13 12:21:02 prod-app kernel: [<ffffffff8110f1bb>] oom_kill_process+0xcb/0x2e0
Feb 13 12:21:02 prod-app kernel: [<ffffffff8110f780>] ? select_bad_process+0xd0/0x110
Feb 13 12:21:02 prod-app kernel: [<ffffffff8110f818>] __out_of_memory+0x58/0xc0
Feb 13 12:21:02 prod-app kernel: [<ffffffff8110fa19>] out_of_memory+0x199/0x210
Feb 13 12:21:02 prod-app kernel: [<ffffffff8111ebe2>] __alloc_pages_nodemask+0x832/0x850
Feb 13 12:21:02 prod-app kernel: [<ffffffff81150cba>] alloc_pages_current+0x9a/0x100
Feb 13 12:21:02 prod-app kernel: [<ffffffff8110c617>] __page_cache_alloc+0x87/0x90
Feb 13 12:21:02 prod-app kernel: [<ffffffff8112136b>] __do_page_cache_readahead+0xdb/0x210
Feb 13 12:21:02 prod-app kernel: [<ffffffff811214c1>] ra_submit+0x21/0x30
Feb 13 12:21:02 prod-app kernel: [<ffffffff8110e1c1>] filemap_fault+0x4b1/0x510
Feb 13 12:21:02 prod-app kernel: [<ffffffff81135604>] __do_fault+0x54/0x500
Feb 13 12:21:02 prod-app kernel: [<ffffffff81135ba7>] handle_pte_fault+0xf7/0xad0
Feb 13 12:21:02 prod-app kernel: [<ffffffff8125f78c>] ? rb_erase+0x1bc/0x310
Feb 13 12:21:02 prod-app kernel: [<ffffffff81056720>] ? __dequeue_entity+0x30/0x50
Feb 13 12:21:02 prod-app kernel: [<ffffffff810117bc>] ? __switch_to+0x1ac/0x320
Feb 13 12:21:02 prod-app kernel: [<ffffffff81059e02>] ? finish_task_switch+0x42/0xd0
Feb 13 12:21:02 prod-app kernel: [<ffffffff8113676d>] handle_mm_fault+0x1ed/0x2b0
Feb 13 12:21:02 prod-app kernel: [<ffffffff814c92b6>] ? thread_return+0x4e/0x778
Feb 13 12:21:02 prod-app kernel: [<ffffffff814ce503>] do_page_fault+0x123/0x3a0
Feb 13 12:21:02 prod-app kernel: [<ffffffff814cbf75>] page_fault+0x25/0x30
Feb 13 12:21:02 prod-app kernel: Mem-Info:
Feb 13 12:21:02 prod-app kernel: Node 0 DMA per-cpu:
Feb 13 12:21:02 prod-app kernel: CPU    0: hi:    0, btch:   1 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    1: hi:    0, btch:   1 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    2: hi:    0, btch:   1 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    3: hi:    0, btch:   1 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    4: hi:    0, btch:   1 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    5: hi:    0, btch:   1 usd:   0
Feb 13 12:21:02 prod-app kernel: Node 0 DMA32 per-cpu:
Feb 13 12:21:02 prod-app kernel: CPU    0: hi:  186, btch:  31 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    1: hi:  186, btch:  31 usd:  30
Feb 13 12:21:02 prod-app kernel: CPU    2: hi:  186, btch:  31 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    3: hi:  186, btch:  31 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    4: hi:  186, btch:  31 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    5: hi:  186, btch:  31 usd:   0
Feb 13 12:21:02 prod-app kernel: Node 0 Normal per-cpu:
Feb 13 12:21:02 prod-app kernel: CPU    0: hi:  186, btch:  31 usd:  10
Feb 13 12:21:02 prod-app kernel: CPU    1: hi:  186, btch:  31 usd:  30
Feb 13 12:21:02 prod-app kernel: CPU    2: hi:  186, btch:  31 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    3: hi:  186, btch:  31 usd:   0
Feb 13 12:21:02 prod-app kernel: CPU    4: hi:  186, btch:  31 usd:  36
Feb 13 12:21:02 prod-app kernel: CPU    5: hi:  186, btch:  31 usd:   0
Feb 13 12:21:02 prod-app kernel: active_anon:7449508 inactive_anon:565931 isolated_anon:0
Feb 13 12:21:02 prod-app kernel: active_file:0 inactive_file:665 isolated_file:0
Feb 13 12:21:02 prod-app kernel: unevictable:0 dirty:2 writeback:0 unstable:0
Feb 13 12:21:02 prod-app kernel: free:49966 slab_reclaimable:2775 slab_unreclaimable:143965
Feb 13 12:21:02 prod-app kernel: mapped:70 shmem:0 pagetables:21396 bounce:0
Feb 13 12:21:02 prod-app kernel: Node 0 DMA free:15584kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15188kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb 13 12:21:02 prod-app kernel: lowmem_reserve[]: 0 3000 32290 32290
Feb 13 12:21:02 prod-app kernel: Node 0 DMA32 free:123316kB min:6276kB low:7844kB high:9412kB active_anon:1954808kB inactive_anon:508096kB active_file:0kB inactive_file:668kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:184kB slab_unreclaimable:112kB kernel_stack:0kB pagetables:3904kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:216 all_unreclaimable? yes
Feb 13 12:21:02 prod-app kernel: lowmem_reserve[]: 0 0 29290 29290
Feb 13 12:21:02 prod-app kernel: Node 0 Normal free:60964kB min:61276kB low:76592kB high:91912kB active_anon:27843224kB inactive_anon:1755628kB active_file:0kB inactive_file:1992kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:29992960kB mlocked:0kB dirty:8kB writeback:0kB mapped:332kB shmem:0kB slab_reclaimable:10916kB slab_unreclaimable:575748kB kernel_stack:2968kB pagetables:81680kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? yes
Feb 13 12:21:02 prod-app kernel: lowmem_reserve[]: 0 0 0 0
Feb 13 12:21:02 prod-app kernel: Node 0 DMA: 2*4kB 1*8kB 1*16kB 2*32kB 2*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15584kB
Feb 13 12:21:02 prod-app kernel: Node 0 DMA32: 95*4kB 74*8kB 39*16kB 25*32kB 8*64kB 8*128kB 3*256kB 4*512kB 2*1024kB 38*2048kB 9*4096kB = 123484kB
Feb 13 12:21:02 prod-app kernel: Node 0 Normal: 1234*4kB 798*8kB 561*16kB 411*32kB 203*64kB 93*128kB 9*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 60648kB
Feb 13 12:21:02 prod-app kernel: 10130 total pagecache pages
Feb 13 12:21:02 prod-app kernel: 9240 pages in swap cache
Feb 13 12:21:02 prod-app kernel: Swap cache stats: add 23810210, delete 23800970, find 5525856/6576436
Feb 13 12:21:02 prod-app kernel: Free swap  = 0kB
Feb 13 12:21:02 prod-app kernel: Total swap = 8388600kB
Feb 13 12:21:02 prod-app kernel: 8388592 pages RAM
Feb 13 12:21:02 prod-app kernel: 134650 pages reserved
Feb 13 12:21:02 prod-app kernel: 423 pages shared
Feb 13 12:21:02 prod-app kernel: 8069227 pages non-shared
Feb 13 12:21:02 prod-app kernel: Out of memory: kill process 11666 (run.sh) score 21512214 or a child
Feb 13 12:21:02 prod-app kernel: Killed process 11696 (java) vsz:50991828kB, anon-rss:32016636kB, file-rss:400kB

JBoss seems to occupy more RAM than it should, which suggests our application may have a memory leak. The interesting part: we have about 20 other installations of the software (also on CentOS machines), and this behavior is unique to this one machine.
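The "more RAM than it should" claim can be checked directly against the kernel log above. A minimal back-of-the-envelope sketch, assuming the intended budget is just heap plus permgen (the figures come from the JVM flags and the "Killed process" line):

```shell
# Budget implied by the JVM flags vs. actual resident memory at kill time.
heap_mb=12288        # -Xmx12288m
permgen_mb=1024      # -XX:MaxPermSize=1024m
budget_mb=$((heap_mb + permgen_mb))

anon_rss_kb=32016636 # anon-rss from the "Killed process 11696" line
rss_mb=$((anon_rss_kb / 1024))

echo "budget=${budget_mb}MB rss=${rss_mb}MB overshoot=$((rss_mb - budget_mb))MB"
# prints: budget=13312MB rss=31266MB overshoot=17954MB
```

So the process was resident at roughly 31 GB against a ~13 GB budget: the excess must live outside the Java heap and permgen.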

I am not sure how to debug this. Nothing is logged in JBoss's server.log.

  • Is there any chance of preempting the OOM killer by tuning GC parameters?
  • Is there any tool that can help me see what is happening on the server when things get critical?

Thanks a lot for your help!

Best regards,
S.

Answer 1

This is a bit of a shot in the dark, but: you are using -Xms12288m -Xmx12288m, which only limits the heap portion of the JVM's memory.

Since your process's memory usage grows to about 50 GB, this could be caused by huge thread stacks. So if your application spawns a lot of threads, relies heavily on recursion, or does anything similar, that would be your hint.
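To get a feel for the scale: a rough estimate, assuming the 64-bit HotSpot default of about 1 MB of stack per thread (no explicit -Xss), with vsz taken from the "Killed process" line in the kernel log. Virtual size also includes mapped files, the code cache and so on, so this is only an upper bound on what stacks could account for:

```shell
# How much virtual memory is unaccounted for by heap + permgen?
vsz_kb=50991828                           # vsz from the "Killed process" line
budget_kb=$(( (12288 + 1024) * 1024 ))    # -Xmx + -XX:MaxPermSize, in kB
gap_mb=$(( (vsz_kb - budget_kb) / 1024 ))

echo "unaccounted virtual memory: ${gap_mb} MB"
echo "upper bound: ~${gap_mb} threads at 1 MB of stack each"
# prints: unaccounted virtual memory: 36484 MB
```

Tens of thousands of 1 MB stacks would be extreme, but even a few thousand threads combined with a larger -Xss would eat into that gap quickly.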

If that is the case, look at the GC logs and use tools such as jmap, jstack, pmap and jvisualvm to analyze the situation, and look into the -Xss / -XX:MaxThreadStackSize JVM parameters and your thread pool sizes to fix the problem.
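A minimal sketch of such a diagnostic snapshot, using the java PID 11696 from the kernel log as a placeholder (substitute your own; jmap and jstack ship with the JDK, pmap with procps):

```shell
#!/bin/sh
# Snapshot a running JVM's memory state for later analysis.
pid=${1:-11696}   # placeholder PID taken from the kernel log above

# Full thread dump -- an enormous thread count points at stack memory.
jstack "$pid" > "jstack-$pid.txt" 2>/dev/null || echo "jstack failed for pid $pid"

# Heap object histogram -- compare live object totals against -Xmx.
jmap -histo "$pid" > "jmap-$pid.txt" 2>/dev/null || echo "jmap failed for pid $pid"

# Last pmap line is the total mapped size; the gap vs. -Xmx is native memory.
pmap -x "$pid" 2>/dev/null | tail -n 1

# Kernel's own thread count for the process.
[ -r "/proc/$pid/status" ] && grep '^Threads:' "/proc/$pid/status" || true
```

Running this from cron (or from a watchdog that triggers when free memory drops) gives you the "what was happening when things got critical" view the question asks about, without waiting for the OOM killer.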

(Not quite sure about the parameters; Java 5 was the last version I had to deal with professionally.)
