Java process frequently killed by the OOM killer on upgraded hardware

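I am running Solr alongside 4 other Java processes on an Ubuntu server. The current index size is 30 GB. My Solr process is frequently killed within a few hours, and the kernel log clearly says it is the OOM killer. I can't work out what exactly is causing this. The log shows free swap at zero. Do I need to increase swap, or disable it? What are the possible solutions? I ran the same processes and Solr on a 4 GB VPS without any problems. The trouble started after switching to a dedicated server, so I think it is related to configuration. How can I avoid the OOM killer? Looking through the kernel log, I found the following.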

Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686839] java invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686842] java cpuset=/ mems_allowed=0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686845] CPU: 3 PID: 7207 Comm: java Not tainted 3.13.0-24-generic #47-Ubuntu
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686847] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 1.1a 09/28/2011
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686850]  0000000000000000 ffff880019b519b8 ffffffff81715ac4 ffff8800149bc7d0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686857]  ffff880019b51a40 ffffffff817103ff 0000000000000000 ffffffff81c3f820
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686862]  ffff880019b51a70 0000000000000015 0000000000000000 0000000000000000
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686867] Call Trace:
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686877]  [<ffffffff81715ac4>] dump_stack+0x45/0x56
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686882]  [<ffffffff817103ff>] dump_header+0x7f/0x1f1
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686886]  [<ffffffff8115197e>] oom_kill_process+0x1ce/0x330
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686890]  [<ffffffff812d0135>] ? security_capable_noaudit+0x15/0x20
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686892]  [<ffffffff811520b4>] out_of_memory+0x414/0x450
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686894]  [<ffffffff81158223>] __alloc_pages_nodemask+0x983/0xa20
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686897]  [<ffffffff811976ba>] alloc_pages_vma+0x9a/0x140
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686900]  [<ffffffff8118aaeb>] read_swap_cache_async+0xeb/0x160
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686902]  [<ffffffff8118abf8>] swapin_readahead+0x98/0xe0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686906]  [<ffffffff81178c6e>] handle_mm_fault+0xa7e/0xf10
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686910]  [<ffffffff81721a24>] __do_page_fault+0x184/0x560
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686916]  [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686920]  [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686923]  [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686925]  [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686929]  [<ffffffff8171e288>] page_fault+0x28/0x30
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686932] Mem-Info:
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686935] Node 0 DMA per-cpu:
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686938] CPU    0: hi:    0, btch:   1 usd:   0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686940] CPU    1: hi:    0, btch:   1 usd:   0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686942] CPU    2: hi:    0, btch:   1 usd:   0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686943] CPU    3: hi:    0, btch:   1 usd:   0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686945] Node 0 DMA32 per-cpu:
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686948] CPU    0: hi:  186, btch:  31 usd:   0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686949] CPU    1: hi:  186, btch:  31 usd:   2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686950] CPU    2: hi:  186, btch:  31 usd:   0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686951] CPU    3: hi:  186, btch:  31 usd:   0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686952] Node 0 Normal per-cpu:
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686953] CPU    0: hi:  186, btch:  31 usd:  29
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686954] CPU    1: hi:  186, btch:  31 usd:  24
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686955] CPU    2: hi:  186, btch:  31 usd:  32
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686956] CPU    3: hi:  186, btch:  31 usd:  16
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686959] active_anon:607357 inactive_anon:174797 isolated_anon:32
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686959]  active_file:53 inactive_file:142 isolated_file:0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686959]  unevictable:0 dirty:0 writeback:12 unstable:0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686959]  free:25861 slab_reclaimable:4362 slab_unreclaimable:10351
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686959]  mapped:4655 shmem:4670 pagetables:12562 bounce:0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686959]  free_cma:0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686961] Node 0 DMA free:15900kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686965] lowmem_reserve[]: 0 2954 7945 7945
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686969] Node 0 DMA32 free:45092kB min:25080kB low:31348kB high:37620kB active_anon:2306664kB inactive_anon:576776kB active_file:44kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3107092kB managed:3028172kB mlocked:0kB dirty:0kB writeback:0kB mapped:18416kB shmem:18508kB slab_reclaimable:6800kB slab_unreclaimable:12740kB kernel_stack:2040kB pagetables:26380kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:77 all_unreclaimable? yes
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686973] lowmem_reserve[]: 0 0 4990 4990
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686979] Node 0 Normal free:42452kB min:42368kB low:52960kB high:63552kB active_anon:122764kB inactive_anon:122412kB active_file:168kB inactive_file:600kB unevictable:0kB isolated(anon):128kB isolated(file):0kB present:5242880kB managed:5110564kB mlocked:0kB dirty:0kB writeback:48kB mapped:204kB shmem:172kB slab_reclaimable:10648kB slab_unreclaimable:28664kB kernel_stack:3336kB pagetables:23868kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1649 all_unreclaimable? yes
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686984] lowmem_reserve[]: 0 0 0 0
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.686986] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15900kB
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687005] Node 0 DMA32: 724*4kB (UEMR) 958*8kB (UEM) 1360*16kB (UEM) 395*32kB (UEM) 5*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 45280kB
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687015] Node 0 Normal: 572*4kB (UEMR) 853*8kB (UEM) 589*16kB (UEM) 276*32kB (UEM) 133*64kB (UEM) 27*128kB (UEM) 12*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 42408kB
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687023] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687024] 55331 total pagecache pages
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687025] 50275 pages in swap cache
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687027] Swap cache stats: add 13552969455, delete 13552919180, find 5504664044/6715155814
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687027] Free swap  = 0kB
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687029] Total swap = 3905532kB
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687030] 2091489 pages RAM
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687032] 0 pages HighMem/MovableOnly
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687033] 33079 pages reserved
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687034] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687043] [  303]     0   303     4902        0      13       99             0 upstart-udev-br
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687047] [  308]     0   308    12804        1      28      145         -1000 systemd-udevd
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687049] [  515]     0   515     3815        0      12       75             0 upstart-socket-
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687051] [  582]     0   582     5883        0      15      100             0 vsftpd
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687054] [  717]   102   717     9807        0      25      100             0 dbus-daemon
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687056] [  801]     0   801    10863        1      27       89             0 systemd-logind
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687058] [  872]   101   872    64154       98      42     7756             0 rsyslogd
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687060] [  921]     0   921     3852        0      12       93             0 upstart-file-br
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687062] [ 1020]     0  1020     3955        1      13       40             0 getty
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687065] [ 1023]     0  1023     3955        1      15       38             0 getty
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687067] [ 1028]     0  1028     3955        1      12       39             0 getty
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687069] [ 1029]     0  1029     3955        1      13       38             0 getty
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687071] [ 1032]     0  1032     3955        1      13       38             0 getty
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687073] [ 1051]     0  1051     1092        0       8       37             0 acpid
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687075] [ 1053]     0  1053     4785        0      13       46             0 atd
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687076] [ 1054]     0  1054     5914       17      16       51             0 cron
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687078] [ 1066]     0  1066    15341        0      33      182         -1000 sshd
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687079] [ 1075]     0  1075     4797       28      14       30             0 irqbalance
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687080] [ 1118]   106  1118   598658    16362     248    39647             0 mysqld
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687082] [ 1338]     0  1338     3955        1      12       41             0 getty
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687083] [25936]   108 25936  1018711    10052     508   173458             0 java
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687085] [26010]     0 26010    96095       56     120     1914             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687087] [25528]     0 25528     6833       93      17      111             0 screen
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687088] [25529]     0 25529     5316        0      15      184             0 bash
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687089] [ 2954]     0  2954  9691875   196411    7258   375033             0 java
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687091] [24527]     0 24527    14910        0      34      114             0 cron
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687093] [24529]     0 24529     1111        0       7       26             0 sh
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687096] [24532]     0 24532     1795        0       9       23             0 flock
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687098] [24534]     0 24534  1194110   175735     661    91027             0 java
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687100] [ 6883]     0  6883    14910        0      34      114             0 cron
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687102] [ 6890]     0  6890     1111        0       7       26             0 sh
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687103] [ 6891]     0  6891     1795        0       9       24             0 flock
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687105] [ 6896]     0  6896  1160035   117317     590   132623             0 java
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687108] [ 7096]    33  7096    97330     4899     133     2340             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687111] [ 7195]     0  7195    14910        2      34      111             0 cron
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687113] [ 7197]     0  7197     1111        0       7       24             0 sh
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687115] [ 7201]     0  7201     1795        0       9       23             0 flock
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687117] [ 7203]     0  7203  1125170   194186     747   136458             0 java
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687119] [ 7267]    33  7267    97545     4189     128     2166             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687121] [ 7272]    33  7272    97552     4653     128     1790             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687123] [ 7285]    33  7285    97134     4994     126     1263             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687125] [ 7298]    33  7298    97573     6297     135     1326             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687127] [ 7306]    33  7306    97775     5594     137     1981             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687129] [ 7330]    33  7330    97550     4065     127     2434             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687131] [ 7334]    33  7334    97350     4508     133     2480             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687133] [ 7593]    33  7593    96230      212     113     1846             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687135] [ 7599]    33  7599    97445     3916     127     2509             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687137] [ 7607]    33  7607    97091     2245     125     1924             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687139] [ 7640]    33  7640    96129      185     112     1814             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687141] [ 7642]    33  7642    97318     3877     128     2279             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687143] [ 7645]    33  7645    97385     4407     127     1808             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687145] [ 7651]    33  7651    96121      159     112     1831             0 apache2
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687147] Out of memory: Kill process 2954 (java) score 186 or sacrifice child
Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.770503] Killed process 2954 (java) total-vm:38767500kB, anon-rss:785644kB, file-rss:0kB

My hardware is a 120 GB SSD and 8 GB RAM.
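
To see where the memory went, the task table in the log above can be summed per process name. This is only a rough sketch, assuming 4 kB pages and that the rss and swapents columns are page counts (the log is consistent with this: 9691875 pages of total_vm for PID 2954 matches the reported total-vm:38767500kB). The names `PAGE_KB`, `ROW` and `summarize` are made up for this illustration.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, the usual x86-64 default

# One row of the OOM task table:
# [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)\s*$"
)


def summarize(log_lines):
    """Return {name: (rss_kb, swap_kb)}, summed over all rows with that name."""
    totals = {}
    for line in log_lines:
        m = ROW.search(line)
        if not m:
            continue  # header, call trace, or any other non-table line
        rss_pages = int(m.group(5))
        swap_pages = int(m.group(7))
        name = m.group(9)
        rss_kb, swap_kb = totals.get(name, (0, 0))
        totals[name] = (rss_kb + rss_pages * PAGE_KB,
                        swap_kb + swap_pages * PAGE_KB)
    return totals


if __name__ == "__main__":
    # Two rows copied from the log above; in practice, feed in the whole log.
    sample = [
        "Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687083] [25936]   108 25936  1018711    10052     508   173458             0 java",
        "Feb  2 18:14:08 xxxxxxxxx kernel: [4247473.687089] [ 2954]     0  2954  9691875   196411    7258   375033             0 java",
    ]
    for name, (rss_kb, swap_kb) in summarize(sample).items():
        print(f"{name}: rss={rss_kb} kB, swapped={swap_kb} kB")
```

Comparing the per-name swap totals with the "Total swap" line should show which processes have filled the swap space.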
