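I am running Solr along with 4 other Java processes on an Ubuntu server. The index is currently 30 GB. My Solr process is frequently killed within a few hours, and the kernel log clearly shows that the OOM killer is responsible. I can't work out what is actually causing the problem. The log shows zero free swap. Do I need to increase swap, or disable it? What are the possible solutions?

I ran the same processes and Solr on a 4 GB VPS without any problems. The problems only started after I switched to a dedicated server, so I think this is configuration-related. What should I do to avoid the OOM killer? After checking the kernel log, I found the following: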
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686839] java invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686842] java cpuset=/ mems_allowed=0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686845] CPU: 3 PID: 7207 Comm: java Not tainted 3.13.0-24-generic #47-Ubuntu
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686847] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 1.1a 09/28/2011
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686850] 0000000000000000 ffff880019b519b8 ffffffff81715ac4 ffff8800149bc7d0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686857] ffff880019b51a40 ffffffff817103ff 0000000000000000 ffffffff81c3f820
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686862] ffff880019b51a70 0000000000000015 0000000000000000 0000000000000000
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686867] Call Trace:
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686877] [<ffffffff81715ac4>] dump_stack+0x45/0x56
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686882] [<ffffffff817103ff>] dump_header+0x7f/0x1f1
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686886] [<ffffffff8115197e>] oom_kill_process+0x1ce/0x330
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686890] [<ffffffff812d0135>] ? security_capable_noaudit+0x15/0x20
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686892] [<ffffffff811520b4>] out_of_memory+0x414/0x450
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686894] [<ffffffff81158223>] __alloc_pages_nodemask+0x983/0xa20
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686897] [<ffffffff811976ba>] alloc_pages_vma+0x9a/0x140
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686900] [<ffffffff8118aaeb>] read_swap_cache_async+0xeb/0x160
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686902] [<ffffffff8118abf8>] swapin_readahead+0x98/0xe0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686906] [<ffffffff81178c6e>] handle_mm_fault+0xa7e/0xf10
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686910] [<ffffffff81721a24>] __do_page_fault+0x184/0x560
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686916] [<ffffffff811112fc>] ? acct_account_cputime+0x1c/0x20
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686920] [<ffffffff8109d76b>] ? account_user_time+0x8b/0xa0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686923] [<ffffffff8109dd84>] ? vtime_account_user+0x54/0x60
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686925] [<ffffffff81721e1a>] do_page_fault+0x1a/0x70
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686929] [<ffffffff8171e288>] page_fault+0x28/0x30
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686932] Mem-Info:
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686935] Node 0 DMA per-cpu:
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686938] CPU 0: hi: 0, btch: 1 usd: 0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686940] CPU 1: hi: 0, btch: 1 usd: 0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686942] CPU 2: hi: 0, btch: 1 usd: 0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686943] CPU 3: hi: 0, btch: 1 usd: 0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686945] Node 0 DMA32 per-cpu:
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686948] CPU 0: hi: 186, btch: 31 usd: 0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686949] CPU 1: hi: 186, btch: 31 usd: 2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686950] CPU 2: hi: 186, btch: 31 usd: 0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686951] CPU 3: hi: 186, btch: 31 usd: 0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686952] Node 0 Normal per-cpu:
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686953] CPU 0: hi: 186, btch: 31 usd: 29
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686954] CPU 1: hi: 186, btch: 31 usd: 24
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686955] CPU 2: hi: 186, btch: 31 usd: 32
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686956] CPU 3: hi: 186, btch: 31 usd: 16
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686959] active_anon:607357 inactive_anon:174797 isolated_anon:32
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686959] active_file:53 inactive_file:142 isolated_file:0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686959] unevictable:0 dirty:0 writeback:12 unstable:0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686959] free:25861 slab_reclaimable:4362 slab_unreclaimable:10351
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686959] mapped:4655 shmem:4670 pagetables:12562 bounce:0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686959] free_cma:0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686961] Node 0 DMA free:15900kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686965] lowmem_reserve[]: 0 2954 7945 7945
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686969] Node 0 DMA32 free:45092kB min:25080kB low:31348kB high:37620kB active_anon:2306664kB inactive_anon:576776kB active_file:44kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3107092kB managed:3028172kB mlocked:0kB dirty:0kB writeback:0kB mapped:18416kB shmem:18508kB slab_reclaimable:6800kB slab_unreclaimable:12740kB kernel_stack:2040kB pagetables:26380kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:77 all_unreclaimable? yes
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686973] lowmem_reserve[]: 0 0 4990 4990
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686979] Node 0 Normal free:42452kB min:42368kB low:52960kB high:63552kB active_anon:122764kB inactive_anon:122412kB active_file:168kB inactive_file:600kB unevictable:0kB isolated(anon):128kB isolated(file):0kB present:5242880kB managed:5110564kB mlocked:0kB dirty:0kB writeback:48kB mapped:204kB shmem:172kB slab_reclaimable:10648kB slab_unreclaimable:28664kB kernel_stack:3336kB pagetables:23868kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1649 all_unreclaimable? yes
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686984] lowmem_reserve[]: 0 0 0 0
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.686986] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15900kB
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687005] Node 0 DMA32: 724*4kB (UEMR) 958*8kB (UEM) 1360*16kB (UEM) 395*32kB (UEM) 5*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 45280kB
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687015] Node 0 Normal: 572*4kB (UEMR) 853*8kB (UEM) 589*16kB (UEM) 276*32kB (UEM) 133*64kB (UEM) 27*128kB (UEM) 12*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 42408kB
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687023] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687024] 55331 total pagecache pages
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687025] 50275 pages in swap cache
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687027] Swap cache stats: add 13552969455, delete 13552919180, find 5504664044/6715155814
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687027] Free swap = 0kB
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687029] Total swap = 3905532kB
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687030] 2091489 pages RAM
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687032] 0 pages HighMem/MovableOnly
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687033] 33079 pages reserved
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687034] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687043] [ 303] 0 303 4902 0 13 99 0 upstart-udev-br
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687047] [ 308] 0 308 12804 1 28 145 -1000 systemd-udevd
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687049] [ 515] 0 515 3815 0 12 75 0 upstart-socket-
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687051] [ 582] 0 582 5883 0 15 100 0 vsftpd
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687054] [ 717] 102 717 9807 0 25 100 0 dbus-daemon
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687056] [ 801] 0 801 10863 1 27 89 0 systemd-logind
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687058] [ 872] 101 872 64154 98 42 7756 0 rsyslogd
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687060] [ 921] 0 921 3852 0 12 93 0 upstart-file-br
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687062] [ 1020] 0 1020 3955 1 13 40 0 getty
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687065] [ 1023] 0 1023 3955 1 15 38 0 getty
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687067] [ 1028] 0 1028 3955 1 12 39 0 getty
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687069] [ 1029] 0 1029 3955 1 13 38 0 getty
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687071] [ 1032] 0 1032 3955 1 13 38 0 getty
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687073] [ 1051] 0 1051 1092 0 8 37 0 acpid
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687075] [ 1053] 0 1053 4785 0 13 46 0 atd
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687076] [ 1054] 0 1054 5914 17 16 51 0 cron
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687078] [ 1066] 0 1066 15341 0 33 182 -1000 sshd
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687079] [ 1075] 0 1075 4797 28 14 30 0 irqbalance
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687080] [ 1118] 106 1118 598658 16362 248 39647 0 mysqld
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687082] [ 1338] 0 1338 3955 1 12 41 0 getty
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687083] [25936] 108 25936 1018711 10052 508 173458 0 java
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687085] [26010] 0 26010 96095 56 120 1914 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687087] [25528] 0 25528 6833 93 17 111 0 screen
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687088] [25529] 0 25529 5316 0 15 184 0 bash
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687089] [ 2954] 0 2954 9691875 196411 7258 375033 0 java
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687091] [24527] 0 24527 14910 0 34 114 0 cron
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687093] [24529] 0 24529 1111 0 7 26 0 sh
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687096] [24532] 0 24532 1795 0 9 23 0 flock
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687098] [24534] 0 24534 1194110 175735 661 91027 0 java
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687100] [ 6883] 0 6883 14910 0 34 114 0 cron
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687102] [ 6890] 0 6890 1111 0 7 26 0 sh
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687103] [ 6891] 0 6891 1795 0 9 24 0 flock
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687105] [ 6896] 0 6896 1160035 117317 590 132623 0 java
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687108] [ 7096] 33 7096 97330 4899 133 2340 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687111] [ 7195] 0 7195 14910 2 34 111 0 cron
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687113] [ 7197] 0 7197 1111 0 7 24 0 sh
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687115] [ 7201] 0 7201 1795 0 9 23 0 flock
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687117] [ 7203] 0 7203 1125170 194186 747 136458 0 java
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687119] [ 7267] 33 7267 97545 4189 128 2166 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687121] [ 7272] 33 7272 97552 4653 128 1790 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687123] [ 7285] 33 7285 97134 4994 126 1263 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687125] [ 7298] 33 7298 97573 6297 135 1326 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687127] [ 7306] 33 7306 97775 5594 137 1981 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687129] [ 7330] 33 7330 97550 4065 127 2434 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687131] [ 7334] 33 7334 97350 4508 133 2480 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687133] [ 7593] 33 7593 96230 212 113 1846 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687135] [ 7599] 33 7599 97445 3916 127 2509 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687137] [ 7607] 33 7607 97091 2245 125 1924 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687139] [ 7640] 33 7640 96129 185 112 1814 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687141] [ 7642] 33 7642 97318 3877 128 2279 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687143] [ 7645] 33 7645 97385 4407 127 1808 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687145] [ 7651] 33 7651 96121 159 112 1831 0 apache2
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687147] Out of memory: Kill process 2954 (java) score 186 or sacrifice child
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.770503] Killed process 2954 (java) total-vm:38767500kB, anon-rss:785644kB, file-rss:0kB
My hardware is a 120 GB SSD and 8 GB of RAM.
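To make the process table in the log easier to read, here is a minimal sketch that totals the rss and swapents columns per process name. It assumes both columns are counted in 4 kB pages, which is consistent with the log: 196411 pages for PID 2954 times 4 kB equals the reported anon-rss of 785644 kB. The names `summarize`, `ROW`, `PAGE_KB` and `SAMPLE` are made up for this illustration, and `SAMPLE` contains only the five java rows copied from the log above.

```python
import re
from collections import defaultdict

PAGE_KB = 4  # assumption: 4 kB pages, consistent with the anon-rss of PID 2954

# Matches rows of the OOM process table:
# [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<adj>-?\d+)\s+(?P<name>\S+)"
)

# The five java rows from the log above (illustrative subset of the table).
SAMPLE = """\
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687083] [25936] 108 25936 1018711 10052 508 173458 0 java
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687089] [ 2954] 0 2954 9691875 196411 7258 375033 0 java
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687098] [24534] 0 24534 1194110 175735 661 91027 0 java
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687105] [ 6896] 0 6896 1160035 117317 590 132623 0 java
Feb 2 18:14:08 xxxxxxxxx kernel: [4247473.687117] [ 7203] 0 7203 1125170 194186 747 136458 0 java
"""


def summarize(log_text):
    """Return {name: (rss_kb, swap_kb)} summed over every table row found."""
    totals = defaultdict(lambda: [0, 0])
    for line in log_text.splitlines():
        m = ROW.search(line)
        if m:
            totals[m["name"]][0] += int(m["rss"]) * PAGE_KB
            totals[m["name"]][1] += int(m["swapents"]) * PAGE_KB
    return {name: tuple(v) for name, v in totals.items()}


if __name__ == "__main__":
    for name, (rss_kb, swap_kb) in summarize(SAMPLE).items():
        print(f"{name}: rss={rss_kb} kB, swap={swap_kb} kB")
```

On these five rows the java processes add up to roughly 2.6 GiB resident and 3.5 GiB in swap, against a total swap of 3905532 kB, so they appear to account for most of the exhausted swap. Feeding the full log into `summarize` would give the same breakdown for mysqld and apache2.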