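My system logs show the oom-killer killing processes when the "Normal" memory zone drops below its minimum limit, even though there is still plenty of free memory in the "HighMem" zone. I'm confused about why this happens and whether I can do anything to stop it. I had assumed that if one zone had no free memory, the kernel would allocate from a different zone unless the request needed memory from a specific zone. In this case the application that triggers the OOM is python, and I can't think of a reason why it would specifically need Normal zone memory.
Here is a typical recent example: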
Feb 2 04:10:01 ldr kernel: python invoked oom-killer: gfp_mask=0x2084d0, order=0, oom_score_adj=0
Feb 2 04:10:01 ldr kernel: python cpuset=/ mems_allowed=0
Feb 2 04:10:01 ldr kernel: CPU: 0 PID: 593 Comm: python Tainted: G O 3.14.24-wt-ldr-TC #4
Feb 2 04:10:01 ldr kernel: Hardware name: Eurotech, Inc. Catalyst TC/Catalyst TC, BIOS 04.08.05.01 02/03/2017
Feb 2 04:10:01 ldr kernel: 00000000 00000000 f336fd7c c171e79e f4f03580 f336fdd8 c171b7a0 c1908d18
Feb 2 04:10:01 ldr kernel: f4f03954 002084d0 00000000 00000000 f336fdb8 c1106774 00000000 00000000
Feb 2 04:10:01 ldr kernel: f336fdb4 002084d0 00000000 f336fdd8 c13ac8d2 c1109634 f336fde8 f4f042e0
Feb 2 04:10:01 ldr kernel: Call Trace:
Feb 2 04:10:01 ldr kernel: [<c171e79e>] dump_stack+0x4b/0x75
Feb 2 04:10:01 ldr kernel: [<c171b7a0>] dump_header.isra.9+0x77/0x1ee
Feb 2 04:10:01 ldr kernel: [<c1106774>] ? shrink_slab+0xb4/0xf0
Feb 2 04:10:01 ldr kernel: [<c13ac8d2>] ? ___ratelimit+0x82/0x100
Feb 2 04:10:01 ldr kernel: [<c1109634>] ? do_try_to_free_pages+0x404/0x420
Feb 2 04:10:01 ldr kernel: [<c10f9fac>] oom_kill_process+0x1dc/0x360
Feb 2 04:10:01 ldr kernel: [<c10486d6>] ? has_ns_capability_noaudit+0x36/0x50
Feb 2 04:10:01 ldr kernel: [<c1048704>] ? has_capability_noaudit+0x14/0x20
Feb 2 04:10:01 ldr kernel: [<c10f9c87>] ? oom_badness+0xa7/0x100
Feb 2 04:10:01 ldr kernel: [<c10f9d29>] ? oom_scan_process_thread+0x49/0xc0
Feb 2 04:10:01 ldr kernel: [<c10fa4d4>] out_of_memory+0x1f4/0x2d0
Feb 2 04:10:01 ldr kernel: [<c10fe557>] __alloc_pages_nodemask+0x937/0x950
Feb 2 04:10:01 ldr kernel: [<c10fe58d>] __get_free_pages+0x1d/0x30
Feb 2 04:10:01 ldr kernel: [<c103b3be>] pgd_alloc+0x1e/0x130
Feb 2 04:10:01 ldr kernel: [<c103dfc0>] mm_init+0xc0/0xf0
Feb 2 04:10:01 ldr kernel: [<c103e256>] mm_alloc+0x56/0xa0
Feb 2 04:10:01 ldr kernel: [<c1143ddf>] do_execve+0x19f/0x5a0
Feb 2 04:10:01 ldr kernel: [<c1144389>] SyS_execve+0x29/0x40
Feb 2 04:10:01 ldr kernel: [<c172a6be>] sysenter_do_call+0x12/0x12
Feb 2 04:10:01 ldr kernel: Mem-Info:
Feb 2 04:10:01 ldr kernel: DMA per-cpu:
Feb 2 04:10:01 ldr kernel: CPU 0: hi: 0, btch: 1 usd: 0
Feb 2 04:10:01 ldr kernel: CPU 1: hi: 0, btch: 1 usd: 0
Feb 2 04:10:01 ldr kernel: Normal per-cpu:
Feb 2 04:10:01 ldr kernel: CPU 0: hi: 186, btch: 31 usd: 179
Feb 2 04:10:01 ldr kernel: CPU 1: hi: 186, btch: 31 usd: 130
Feb 2 04:10:01 ldr kernel: HighMem per-cpu:
Feb 2 04:10:01 ldr kernel: CPU 0: hi: 186, btch: 31 usd: 6
Feb 2 04:10:01 ldr kernel: CPU 1: hi: 186, btch: 31 usd: 51
Feb 2 04:10:01 ldr kernel: active_anon:3208 inactive_anon:67 isolated_anon:0
Feb 2 04:10:01 ldr kernel: active_file:1330 inactive_file:3589 isolated_file:0
Feb 2 04:10:01 ldr kernel: unevictable:0 dirty:0 writeback:0 unstable:0
Feb 2 04:10:01 ldr kernel: free:290674 slab_reclaimable:1335 slab_unreclaimable:3757
Feb 2 04:10:01 ldr kernel: mapped:1448 shmem:251 pagetables:105 bounce:0
Feb 2 04:10:01 ldr kernel: free_cma:0
Feb 2 04:10:01 ldr kernel: DMA free:3388kB min:64kB low:80kB high:96kB active_anon:80kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15916kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:36kB slab_unreclaimable:128kB kernel_stack:16kB pagetables:4kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb 2 04:10:01 ldr kernel: lowmem_reserve[]: 0 839 1996 1996
Feb 2 04:10:01 ldr kernel: Normal free:3556kB min:3672kB low:4588kB high:5508kB active_anon:3820kB inactive_anon:56kB active_file:168kB inactive_file:284kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:892920kB managed:860136kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:72kB slab_reclaimable:5304kB slab_unreclaimable:14900kB kernel_stack:744kB pagetables:204kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:749 all_unreclaimable? yes
Feb 2 04:10:01 ldr kernel: lowmem_reserve[]: 0 0 9257 9257
Feb 2 04:10:01 ldr kernel: HighMem free:1155752kB min:512kB low:1776kB high:3040kB active_anon:8932kB inactive_anon:212kB active_file:5152kB inactive_file:14072kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1184968kB managed:1184968kB mlocked:0kB dirty:0kB writeback:0kB mapped:5792kB shmem:932kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:212kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb 2 04:10:01 ldr kernel: lowmem_reserve[]: 0 0 0 0
Feb 2 04:10:01 ldr kernel: DMA: 21*4kB (UE) 15*8kB (UEM) 9*16kB (UM) 9*32kB (UM) 5*64kB (UMR) 3*128kB (MR) 2*256kB (ER) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 3388kB
Feb 2 04:10:01 ldr kernel: Normal: 333*4kB (UEM) 269*8kB (M) 0*16kB 1*32kB (R) 1*64kB (R) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3580kB
Feb 2 04:10:01 ldr kernel: HighMem: 1085*4kB (UM) 1427*8kB (UM) 1298*16kB (UM) 1116*32kB (UM) 933*64kB (UM) 723*128kB (UM) 506*256kB (UM) 280*512kB (UM) 103*1024kB (UM) 28*2048kB (UM) 121*4096kB (MR) = 1155820kB
Feb 2 04:10:01 ldr kernel: 5177 total pagecache pages
Feb 2 04:10:01 ldr kernel: 0 pages in swap cache
Feb 2 04:10:01 ldr kernel: Swap cache stats: add 0, delete 0, find 0/0
Feb 2 04:10:01 ldr kernel: Free swap = 0kB
Feb 2 04:10:01 ldr kernel: Total swap = 0kB
Feb 2 04:10:01 ldr kernel: 523470 pages RAM
Feb 2 04:10:01 ldr kernel: 296242 pages HighMem/MovableOnly
Feb 2 04:10:01 ldr kernel: 0 pages reserved
Feb 2 04:10:01 ldr kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Feb 2 04:10:01 ldr kernel: [ 301] 0 301 807 232 3 0 0 upstart-udev-br
Feb 2 04:10:01 ldr kernel: [ 303] 0 303 768 256 3 0 -1000 udevd
Feb 2 04:10:01 ldr kernel: [ 372] 0 372 741 157 2 0 -1000 udevd
Feb 2 04:10:01 ldr kernel: [ 373] 0 373 741 152 2 0 -1000 udevd
Feb 2 04:10:01 ldr kernel: [ 795] 0 795 1172 202 3 0 0 vsftpd
Feb 2 04:10:01 ldr kernel: [ 849] 0 849 674 176 3 0 0 rpcbind
Feb 2 04:10:01 ldr kernel: [ 910] 0 910 711 66 2 0 0 upstart-socket-
Feb 2 04:10:01 ldr kernel: [ 1004] 0 1004 1670 398 3 0 -1000 sshd
Feb 2 04:10:01 ldr kernel: [ 1010] 0 1010 727 52 3 0 0 rpc.idmapd
Feb 2 04:10:01 ldr kernel: [ 1016] 102 1016 814 181 2 0 0 dbus-daemon
Feb 2 04:10:01 ldr kernel: [ 1058] 101 1058 8070 659 8 0 0 rsyslogd
Feb 2 04:10:01 ldr kernel: [ 1081] 107 1081 739 240 3 0 0 rpc.statd
Feb 2 04:10:01 ldr kernel: [ 1125] 0 1125 1038 151 4 0 0 getty
Feb 2 04:10:01 ldr kernel: [ 1132] 0 1132 1038 154 4 0 0 getty
Feb 2 04:10:01 ldr kernel: [ 1145] 0 1145 1038 152 4 0 0 getty
Feb 2 04:10:01 ldr kernel: [ 1146] 0 1146 1038 150 4 0 0 getty
Feb 2 04:10:01 ldr kernel: [ 1151] 0 1151 1038 150 4 0 0 getty
Feb 2 04:10:01 ldr kernel: [ 1160] 0 1160 640 151 2 0 0 xinetd
Feb 2 04:10:01 ldr kernel: [ 1166] 0 1166 545 102 3 0 0 acpid
Feb 2 04:10:01 ldr kernel: [ 1167] 0 1167 654 174 3 0 0 cron
Feb 2 04:10:01 ldr kernel: [ 1168] 0 1168 617 69 3 0 0 atd
Feb 2 04:10:01 ldr kernel: [ 1179] 0 1179 900 134 3 0 0 irqbalance
Feb 2 04:10:01 ldr kernel: [ 1189] 103 1189 6116 669 7 0 0 whoopsie
Feb 2 04:10:01 ldr kernel: [ 1235] 0 1235 843 176 3 0 0 rpc.mountd
Feb 2 04:10:01 ldr kernel: [ 1386] 0 1386 4937 1061 6 0 0 python
Feb 2 04:10:01 ldr kernel: [ 1387] 0 1387 535 79 3 0 0 watchdog
Feb 2 04:10:01 ldr kernel: [ 1395] 0 1395 600 139 3 0 0 getty
Feb 2 04:10:01 ldr kernel: [ 1396] 0 1396 1038 151 4 0 0 getty
Feb 2 04:10:01 ldr kernel: [ 592] 0 592 823 262 3 0 0 ldrc_script.s
Feb 2 04:10:01 ldr kernel: [ 593] 0 593 4680 710 5 0 0 python
Feb 2 04:10:01 ldr kernel: [ 594] 0 594 653 59 3 0 0 cron
Feb 2 04:10:01 ldr kernel: Out of memory: Kill process 1386 (python) score 2 or sacrifice child
Feb 2 04:10:01 ldr kernel: Killed process 593 (python) total-vm:18720kB, anon-rss:2184kB, file-rss:656kB
I'm assuming the oom-killer is triggered because free memory in the "Normal" zone has dropped below the min value of 3672kB, as shown in this line:
Normal free:3556kB min:3672kB low:4588kB high:5508kB active_anon:3820kB inactive_anon:56kB active_file:168kB inactive_file:284kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:892920kB managed:860136kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:72kB slab_reclaimable:5304kB slab_unreclaimable:14900kB kernel_stack:744kB pagetables:204kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:749 all_unreclaimable? yes
But the "HighMem" zone still has plenty of room:
HighMem free:1155752kB min:512kB low:1776kB high:3040kB active_anon:8932kB inactive_anon:212kB active_file:5152kB inactive_file:14072kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1184968kB managed:1184968kB mlocked:0kB dirty:0kB writeback:0kB mapped:5792kB shmem:932kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:212kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
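To make the comparison between those two lines explicit, here is a minimal Python sketch that pulls the kB fields out of a per-zone line and checks whether `free` is below `min`. The helper names `parse_zone_line` and `below_min` are my own, and the sample strings are shortened copies of the two lines above (without the syslog prefix). This only restates the numbers in the report and is not a model of how the kernel decides to invoke the oom-killer.

```python
import re

def parse_zone_line(line):
    """Split a per-zone line into (zone_name, {field: value_in_kB})."""
    name, _, rest = line.strip().partition(" ")
    # Only fields of the form name:<digits>kB are collected; entries such as
    # "isolated(anon):0kB" or "pages_scanned:749" are skipped by this pattern.
    fields = {key: int(val) for key, val in re.findall(r"(\w+):(\d+)kB", rest)}
    return name, fields

def below_min(line):
    """Return (zone_name, True if free is below the min watermark)."""
    name, fields = parse_zone_line(line)
    return name, fields["free"] < fields["min"]

# Shortened copies of the two zone lines from the report above.
normal = "Normal free:3556kB min:3672kB low:4588kB high:5508kB"
highmem = "HighMem free:1155752kB min:512kB low:1776kB high:3040kB"

for zone_line in (normal, highmem):
    print(below_min(zone_line))
```

With the values from my log, only the Normal zone comes out below its min watermark; HighMem is far above all three of its watermarks.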
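So why would the kernel keep allocating memory in Normal until it drops below the min value and then have to start killing processes, when it could have used some of the free space in HighMem? The kernel version is 3.14.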