Why would the oom-killer run when there is free memory available in another zone?

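I'm seeing in my system logs that the out-of-memory killer terminates processes when the "Normal" memory zone falls below its min limit, even though there is still plenty of free memory in the "HighMem" zone. I'm confused about why this happens and whether there is anything I can do to stop it. I had assumed that if no memory is available in one zone, the kernel would allocate from another zone unless the process needs memory from a specific zone. In this case, though, the application that triggered the OOM is python, and I can't think of any reason it would need memory from the Normal zone.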

Here is a typical recent example:

Feb  2 04:10:01 ldr kernel: python invoked oom-killer: gfp_mask=0x2084d0, order=0, oom_score_adj=0
Feb  2 04:10:01 ldr kernel: python cpuset=/ mems_allowed=0
Feb  2 04:10:01 ldr kernel: CPU: 0 PID: 593 Comm: python Tainted: G           O 3.14.24-wt-ldr-TC #4
Feb  2 04:10:01 ldr kernel: Hardware name: Eurotech, Inc. Catalyst TC/Catalyst TC, BIOS 04.08.05.01 02/03/2017
Feb  2 04:10:01 ldr kernel: 00000000 00000000 f336fd7c c171e79e f4f03580 f336fdd8 c171b7a0 c1908d18
Feb  2 04:10:01 ldr kernel: f4f03954 002084d0 00000000 00000000 f336fdb8 c1106774 00000000 00000000
Feb  2 04:10:01 ldr kernel: f336fdb4 002084d0 00000000 f336fdd8 c13ac8d2 c1109634 f336fde8 f4f042e0
Feb  2 04:10:01 ldr kernel: Call Trace:
Feb  2 04:10:01 ldr kernel: [<c171e79e>] dump_stack+0x4b/0x75
Feb  2 04:10:01 ldr kernel: [<c171b7a0>] dump_header.isra.9+0x77/0x1ee
Feb  2 04:10:01 ldr kernel: [<c1106774>] ? shrink_slab+0xb4/0xf0
Feb  2 04:10:01 ldr kernel: [<c13ac8d2>] ? ___ratelimit+0x82/0x100
Feb  2 04:10:01 ldr kernel: [<c1109634>] ? do_try_to_free_pages+0x404/0x420
Feb  2 04:10:01 ldr kernel: [<c10f9fac>] oom_kill_process+0x1dc/0x360
Feb  2 04:10:01 ldr kernel: [<c10486d6>] ? has_ns_capability_noaudit+0x36/0x50
Feb  2 04:10:01 ldr kernel: [<c1048704>] ? has_capability_noaudit+0x14/0x20
Feb  2 04:10:01 ldr kernel: [<c10f9c87>] ? oom_badness+0xa7/0x100
Feb  2 04:10:01 ldr kernel: [<c10f9d29>] ? oom_scan_process_thread+0x49/0xc0
Feb  2 04:10:01 ldr kernel: [<c10fa4d4>] out_of_memory+0x1f4/0x2d0
Feb  2 04:10:01 ldr kernel: [<c10fe557>] __alloc_pages_nodemask+0x937/0x950
Feb  2 04:10:01 ldr kernel: [<c10fe58d>] __get_free_pages+0x1d/0x30
Feb  2 04:10:01 ldr kernel: [<c103b3be>] pgd_alloc+0x1e/0x130
Feb  2 04:10:01 ldr kernel: [<c103dfc0>] mm_init+0xc0/0xf0
Feb  2 04:10:01 ldr kernel: [<c103e256>] mm_alloc+0x56/0xa0
Feb  2 04:10:01 ldr kernel: [<c1143ddf>] do_execve+0x19f/0x5a0
Feb  2 04:10:01 ldr kernel: [<c1144389>] SyS_execve+0x29/0x40
Feb  2 04:10:01 ldr kernel: [<c172a6be>] sysenter_do_call+0x12/0x12
Feb  2 04:10:01 ldr kernel: Mem-Info:
Feb  2 04:10:01 ldr kernel: DMA per-cpu:
Feb  2 04:10:01 ldr kernel: CPU    0: hi:    0, btch:   1 usd:   0
Feb  2 04:10:01 ldr kernel: CPU    1: hi:    0, btch:   1 usd:   0
Feb  2 04:10:01 ldr kernel: Normal per-cpu:
Feb  2 04:10:01 ldr kernel: CPU    0: hi:  186, btch:  31 usd: 179
Feb  2 04:10:01 ldr kernel: CPU    1: hi:  186, btch:  31 usd: 130
Feb  2 04:10:01 ldr kernel: HighMem per-cpu:
Feb  2 04:10:01 ldr kernel: CPU    0: hi:  186, btch:  31 usd:   6
Feb  2 04:10:01 ldr kernel: CPU    1: hi:  186, btch:  31 usd:  51
Feb  2 04:10:01 ldr kernel: active_anon:3208 inactive_anon:67 isolated_anon:0
Feb  2 04:10:01 ldr kernel: active_file:1330 inactive_file:3589 isolated_file:0
Feb  2 04:10:01 ldr kernel: unevictable:0 dirty:0 writeback:0 unstable:0
Feb  2 04:10:01 ldr kernel: free:290674 slab_reclaimable:1335 slab_unreclaimable:3757
Feb  2 04:10:01 ldr kernel: mapped:1448 shmem:251 pagetables:105 bounce:0
Feb  2 04:10:01 ldr kernel: free_cma:0
Feb  2 04:10:01 ldr kernel: DMA free:3388kB min:64kB low:80kB high:96kB active_anon:80kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15916kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:36kB slab_unreclaimable:128kB kernel_stack:16kB pagetables:4kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb  2 04:10:01 ldr kernel: lowmem_reserve[]: 0 839 1996 1996
Feb  2 04:10:01 ldr kernel: Normal free:3556kB min:3672kB low:4588kB high:5508kB active_anon:3820kB inactive_anon:56kB active_file:168kB inactive_file:284kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:892920kB managed:860136kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:72kB slab_reclaimable:5304kB slab_unreclaimable:14900kB kernel_stack:744kB pagetables:204kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:749 all_unreclaimable? yes
Feb  2 04:10:01 ldr kernel: lowmem_reserve[]: 0 0 9257 9257
Feb  2 04:10:01 ldr kernel: HighMem free:1155752kB min:512kB low:1776kB high:3040kB active_anon:8932kB inactive_anon:212kB active_file:5152kB inactive_file:14072kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1184968kB managed:1184968kB mlocked:0kB dirty:0kB writeback:0kB mapped:5792kB shmem:932kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:212kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb  2 04:10:01 ldr kernel: lowmem_reserve[]: 0 0 0 0
Feb  2 04:10:01 ldr kernel: DMA: 21*4kB (UE) 15*8kB (UEM) 9*16kB (UM) 9*32kB (UM) 5*64kB (UMR) 3*128kB (MR) 2*256kB (ER) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 3388kB
Feb  2 04:10:01 ldr kernel: Normal: 333*4kB (UEM) 269*8kB (M) 0*16kB 1*32kB (R) 1*64kB (R) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3580kB
Feb  2 04:10:01 ldr kernel: HighMem: 1085*4kB (UM) 1427*8kB (UM) 1298*16kB (UM) 1116*32kB (UM) 933*64kB (UM) 723*128kB (UM) 506*256kB (UM) 280*512kB (UM) 103*1024kB (UM) 28*2048kB (UM) 121*4096kB (MR) = 1155820kB
Feb  2 04:10:01 ldr kernel: 5177 total pagecache pages
Feb  2 04:10:01 ldr kernel: 0 pages in swap cache
Feb  2 04:10:01 ldr kernel: Swap cache stats: add 0, delete 0, find 0/0
Feb  2 04:10:01 ldr kernel: Free swap  = 0kB
Feb  2 04:10:01 ldr kernel: Total swap = 0kB
Feb  2 04:10:01 ldr kernel: 523470 pages RAM
Feb  2 04:10:01 ldr kernel: 296242 pages HighMem/MovableOnly
Feb  2 04:10:01 ldr kernel: 0 pages reserved
Feb  2 04:10:01 ldr kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Feb  2 04:10:01 ldr kernel: [  301]     0   301      807      232       3        0             0 upstart-udev-br
Feb  2 04:10:01 ldr kernel: [  303]     0   303      768      256       3        0         -1000 udevd
Feb  2 04:10:01 ldr kernel: [  372]     0   372      741      157       2        0         -1000 udevd
Feb  2 04:10:01 ldr kernel: [  373]     0   373      741      152       2        0         -1000 udevd
Feb  2 04:10:01 ldr kernel: [  795]     0   795     1172      202       3        0             0 vsftpd
Feb  2 04:10:01 ldr kernel: [  849]     0   849      674      176       3        0             0 rpcbind
Feb  2 04:10:01 ldr kernel: [  910]     0   910      711       66       2        0             0 upstart-socket-
Feb  2 04:10:01 ldr kernel: [ 1004]     0  1004     1670      398       3        0         -1000 sshd
Feb  2 04:10:01 ldr kernel: [ 1010]     0  1010      727       52       3        0             0 rpc.idmapd
Feb  2 04:10:01 ldr kernel: [ 1016]   102  1016      814      181       2        0             0 dbus-daemon
Feb  2 04:10:01 ldr kernel: [ 1058]   101  1058     8070      659       8        0             0 rsyslogd
Feb  2 04:10:01 ldr kernel: [ 1081]   107  1081      739      240       3        0             0 rpc.statd
Feb  2 04:10:01 ldr kernel: [ 1125]     0  1125     1038      151       4        0             0 getty
Feb  2 04:10:01 ldr kernel: [ 1132]     0  1132     1038      154       4        0             0 getty
Feb  2 04:10:01 ldr kernel: [ 1145]     0  1145     1038      152       4        0             0 getty
Feb  2 04:10:01 ldr kernel: [ 1146]     0  1146     1038      150       4        0             0 getty
Feb  2 04:10:01 ldr kernel: [ 1151]     0  1151     1038      150       4        0             0 getty
Feb  2 04:10:01 ldr kernel: [ 1160]     0  1160      640      151       2        0             0 xinetd
Feb  2 04:10:01 ldr kernel: [ 1166]     0  1166      545      102       3        0             0 acpid
Feb  2 04:10:01 ldr kernel: [ 1167]     0  1167      654      174       3        0             0 cron
Feb  2 04:10:01 ldr kernel: [ 1168]     0  1168      617       69       3        0             0 atd
Feb  2 04:10:01 ldr kernel: [ 1179]     0  1179      900      134       3        0             0 irqbalance
Feb  2 04:10:01 ldr kernel: [ 1189]   103  1189     6116      669       7        0             0 whoopsie
Feb  2 04:10:01 ldr kernel: [ 1235]     0  1235      843      176       3        0             0 rpc.mountd
Feb  2 04:10:01 ldr kernel: [ 1386]     0  1386     4937     1061       6        0             0 python
Feb  2 04:10:01 ldr kernel: [ 1387]     0  1387      535       79       3        0             0 watchdog
Feb  2 04:10:01 ldr kernel: [ 1395]     0  1395      600      139       3        0             0 getty
Feb  2 04:10:01 ldr kernel: [ 1396]     0  1396     1038      151       4        0             0 getty
Feb  2 04:10:01 ldr kernel: [  592]     0   592      823      262       3        0             0 ldrc_script.s
Feb  2 04:10:01 ldr kernel: [  593]     0   593     4680      710       5        0             0 python
Feb  2 04:10:01 ldr kernel: [  594]     0   594      653       59       3        0             0 cron
Feb  2 04:10:01 ldr kernel: Out of memory: Kill process 1386 (python) score 2 or sacrifice child
Feb  2 04:10:01 ldr kernel: Killed process 593 (python) total-vm:18720kB, anon-rss:2184kB, file-rss:656kB

I assume the oom-killer was triggered because the "Normal" zone had dropped below its min of 3672kB, as shown in this line:

Normal free:3556kB min:3672kB low:4588kB high:5508kB active_anon:3820kB inactive_anon:56kB active_file:168kB inactive_file:284kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:892920kB managed:860136kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:72kB slab_reclaimable:5304kB slab_unreclaimable:14900kB kernel_stack:744kB pagetables:204kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:749 all_unreclaimable? yes

But the "HighMem" zone still has plenty of room:

HighMem free:1155752kB min:512kB low:1776kB high:3040kB active_anon:8932kB inactive_anon:212kB active_file:5152kB inactive_file:14072kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1184968kB managed:1184968kB mlocked:0kB dirty:0kB writeback:0kB mapped:5792kB shmem:932kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:212kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
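To make the comparison explicit, here is a minimal Python sketch that pulls the watermark fields out of the two zone lines and checks free against min. The zone lines are shortened copies of the ones quoted above; the helper names `parse_zone` and `below_min` are made up for this illustration.

```python
import re

# Zone lines copied from the OOM report, shortened to the watermark fields.
ZONE_LINES = [
    "Normal free:3556kB min:3672kB low:4588kB high:5508kB",
    "HighMem free:1155752kB min:512kB low:1776kB high:3040kB",
]


def parse_zone(line):
    """Return (zone_name, {field: kB}) for one zone line of an OOM report."""
    name, _, rest = line.partition(" ")
    fields = {key: int(value) for key, value in re.findall(r"(\w+):(\d+)kB", rest)}
    return name, fields


def below_min(fields):
    """True when the zone's free memory is under its min watermark."""
    return fields["free"] < fields["min"]


for line in ZONE_LINES:
    name, fields = parse_zone(line)
    print(f"{name}: free={fields['free']}kB min={fields['min']}kB "
          f"below_min={below_min(fields)}")
```

On these two lines the check reports Normal as below its min watermark and HighMem as well above it, which matches my reading of the log.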

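So why does the kernel keep allocating from Normal until it falls below min and then has to start killing processes, when it could have used some of the free space in HighMem? Is there a way to fix this so that HighMem is used more, to avoid processes being killed when Normal is full? The kernel version is 3.14.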
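In case it helps anyone answering, here is a rough Python sketch that decodes the low-order bits of the `gfp_mask=0x2084d0` value from the first log line. The flag table is an assumption: it covers only the low eight bits as I understand them from `include/linux/gfp.h` in 3.x kernels, and the values differ between kernel versions, so it should be checked against the actual source. `decode_gfp` is a name made up for this example.

```python
# Low-order GFP flag bits as I understand them from include/linux/gfp.h in
# 3.x kernels. This table is an assumption and varies between kernel versions.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def decode_gfp(mask):
    """Split a gfp_mask into known low-bit flag names and the undecoded rest."""
    known = [name for bit, name in sorted(GFP_FLAGS.items()) if mask & bit]
    covered = sum(GFP_FLAGS)  # OR of all bits in the table (they are disjoint)
    return known, mask & ~covered


known, rest = decode_gfp(0x2084D0)
print("known flags:", " | ".join(known))
print("undecoded bits:", hex(rest))
print("__GFP_HIGHMEM set:", "__GFP_HIGHMEM" in known)
```

If that table is right, `__GFP_HIGHMEM` is not set in this mask. I don't know whether that is the reason HighMem was not used for this allocation, but it seems worth mentioning.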
