Ansible 触发 oom-killer

Ansible 触发 oom-killer

运行 ArchLinux
uname -a:

Linux localhost 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux

16GB 内存 14GB 交换空间

当我运行大型 ansible 作业时,它会触发我的 oom-killer。我认为 16GB 足以运行此类作业,但我不是 oom 日志专家(或 Linux 内存专家),以下是日志:

Feb 14 11:35:36 localhost kernel: Out of memory: Kill process 22698 (systemd-coredum) score 503 or sacrifice child
Feb 14 11:35:36 localhost kernel: Killed process 22698 (systemd-coredum) total-vm:880316kB, anon-rss:37604kB, file-rss:67380kB, shmem-rss:0kB
Feb 14 11:42:52 localhost kernel: ansible invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
Feb 14 11:42:52 localhost kernel: ansible cpuset=/ mems_allowed=0
Feb 14 11:42:52 localhost kernel: CPU: 0 PID: 27123 Comm: ansible Not tainted 4.7.2-1-ARCH #1
Feb 14 11:42:52 localhost kernel: Hardware name: Dell Inc. OptiPlex 7020/08WKV3, BIOS A02 11/20/2014
Feb 14 11:42:52 localhost kernel:  0000000000000286 00000000a544d0e1 ffff8803b3147b48 ffffffff812eb132
Feb 14 11:42:52 localhost kernel:  ffff8803b3147d28 ffff88024193f000 ffff8803b3147bb8 ffffffff811f6e5c
Feb 14 11:42:52 localhost kernel:  ffff8803b3148000 0000000000000000 ffffffff81b28920 ffffffff811789c0
Feb 14 11:42:52 localhost kernel: Call Trace:
Feb 14 11:42:52 localhost kernel:  [<ffffffff812eb132>] dump_stack+0x63/0x81
Feb 14 11:42:52 localhost kernel:  [<ffffffff811f6e5c>] dump_header+0x60/0x1e8
Feb 14 11:42:52 localhost kernel:  [<ffffffff811789c0>] ? page_alloc_cpu_notify+0x50/0x50
Feb 14 11:42:52 localhost kernel:  [<ffffffff811762fa>] oom_kill_process+0x22a/0x440
Feb 14 11:42:52 localhost kernel:  [<ffffffff8117696a>] out_of_memory+0x40a/0x4b0
Feb 14 11:42:52 localhost kernel:  [<ffffffff812ffe08>] ? find_next_bit+0x18/0x20
Feb 14 11:42:52 localhost kernel:  [<ffffffff8117c05b>] __alloc_pages_nodemask+0xf0b/0xf30
Feb 14 11:42:52 localhost kernel:  [<ffffffff8117c3d4>] alloc_kmem_pages_node+0x54/0xd0
Feb 14 11:42:52 localhost kernel:  [<ffffffff81077c06>] copy_process.part.8+0x136/0x19a0
Feb 14 11:42:52 localhost kernel:  [<ffffffff811a974a>] ? handle_mm_fault+0xa7a/0x1f60
Feb 14 11:42:52 localhost kernel:  [<ffffffff81079647>] _do_fork+0xd7/0x3d0
Feb 14 11:42:52 localhost kernel:  [<ffffffff810655f5>] ? __do_page_fault+0x1f5/0x510
Feb 14 11:42:52 localhost kernel:  [<ffffffff810799e9>] SyS_clone+0x19/0x20
Feb 14 11:42:52 localhost kernel:  [<ffffffff81003c07>] do_syscall_64+0x57/0xb0
Feb 14 11:42:52 localhost kernel:  [<ffffffff815de861>] entry_SYSCALL64_slow_path+0x25/0x25
Feb 14 11:42:52 localhost kernel: Mem-Info:
Feb 14 11:42:52 localhost kernel: active_anon:548787 inactive_anon:232682 isolated_anon:0
                                   active_file:28394 inactive_file:24931 isolated_file:8
                                   unevictable:0 dirty:1 writeback:0 unstable:0
                                   slab_reclaimable:1897009 slab_unreclaimable:19547
                                   mapped:51240 shmem:28342 pagetables:20339 bounce:0
                                   free:1284106 free_pcp:446 free_cma:0
Feb 14 11:42:52 localhost kernel: Node 0 DMA free:15628kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900k
Feb 14 11:42:52 localhost kernel: lowmem_reserve[]: 0 3468 15978 15978
Feb 14 11:42:52 localhost kernel: Node 0 DMA32 free:1221320kB min:14632kB low:18288kB high:21944kB active_anon:274224kB inactive_anon:273556kB active_file:40556kB inactive_file:36556kB unevictable:0kB isolated(anon):0kB isolated(file):32k
Feb 14 11:42:52 localhost kernel: lowmem_reserve[]: 0 0 12510 12510
Feb 14 11:42:52 localhost kernel: Node 0 Normal free:3899476kB min:52884kB low:66104kB high:79324kB active_anon:1920924kB inactive_anon:657172kB active_file:73020kB inactive_file:63168kB unevictable:0kB isolated(anon):0kB isolated(file):0
Feb 14 11:42:52 localhost kernel: lowmem_reserve[]: 0 0 0 0
Feb 14 11:42:52 localhost kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (ME) = 15628kB
Feb 14 11:42:52 localhost kernel: Node 0 DMA32: 166992*4kB (UME) 68889*8kB (UE) 7*16kB (H) 11*32kB (H) 11*64kB (H) 2*128kB (H) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1220760kB
Feb 14 11:42:52 localhost kernel: Node 0 Normal: 721354*4kB (UME) 126667*8kB (UEH) 16*16kB (H) 2*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3899072kB
Feb 14 11:42:52 localhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Feb 14 11:42:52 localhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb 14 11:42:52 localhost kernel: 125644 total pagecache pages
Feb 14 11:42:52 localhost kernel: 43931 pages in swap cache
Feb 14 11:42:52 localhost kernel: Swap cache stats: add 2753281, delete 2709350, find 730647/1154037
Feb 14 11:42:52 localhost kernel: Free swap  = 12677364kB
Feb 14 11:42:52 localhost kernel: Total swap = 14124028kB
Feb 14 11:42:52 localhost kernel: 4179504 pages RAM
Feb 14 11:42:52 localhost kernel: 0 pages HighMem/MovableOnly
Feb 14 11:42:52 localhost kernel: 84923 pages reserved
Feb 14 11:42:52 localhost kernel: 0 pages hwpoisoned
(...)
Feb 14 11:42:52 localhost kernel: Out of memory: Kill process 27876 (firefox) score 41 or sacrifice child
Feb 14 11:42:52 localhost kernel: Killed process 27876 (firefox) total-vm:4003016kB, anon-rss:1091960kB, file-rss:41516kB, shmem-rss:80216kB

以下是我使用过的一些 sysctl 值,它们有点帮助,但在更大的工作中它仍然发生:

vm.overcommit_memory = 2
vm.overcommit_ratio = 100

我的一些 ansible 工作是否真的耗尽了系统的所有内存 + 交换空间?

答案1

Ansible 绝对不应该使用那么多内存。您能详细说明一下您正在从事的工作吗? (有多少个,他们在做什么,使用的模块,示例等)我看到 firefox 在那里被杀死,你是否开始用 firefox 做很多事情?

相关内容