oom-killer kills processes despite plenty of free swap space

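This machine has plenty of swap space, yet processes are still occasionally killed by the oom-killer. Can anyone explain this behavior? More importantly, how can I prevent it?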

dmesg output:

python invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=4
Pid: 13996, comm: python Not tainted 2.6.27-gentoo-r8cluster-e1000 #9

Call Trace:
 [<ffffffff8025ab6b>] oom_kill_process+0x57/0x1dc
 [<ffffffff802460c7>] getnstimeofday+0x53/0xb3
 [<ffffffff8025ae78>] badness+0x16a/0x1a9
 [<ffffffff8025b0a9>] out_of_memory+0x1f2/0x25c
 [<ffffffff8025e181>] __alloc_pages_internal+0x30f/0x3b2
 [<ffffffff8026fea0>] read_swap_cache_async+0x48/0xc0
 [<ffffffff8026ff6f>] swapin_readahead+0x57/0x98
 [<ffffffff80266d0e>] handle_mm_fault+0x408/0x706
 [<ffffffff8057da33>] do_page_fault+0x42c/0x7e7
 [<ffffffff8057baf9>] error_exit+0x0/0x51

Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 103
CPU    1: hi:  186, btch:  31 usd:  48
CPU    2: hi:  186, btch:  31 usd: 136
CPU    3: hi:  186, btch:  31 usd: 183
Active:480346 inactive:483 dirty:0 writeback:10 unstable:0
 free:3408 slab:5146 mapped:1408 pagetables:2687 bounce:0
Node 0 DMA free:8024kB min:20kB low:24kB high:28kB active:1156kB inactive:0kB present:8364kB pages_scanned:3246 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003 2003
Node 0 DMA32 free:5608kB min:5716kB low:7144kB high:8572kB active:1920228kB inactive:1932kB present:2051308kB pages_scanned:2941301 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 8*4kB 3*8kB 4*16kB 3*32kB 4*64kB 3*128kB 2*256kB 3*512kB 3*1024kB 1*2048kB 0*4096kB = 8024kB
Node 0 DMA32: 42*4kB 6*8kB 1*16kB 0*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 5608kB
325424 total pagecache pages
323900 pages in swap cache
Swap cache stats: add 20776604, delete 20452704, find 7856195/10744535
Free swap  = 151691424kB
Total swap = 156290896kB
524032 pages RAM
9003 pages reserved
331431 pages shared
186210 pages non-shared
Out of memory: kill process 12965 (bash) score 2236480 or a child
Killed process 13996 (python)

VM-related sysctl settings:

vm.overcommit_memory = 0
vm.panic_on_oom = 0
vm.oom_kill_allocating_task = 0
vm.oom_dump_tasks = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.nr_pdflush_threads = 2
vm.swappiness = 60
vm.nr_hugepages = 0
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
vm.lowmem_reserve_ratio = 256   256 32
vm.drop_caches = 0
vm.min_free_kbytes = 5740
vm.percpu_pagelist_fraction = 0
vm.max_map_count = 65536
vm.laptop_mode = 0
vm.block_dump = 0
vm.vfs_cache_pressure = 100
vm.legacy_va_layout = 0
vm.zone_reclaim_mode = 0
vm.min_unmapped_ratio = 1
vm.min_slab_ratio = 5
vm.stat_interval = 1
vm.numa_zonelist_order = default

Answer 1

Take a look at this page for some information that may help you diagnose your problem.

Specifically, you will want to start by looking at /proc/meminfo and /proc/slabinfo for more information.
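As a rough illustration of that first check, here is a minimal Python sketch that reads /proc/meminfo and prints the fields most relevant to an OOM investigation. The function names and the choice of fields are my own assumptions, not something from the linked page, and some fields may not exist on every kernel version.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most values are in kB; a few (such as HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def summarize(info):
    """Pick out fields that are usually worth a first look (an arbitrary selection)."""
    fields = ("MemTotal", "MemFree", "Cached", "SwapCached",
              "SwapTotal", "SwapFree", "Slab", "PageTables")
    return {f: info[f] for f in fields if f in info}


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            summary = summarize(parse_meminfo(fh.read()))
    except OSError as exc:
        print(f"could not read /proc/meminfo: {exc}")
    else:
        for name, kb in summary.items():
            print(f"{name:12s} {kb:>12d} kB")
```

Running this periodically (for example from cron) and keeping the output around would show how these values look shortly before the next OOM kill.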

Answer 2

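A device driver or some other kernel subsystem has allocated a large amount of real memory. Kernel memory of that kind cannot be swapped out, which is why it is not going to your swap space.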

You need to identify the workload you are running and try to isolate the kernel subsystem that is allocating so much memory.
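One possible way to start narrowing this down is to rank the kernel slab caches by approximate size. The sketch below assumes the slabinfo 2.x text format (name, active_objs, num_objs, objsize, ...) and estimates each cache as num_objs * objsize. The function name is hypothetical, and memory that a driver takes directly from the page allocator will not show up here at all.

```python
def parse_slabinfo(text):
    """Return (cache name, approximate bytes) pairs, largest first.

    Assumes slabinfo 2.x lines of the form:
        name active_objs num_objs objsize objperslab pagesperslab : ...
    The size is estimated as num_objs * objsize, which ignores slab overhead.
    """
    caches = []
    for line in text.splitlines():
        if line.startswith("slabinfo") or line.startswith("#"):
            continue  # skip the version line and the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            num_objs = int(fields[2])
            objsize = int(fields[3])
        except ValueError:
            continue  # not a data line in the assumed format
        caches.append((fields[0], num_objs * objsize))
    caches.sort(key=lambda cache: cache[1], reverse=True)
    return caches


if __name__ == "__main__":
    try:
        # Reading /proc/slabinfo normally requires root.
        with open("/proc/slabinfo") as fh:
            ranked = parse_slabinfo(fh.read())
    except OSError as exc:
        print(f"could not read /proc/slabinfo: {exc}")
    else:
        for name, size in ranked[:10]:
            print(f"{name:28s} {size / 1024:>12.0f} kB")
```

If one cache keeps growing between samples while the workload runs, that cache's owner is a reasonable first suspect.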
