I have a Debian-based system that keeps running out of memory, even though there appears to be plenty of free memory. The box will run for roughly 6-12 days, then start killing any program that allocates memory (usually while allocating skbs). Eventually it kills Xorg and degrades into a watchdog reboot, after which the box runs for another 6-12 days before failing the same way again.
Here is the oom-killer log:
[521652.462829] Xorg invoked oom-killer: gfp_mask=0x400cc0(GFP_KERNEL_ACCOUNT), order=0, oom_score_adj=0
[521652.462841] CPU: 1 PID: 28603 Comm: Xorg Tainted: G C 5.4.59-v7l+ #37
[521652.462844] Hardware name: BCM2711
[521652.462847] Backtrace:
[521652.462864] [<c020dfb0>] (dump_backtrace) from [<c020e318>] (show_stack+0x20/0x24)
[521652.462869] r7:ffffffff r6:00000000 r5:60000013 r4:c129dc94
[521652.462877] [<c020e2f8>] (show_stack) from [<c0a4ca0c>] (dump_stack+0xd8/0x11c)
[521652.462887] [<c0a4c934>] (dump_stack) from [<c0381478>] (dump_header+0x64/0x200)
[521652.462892] r10:00000000 r9:00001400 r8:c12a0090 r7:c0dd8104 r6:c2a93c80 r5:eee7bd00
[521652.462896] r4:cf0e1c68 r3:7747a280
[521652.462903] [<c0381414>] (dump_header) from [<c0380834>] (oom_kill_process+0x178/0x184)
[521652.462907] r7:c0dd8104 r6:cf0e1c68 r5:eee7c250 r4:eee7bd00
[521652.462914] [<c03806bc>] (oom_kill_process) from [<c03812c8>] (out_of_memory+0x27c/0x33c)
[521652.462918] r7:c1208380 r6:cf0e1c68 r5:c1204f88 r4:eee7bd00
[521652.462927] [<c038104c>] (out_of_memory) from [<c03cef78>] (__alloc_pages_nodemask+0xc18/0x1288)
[521652.462931] r7:00000000 r6:c120508c r5:0000fe2b r4:00000000
[521652.462941] [<c03ce360>] (__alloc_pages_nodemask) from [<c08eebb8>] (alloc_skb_with_frags+0xdc/0x1a4)
[521652.462945] r10:c551f840 r9:00008000 r8:004008c0 r7:00000000 r6:00000008 r5:00000008
[521652.462948] r4:00000003
[521652.462955] [<c08eeadc>] (alloc_skb_with_frags) from [<c08e6518>] (sock_alloc_send_pskb+0x214/0x248)
[521652.462960] r10:cf0e1d7c r9:c1204f88 r8:c026d33c r7:cf0e1dcc r6:ffffe000 r5:00000000
[521652.462964] r4:d799ea00
[521652.462973] [<c08e6304>] (sock_alloc_send_pskb) from [<c0a02e40>] (unix_stream_sendmsg+0x144/0x3a0)
[521652.462978] r10:d799ea00 r9:d799e700 r8:00000f00 r7:cf0e1dcc r6:c551fb40 r5:00008f00
[521652.462981] r4:00008000
[521652.462987] [<c0a02cfc>] (unix_stream_sendmsg) from [<c08e1690>] (sock_write_iter+0xb0/0x114)
[521652.462991] r10:eee3c900 r9:d4722d00 r8:cf0e1e38 r7:00000000 r6:00000000 r5:c1204f88
[521652.462994] r4:cf0e1ed4
[521652.463002] [<c08e15e0>] (sock_write_iter) from [<c03f9734>] (do_iter_readv_writev+0x168/0x1d4)
[521652.463006] r10:00000000 r9:cf0e1f60 r8:c1204f88 r7:00000000 r6:eee3c900 r5:00000000
[521652.463009] r4:00000000
[521652.463016] [<c03f95cc>] (do_iter_readv_writev) from [<c03faa4c>] (do_iter_write+0x94/0x1a4)
[521652.463020] r10:00000001 r9:bed31b84 r8:cf0e1f60 r7:00000000 r6:eee3c900 r5:cf0e1ed4
[521652.463023] r4:00000000
[521652.463030] [<c03fa9b8>] (do_iter_write) from [<c03fac2c>] (vfs_writev+0x9c/0xe8)
[521652.463034] r9:bed31b84 r8:cf0e1f60 r7:eee3c900 r6:cf0e1ed4 r5:0001a780 r4:c1204f88
[521652.463041] [<c03fab90>] (vfs_writev) from [<c03face8>] (do_writev+0x70/0x144)
[521652.463045] r8:eee3c900 r7:00000092 r6:00000000 r5:eee3c901 r4:c1204f88
[521652.463052] [<c03fac78>] (do_writev) from [<c03fc4d4>] (sys_writev+0x1c/0x20)
[521652.463056] r10:00000092 r9:cf0e0000 r8:c02011c4 r6:0000002e r5:bed31b84 r4:00000001
[521652.463064] [<c03fc4b8>] (sys_writev) from [<c0201000>] (ret_fast_syscall+0x0/0x28)
[521652.463068] Exception stack(0xcf0e1fa8 to 0xcf0e1ff0)
[521652.463072] 1fa0: 00000001 bed31b84 0000002e bed31b84 00000001 00000000
[521652.463077] 1fc0: 00000001 bed31b84 0000002e 00000092 00000001 00000000 b6fcf6f4 00000000
[521652.463081] 1fe0: 00000002 bed318f8 00000000 b69f1654
[521652.463085] Mem-Info:
[521652.463097] active_anon:27936 inactive_anon:40717 isolated_anon:0
active_file:9032 inactive_file:15510 isolated_file:0
unevictable:22734 dirty:0 writeback:0 unstable:0
slab_reclaimable:3248 slab_unreclaimable:7041
mapped:27349 shmem:61689 pagetables:1321 bounce:0
free:253912 free_pcp:34 free_cma:60386
[521652.463106] Node 0 active_anon:111744kB inactive_anon:162868kB active_file:36128kB inactive_file:62040kB unevictable:90936kB isolated(anon):0kB isolated(file):0kB mapped:109396kB dirty:0kB writeback:0kB shmem:246756kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[521652.463117] DMA free:258884kB min:20480kB low:24576kB high:28672kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:404kB unevictable:0kB writepending:0kB present:786432kB managed:679084kB mlocked:0kB kernel_stack:2280kB pagetables:0kB bounce:0kB free_pcp:136kB local_pcp:0kB free_cma:241544kB
[521652.463122] lowmem_reserve[]: 0 0 1204 1204
[521652.463139] HighMem free:756764kB min:512kB low:7948kB high:15384kB active_anon:111480kB inactive_anon:162868kB active_file:35840kB inactive_file:61440kB unevictable:90820kB writepending:0kB present:1232896kB managed:1232896kB mlocked:44kB kernel_stack:0kB pagetables:5284kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[521652.463143] lowmem_reserve[]: 0 0 0 0
[521652.463155] DMA: 356*4kB (UE) 786*8kB (UEC) 683*16kB (UEC) 28*32kB (UEC) 19*64kB (UEC) 8*128kB (C) 5*256kB (C) 2*512kB (C) 6*1024kB (C) 2*2048kB (C) 55*4096kB (C) = 259600kB
[521652.463197] HighMem: 322*4kB (UM) 3624*8kB (UM) 3968*16kB (UM) 2072*32kB (UM) 2012*64kB (UM) 1118*128kB (UM) 376*256kB (UM) 132*512kB (M) 40*1024kB (M) 19*2048kB (UM) 20*4096kB (UM) = 757576kB
[521652.463238] 86322 total pagecache pages
[521652.463246] 0 pages in swap cache
[521652.463252] Swap cache stats: add 36, delete 36, find 10/14
[521652.463257] Free swap = 2121980kB
[521652.463262] Total swap = 2122748kB
[521652.463267] 504832 pages RAM
[521652.463272] 308224 pages HighMem/MovableOnly
[521652.463277] 26837 pages reserved
[521652.463281] 65536 pages cma reserved
As shown below, there is no memory fragmentation:
[521652.463155] DMA: 356*4kB (UE) 786*8kB (UEC) 683*16kB (UEC) 28*32kB (UEC) 19*64kB (UEC) 8*128kB (C) 5*256kB (C) 2*512kB (C) 6*1024kB (C) 2*2048kB (C) 55*4096kB (C) = 259600kB
[521652.463197] HighMem: 322*4kB (UM) 3624*8kB (UM) 3968*16kB (UM) 2072*32kB (UM) 2012*64kB (UM) 1118*128kB (UM) 376*256kB (UM) 132*512kB (M) 40*1024kB (M) 19*2048kB (UM) 20*4096kB (UM) = 757576kB
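For what it's worth, the same per-order breakdown can be logged from /proc/buddyinfo in the days leading up to a failure, to confirm fragmentation stays absent. A rough Python sketch (assuming 4 KiB pages) that prints the free memory per order for each zone:

#!/usr/bin/env python3
"""Print free memory per allocation order from /proc/buddyinfo
(the same data as the DMA/HighMem lines in the OOM report), assuming 4 KiB pages."""
PAGE_KB = 4

with open("/proc/buddyinfo") as f:
    for line in f:
        # Format: "Node 0, zone   DMA   356  786  683 ..." -- free-block counts per order
        node, counts = line.split("zone")
        zone, *orders = counts.split()
        per_order = [int(n) * PAGE_KB * (1 << o) for o, n in enumerate(orders)]
        print(f"{node.strip()} zone {zone}: "
              + " ".join(f"{kb}kB" for kb in per_order)
              + f"  total={sum(per_order)}kB")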
Here is the gfp_mask decoded:
Xorg invoked oom-killer: gfp_mask=0x400cc0(GFP_KERNEL_ACCOUNT), order=0, oom_score_adj=0
Order = 0 means a single-page (4 KiB) allocation
0x400cc0
4: ___GFP_ACCOUNT
C: ___GFP_KSWAPD_RECLAIM, ___GFP_DIRECT_RECLAIM,
C: ___GFP_IO, ___GFP_FS
0: ZONE_NORMAL allocation
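For reference, here is a minimal Python sketch of the same decode. The bit values are my own transcription of the ___GFP_* definitions from include/linux/gfp.h in the 5.4 series (only a subset of the table), so double-check them against your kernel tree:

# Decode the low-level ___GFP_* bits of a gfp_mask.
# Bit values taken from include/linux/gfp.h in the 5.4 kernel series (partial table).
GFP_BITS = {
    0x01: "___GFP_DMA",
    0x02: "___GFP_HIGHMEM",
    0x04: "___GFP_DMA32",
    0x08: "___GFP_MOVABLE",
    0x10: "___GFP_RECLAIMABLE",
    0x20: "___GFP_HIGH",
    0x40: "___GFP_IO",
    0x80: "___GFP_FS",
    0x100: "___GFP_ZERO",
    0x200: "___GFP_ATOMIC",
    0x400: "___GFP_DIRECT_RECLAIM",
    0x800: "___GFP_KSWAPD_RECLAIM",
    0x100000: "___GFP_HARDWALL",
    0x200000: "___GFP_THISNODE",
    0x400000: "___GFP_ACCOUNT",
}

def decode_gfp(mask):
    """Return the names of all ___GFP_* bits set in mask."""
    return [name for bit, name in GFP_BITS.items() if mask & bit]

print(decode_gfp(0x400CC0))
# ['___GFP_IO', '___GFP_FS', '___GFP_DIRECT_RECLAIM',
#  '___GFP_KSWAPD_RECLAIM', '___GFP_ACCOUNT']  -> i.e. GFP_KERNEL_ACCOUNT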
Here is the kernel's /proc/slabinfo. Slabtop shows essentially no kernel memory accumulating, so I don't think this is a kernel leak:
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
fuse_request 156 156 104 39 1 : tunables 0 0 0 : slabdata 4 4 0
fuse_inode 112 112 576 28 4 : tunables 0 0 0 : slabdata 4 4 0
PINGv6 0 0 896 18 4 : tunables 0 0 0 : slabdata 0 0 0
RAWv6 36 36 896 18 4 : tunables 0 0 0 : slabdata 2 2 0
UDPv6 68 68 960 17 4 : tunables 0 0 0 : slabdata 4 4 0
tw_sock_TCPv6 0 0 192 21 1 : tunables 0 0 0 : slabdata 0 0 0
request_sock_TCPv6 0 0 240 17 1 : tunables 0 0 0 : slabdata 0 0 0
TCPv6 53 53 1920 17 8 : tunables 0 0 0 : slabdata 4 4 0
ext4_groupinfo_4k 72 72 112 36 1 : tunables 0 0 0 : slabdata 2 2 0
ovl_inode 722 1156 456 17 2 : tunables 0 0 0 : slabdata 68 68 0
mqueue_inode_cache 25 25 640 25 4 : tunables 0 0 0 : slabdata 1 1 0
discard_entry 0 0 80 51 1 : tunables 0 0 0 : slabdata 0 0 0
nat_entry 0 0 24 170 1 : tunables 0 0 0 : slabdata 0 0 0
f2fs_inode_cache 0 0 752 21 4 : tunables 0 0 0 : slabdata 0 0 0
nfs_direct_cache 0 0 136 30 1 : tunables 0 0 0 : slabdata 0 0 0
nfs_inode_cache 0 0 736 22 4 : tunables 0 0 0 : slabdata 0 0 0
fat_inode_cache 64 64 496 16 2 : tunables 0 0 0 : slabdata 4 4 0
fat_cache 680 680 24 170 1 : tunables 0 0 0 : slabdata 4 4 0
squashfs_inode_cache 418 992 512 16 2 : tunables 0 0 0 : slabdata 62 62 0
jbd2_inode 408 408 40 102 1 : tunables 0 0 0 : slabdata 4 4 0
jbd2_journal_head 256 256 64 64 1 : tunables 0 0 0 : slabdata 4 4 0
ext4_inode_cache 264 264 744 22 4 : tunables 0 0 0 : slabdata 12 12 0
ext4_allocation_context 156 156 104 39 1 : tunables 0 0 0 : slabdata 4 4 0
ext4_prealloc_space 224 224 72 56 1 : tunables 0 0 0 : slabdata 4 4 0
ext4_io_end 340 340 48 85 1 : tunables 0 0 0 : slabdata 4 4 0
ext4_pending_reservation 256 256 16 256 1 : tunables 0 0 0 : slabdata 1 1 0
ext4_extent_status 512 512 32 128 1 : tunables 0 0 0 : slabdata 4 4 0
mbcache 408 408 40 102 1 : tunables 0 0 0 : slabdata 4 4 0
kioctx 18 18 448 18 2 : tunables 0 0 0 : slabdata 1 1 0
pid_namespace 0 0 120 34 1 : tunables 0 0 0 : slabdata 0 0 0
posix_timers_cache 88 88 184 22 1 : tunables 0 0 0 : slabdata 4 4 0
rpc_inode_cache 18 18 448 18 2 : tunables 0 0 0 : slabdata 1 1 0
rpc_buffers 16 16 2048 16 8 : tunables 0 0 0 : slabdata 1 1 0
ip4-frags 0 0 136 30 1 : tunables 0 0 0 : slabdata 0 0 0
xfrm_state 56 56 576 28 4 : tunables 0 0 0 : slabdata 2 2 0
PING 672 672 768 21 4 : tunables 0 0 0 : slabdata 32 32 0
RAW 63 63 768 21 4 : tunables 0 0 0 : slabdata 3 3 0
UDP 114 114 832 19 4 : tunables 0 0 0 : slabdata 6 6 0
tw_sock_TCP 84 84 192 21 1 : tunables 0 0 0 : slabdata 4 4 0
request_sock_TCP 68 68 240 17 1 : tunables 0 0 0 : slabdata 4 4 0
TCP 108 108 1792 18 8 : tunables 0 0 0 : slabdata 6 6 0
cachefiles_object_jar 0 0 256 16 1 : tunables 0 0 0 : slabdata 0 0 0
fscache_cookie_jar 42 42 96 42 1 : tunables 0 0 0 : slabdata 1 1 0
dquot 84 84 192 21 1 : tunables 0 0 0 : slabdata 4 4 0
eventpoll_pwq 510 510 40 102 1 : tunables 0 0 0 : slabdata 5 5 0
inotify_inode_mark 510 510 48 85 1 : tunables 0 0 0 : slabdata 6 6 0
scsi_data_buffer 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
request_queue 46 46 1408 23 8 : tunables 0 0 0 : slabdata 2 2 0
blkdev_ioc 256 256 64 64 1 : tunables 0 0 0 : slabdata 4 4 0
biovec-max 100 100 3072 10 8 : tunables 0 0 0 : slabdata 10 10 0
biovec-128 42 42 1536 21 8 : tunables 0 0 0 : slabdata 2 2 0
biovec-64 84 84 768 21 4 : tunables 0 0 0 : slabdata 4 4 0
user_namespace 0 0 376 21 2 : tunables 0 0 0 : slabdata 0 0 0
sock_inode_cache 572 968 640 25 4 : tunables 0 0 0 : slabdata 41 41 0
skbuff_fclone_cache 336 336 384 21 2 : tunables 0 0 0 : slabdata 16 16 0
skbuff_head_cache 21690 21861 192 21 1 : tunables 0 0 0 : slabdata 1041 1041 0
configfs_dir_cache 73 73 56 73 1 : tunables 0 0 0 : slabdata 1 1 0
file_lock_cache 638 864 128 32 1 : tunables 0 0 0 : slabdata 27 27 0
fsnotify_mark_connector 680 680 24 170 1 : tunables 0 0 0 : slabdata 4 4 0
net_namespace 9 9 3456 9 8 : tunables 0 0 0 : slabdata 1 1 0
task_delay_info 3264 3264 80 51 1 : tunables 0 0 0 : slabdata 64 64 0
taskstats 92 92 344 23 2 : tunables 0 0 0 : slabdata 4 4 0
proc_dir_entry 672 672 128 32 1 : tunables 0 0 0 : slabdata 21 21 0
pde_opener 680 680 24 170 1 : tunables 0 0 0 : slabdata 4 4 0
proc_inode_cache 198 360 440 18 2 : tunables 0 0 0 : slabdata 20 20 0
seq_file 184 184 88 46 1 : tunables 0 0 0 : slabdata 4 4 0
bdev_cache 112 112 576 28 4 : tunables 0 0 0 : slabdata 4 4 0
shmem_inode_cache 2259 2431 456 17 2 : tunables 0 0 0 : slabdata 143 143 0
kernfs_iattrs_cache 1512 1512 72 56 1 : tunables 0 0 0 : slabdata 27 27 0
kernfs_node_cache 26334 26334 96 42 1 : tunables 0 0 0 : slabdata 627 627 0
filp 3347 4788 192 21 1 : tunables 0 0 0 : slabdata 228 228 0
inode_cache 9885 10460 400 20 2 : tunables 0 0 0 : slabdata 523 523 0
dentry 14017 26190 136 30 1 : tunables 0 0 0 : slabdata 873 873 0
names_cache 40 40 4096 8 8 : tunables 0 0 0 : slabdata 5 5 0
key_jar 1235 1365 192 21 1 : tunables 0 0 0 : slabdata 65 65 0
buffer_head 1783 2176 64 64 1 : tunables 0 0 0 : slabdata 34 34 0
uts_namespace 0 0 416 19 2 : tunables 0 0 0 : slabdata 0 0 0
vm_area_struct 9393 10842 104 39 1 : tunables 0 0 0 : slabdata 278 278 0
mm_struct 368 368 512 16 2 : tunables 0 0 0 : slabdata 23 23 0
files_cache 384 384 256 16 1 : tunables 0 0 0 : slabdata 24 24 0
signal_cache 575 575 704 23 4 : tunables 0 0 0 : slabdata 25 25 0
sighand_cache 423 480 1344 24 8 : tunables 0 0 0 : slabdata 20 20 0
task_struct 296 360 3904 8 8 : tunables 0 0 0 : slabdata 45 45 0
cred_jar 1837 1952 128 32 1 : tunables 0 0 0 : slabdata 61 61 0
anon_vma_chain 9461 10880 32 128 1 : tunables 0 0 0 : slabdata 85 85 0
anon_vma 5761 6351 56 73 1 : tunables 0 0 0 : slabdata 87 87 0
pid 2432 2432 64 64 1 : tunables 0 0 0 : slabdata 38 38 0
trace_event_file 1445 1445 48 85 1 : tunables 0 0 0 : slabdata 17 17 0
radix_tree_node 3240 6708 304 26 2 : tunables 0 0 0 : slabdata 258 258 0
task_group 192 192 256 16 1 : tunables 0 0 0 : slabdata 12 12 0
vmap_area 8064 8064 32 128 1 : tunables 0 0 0 : slabdata 63 63 0
dma-kmalloc-8k 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-4k 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-2k 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-1k 0 0 1024 16 4 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-512 0 0 512 16 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-256 0 0 256 16 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-128 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-64 256 256 64 64 1 : tunables 0 0 0 : slabdata 4 4 0
dma-kmalloc-192 0 0 192 21 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-8k 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-4k 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-2k 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-1k 0 0 1024 16 4 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-512 0 0 512 16 2 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-256 0 0 256 16 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-192 0 0 192 21 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-rcl-128 2496 2496 128 32 1 : tunables 0 0 0 : slabdata 78 78 0
kmalloc-rcl-64 2304 2304 64 64 1 : tunables 0 0 0 : slabdata 36 36 0
kmalloc-8k 64 68 8192 4 8 : tunables 0 0 0 : slabdata 17 17 0
kmalloc-4k 588 612 4096 8 8 : tunables 0 0 0 : slabdata 80 80 0
kmalloc-2k 272 272 2048 16 8 : tunables 0 0 0 : slabdata 17 17 0
kmalloc-1k 1318 1392 1024 16 4 : tunables 0 0 0 : slabdata 87 87 0
kmalloc-512 2130 2200 512 16 2 : tunables 0 0 0 : slabdata 138 138 0
kmalloc-256 660 752 256 16 1 : tunables 0 0 0 : slabdata 47 47 0
kmalloc-192 1701 1701 192 21 1 : tunables 0 0 0 : slabdata 81 81 0
kmalloc-128 2948 3392 128 32 1 : tunables 0 0 0 : slabdata 106 106 0
kmalloc-64 25371 28928 64 64 1 : tunables 0 0 0 : slabdata 452 452 0
kmem_cache_node 256 256 64 64 1 : tunables 0 0 0 : slabdata 4 4 0
kmem_cache 144 144 256 16 1 : tunables 0 0 0 : slabdata 9 9 0
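To back up what slabtop shows, this rough Python sketch sums the memory pinned by every cache in /proc/slabinfo. It assumes the slabinfo 2.1 format above, 4 KiB pages, and needs root to read the file; the result should land in the same ballpark as the Slab: line in /proc/meminfo, i.e. only a few tens of MB here:

#!/usr/bin/env python3
"""Rough total of memory held by slab caches, summed from /proc/slabinfo."""
PAGE_SIZE = 4096  # bytes per page

total = 0
with open("/proc/slabinfo") as f:
    for line in f:
        if line.startswith(("slabinfo", "# name")):
            continue                       # skip the two header lines
        fields = line.split()
        pages_per_slab = int(fields[5])    # <pagesperslab>
        num_slabs = int(fields[-2])        # <num_slabs>
        total += num_slabs * pages_per_slab * PAGE_SIZE

print(f"slab total: {total / 1024:.0f} kB")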
Can anyone help track down this resource exhaustion? I can't even tell why the OOM killer is running in the first place. Any help would be greatly appreciated.
Edit:
In reply to the comments:
The memory cgroup controller is not enabled, so this should not be a cgroup issue:
cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 4 1 1
cpu 2 1 1
cpuacct 2 1 1
blkio 7 1 1
memory 0 57 0
devices 5 39 1
freezer 6 1 1
net_cls 3 1 1
pids 8 46 1
Here is /proc/meminfo:
cat /proc/meminfo
MemTotal: 1911980 kB
MemFree: 734172 kB
MemAvailable: 890688 kB
Buffers: 16664 kB
Cached: 479784 kB
SwapCached: 0 kB
Active: 395800 kB
Inactive: 273448 kB
Active(anon): 336436 kB
Inactive(anon): 111840 kB
Active(file): 59364 kB
Inactive(file): 161608 kB
Unevictable: 125944 kB
Mlocked: 7552 kB
HighTotal: 1232896 kB
HighFree: 446848 kB
LowTotal: 679084 kB
LowFree: 287324 kB
SwapTotal: 524284 kB
SwapFree: 524284 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 298848 kB
Mapped: 218932 kB
Shmem: 272964 kB
KReclaimable: 17748 kB
Slab: 43448 kB
SReclaimable: 17748 kB
SUnreclaim: 25700 kB
KernelStack: 2664 kB
PageTables: 5140 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1480272 kB
Committed_AS: 1766576 kB
VmallocTotal: 245760 kB
VmallocUsed: 6368 kB
VmallocChunk: 0 kB
Percpu: 512 kB
CmaTotal: 262144 kB
CmaFree: 240632 kB
I have already tried setting vm.overcommit_memory=2 and vm.overcommit_ratio=2 on another, larger box, giving a CommitLimit of 4 GB (more than my working set), to rule out a problem with the heuristic overcommit algorithm. It still falls over in exactly the same way.
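For context on that experiment: with vm.overcommit_memory=2 the kernel enforces CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100 (leaving hugepages aside), and on the meminfo above 524284 kB of swap plus 50% of 1911980 kB does come out at the reported 1480272 kB within rounding, consistent with the default overcommit_ratio of 50. A small Python sketch, using that simplified formula, to compare the computed limit against what the kernel reports and against Committed_AS:

#!/usr/bin/env python3
"""Recompute CommitLimit from /proc/meminfo the way the kernel does when
vm.overcommit_memory=2: CommitLimit = SwapTotal + MemTotal * ratio / 100
(hugepages and vm.overcommit_kbytes ignored here for simplicity)."""

def meminfo_kb(field):
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

ratio = int(open("/proc/sys/vm/overcommit_ratio").read())
limit = meminfo_kb("SwapTotal") + meminfo_kb("MemTotal") * ratio // 100
print(f"computed CommitLimit: {limit} kB")
print(f"kernel   CommitLimit: {meminfo_kb('CommitLimit')} kB")
print(f"Committed_AS:         {meminfo_kb('Committed_AS')} kB")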