System freezes when syslog fills up with "java: page allocation failure"

Ubuntu 16.04 x64, kernel 4.4.0, 8 CPUs, 31 GB RAM, ZFS as the main filesystem, with a CIFS share mounted.

# sudo numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 32157 MB
node 0 free: 2301 MB
node distances:
node   0 
  0:  10 

# cat /proc/meminfo | grep -i huge
AnonHugePages:  13080576 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

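My server freezes randomly, and syslog shows the messages below (the full log is on pastebin). I have read this article, which explains this type of error, and the possible solutions described here.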

Jan 15 02:35:01 centrallogserver CRON[55892]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jan 15 02:36:49 centrallogserver kernel: [120146.673901] java: page allocation failure: order:4, mode:0x240c0c0
Jan 15 02:36:49 centrallogserver kernel: [120146.673908] CPU: 7 PID: 52372 Comm: java Tainted: P           O    4.4.0-112-generic #135-Ubuntu
Jan 15 02:36:49 centrallogserver kernel: [120146.673911] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
Jan 15 02:36:49 centrallogserver kernel: [120146.673915]  0000000000000286 d4f0e41d54eb99fa ffff88038a7cb968 ffffffff813fc233
Jan 15 02:36:49 centrallogserver kernel: [120146.673920]  000000000240c0c0 0000000000000000 ffff88038a7cb9f8 ffffffff8119696a
Jan 15 02:36:49 centrallogserver kernel: [120146.673924]  d4f0e41d00000004 0000000000000004 0000000000000040 ffff880284f12a00
Jan 15 02:36:49 centrallogserver kernel: [120146.673929] Call Trace:
Jan 15 02:36:49 centrallogserver kernel: [120146.673938]  [<ffffffff813fc233>] dump_stack+0x63/0x90
Jan 15 02:36:49 centrallogserver kernel: [120146.673945]  [<ffffffff8119696a>] warn_alloc_failed+0xfa/0x150
Jan 15 02:36:49 centrallogserver kernel: [120146.673952]  [<ffffffff8119a14f>] ? __alloc_pages_direct_compact+0x10f/0x130
Jan 15 02:36:49 centrallogserver kernel: [120146.673959]  [<ffffffff8119a5fd>] __alloc_pages_slowpath.constprop.88+0x48d/0xb00
Jan 15 02:36:49 centrallogserver kernel: [120146.673966]  [<ffffffff8119aef6>] __alloc_pages_nodemask+0x286/0x2a0
Jan 15 02:36:49 centrallogserver kernel: [120146.673975]  [<ffffffff811e483c>] alloc_pages_current+0x8c/0x110
Jan 15 02:36:49 centrallogserver kernel: [120146.673980]  [<ffffffff81198ac9>] alloc_kmem_pages+0x19/0x90
Jan 15 02:36:49 centrallogserver kernel: [120146.673986]  [<ffffffff811b63ce>] kmalloc_order_trace+0x2e/0xe0
Jan 15 02:36:49 centrallogserver kernel: [120146.673993]  [<ffffffff811f10ce>] __kmalloc+0x22e/0x250
Jan 15 02:36:49 centrallogserver kernel: [120146.674053]  [<ffffffffc08e5c51>] smb2_unlock_range+0xa1/0x340 [cifs]
Jan 15 02:36:49 centrallogserver kernel: [120146.674094]  [<ffffffffc08daef1>] ? smb2_add_credits+0xb1/0x250 [cifs]
Jan 15 02:36:49 centrallogserver kernel: [120146.674137]  [<ffffffffc08bd600>] cifs_lock+0xc00/0x12a0 [cifs]
Jan 15 02:36:49 centrallogserver kernel: [120146.674142]  [<ffffffff811f048b>] ? __slab_free+0xcb/0x2c0
Jan 15 02:36:49 centrallogserver kernel: [120146.674147]  [<ffffffff811f048b>] ? __slab_free+0xcb/0x2c0
Jan 15 02:36:49 centrallogserver kernel: [120146.674154]  [<ffffffff8139677e>] ? common_file_perm+0x6e/0x1a0
Jan 15 02:36:49 centrallogserver kernel: [120146.674160]  [<ffffffff81266c6e>] vfs_lock_file+0x1e/0x40
Jan 15 02:36:49 centrallogserver kernel: [120146.674164]  [<ffffffff81266f6b>] do_lock_file_wait+0x5b/0x100
Jan 15 02:36:49 centrallogserver kernel: [120146.674170]  [<ffffffff811efc8a>] ? kmem_cache_alloc+0x1ca/0x1f0
Jan 15 02:36:49 centrallogserver kernel: [120146.674174]  [<ffffffff812651bb>] ? locks_alloc_lock+0x1b/0x70
Jan 15 02:36:49 centrallogserver kernel: [120146.674179]  [<ffffffff81268763>] fcntl_setlk+0x133/0x2c0
Jan 15 02:36:49 centrallogserver kernel: [120146.674186]  [<ffffffff812244c2>] SyS_fcntl+0x3e2/0x5e0
Jan 15 02:36:49 centrallogserver kernel: [120146.674193]  [<ffffffff818457ad>] entry_SYSCALL_64_fastpath+0x2b/0xe7
Jan 15 02:36:49 centrallogserver kernel: [120146.674197] Mem-Info:
Jan 15 02:36:49 centrallogserver kernel: [120146.674207] active_anon:3871678 inactive_anon:544913 isolated_anon:0
Jan 15 02:36:49 centrallogserver kernel: [120146.674207]  active_file:181867 inactive_file:199383 isolated_file:0
Jan 15 02:36:49 centrallogserver kernel: [120146.674207]  unevictable:5021 dirty:138 writeback:0 unstable:0
Jan 15 02:36:49 centrallogserver kernel: [120146.674207]  slab_reclaimable:232459 slab_unreclaimable:1851907
Jan 15 02:36:49 centrallogserver kernel: [120146.674207]  mapped:260688 shmem:6003 pagetables:26155 bounce:0
Jan 15 02:36:49 centrallogserver kernel: [120146.674207]  free:179404 free_pcp:283 free_cma:0
Jan 15 02:36:49 centrallogserver kernel: [120146.674217] Node 0 DMA free:15840kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 15 02:36:49 centrallogserver kernel: [120146.674230] lowmem_reserve[]: 0 2976 32142 32142 32142
Jan 15 02:36:49 centrallogserver kernel: [120146.674236] Node 0 DMA32 free:164924kB min:12132kB low:15164kB high:18196kB active_anon:465040kB inactive_anon:473384kB active_file:15584kB inactive_file:57088kB unevictable:1204kB isolated(anon):0kB isolated(file):0kB present:3129152kB managed:3048416kB mlocked:1204kB dirty:44kB writeback:0kB mapped:45748kB shmem:2908kB slab_reclaimable:146872kB slab_unreclaimable:1350916kB kernel_stack:6624kB pagetables:9124kB unstable:0kB bounce:0kB free_pcp:704kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 15 02:36:49 centrallogserver kernel: [120146.674249] lowmem_reserve[]: 0 0 29165 29165 29165
Jan 15 02:36:49 centrallogserver kernel: [120146.674255] Node 0 Normal free:536852kB min:118872kB low:148588kB high:178308kB active_anon:15021672kB inactive_anon:1706268kB active_file:711884kB inactive_file:740444kB unevictable:18880kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29865212kB mlocked:18880kB dirty:508kB writeback:0kB mapped:997004kB shmem:21104kB slab_reclaimable:782964kB slab_unreclaimable:6056680kB kernel_stack:65472kB pagetables:95496kB unstable:0kB bounce:0kB free_pcp:428kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 15 02:36:49 centrallogserver kernel: [120146.674267] lowmem_reserve[]: 0 0 0 0 0
Jan 15 02:36:49 centrallogserver kernel: [120146.674273] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15840kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674291] Node 0 DMA32: 504*4kB (UME) 2593*8kB (UME) 2117*16kB (UE) 3322*32kB (UH) 1*64kB (H) 2*128kB (H) 2*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 164792kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674310] Node 0 Normal: 17501*4kB (UEH) 30839*8kB (UMH) 12373*16kB (UMH) 689*32kB (U) 0*64kB 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 536860kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674329] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674333] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674335] 408507 total pagecache pages
Jan 15 02:36:49 centrallogserver kernel: [120146.674338] 19222 pages in swap cache
Jan 15 02:36:49 centrallogserver kernel: [120146.674341] Swap cache stats: add 382634, delete 363412, find 121020/166633
Jan 15 02:36:49 centrallogserver kernel: [120146.674344] Free swap  = 3438328kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674346] Total swap = 4194300kB
Jan 15 02:36:49 centrallogserver kernel: [120146.674348] 8388461 pages RAM
Jan 15 02:36:49 centrallogserver kernel: [120146.674351] 0 pages HighMem/MovableOnly
Jan 15 02:36:49 centrallogserver kernel: [120146.674353] 156078 pages reserved
Jan 15 02:36:49 centrallogserver kernel: [120146.674355] 0 pages cma reserved
Jan 15 02:36:49 centrallogserver kernel: [120146.674357] 0 pages hwpoisoned
Jan 15 02:36:49 centrallogserver kernel: [120146.674577] java: page allocation failure: order:4, mode:0x240c0c0
Jan 15 02:36:49 centrallogserver kernel: [120146.674581] CPU: 7 PID: 52372 Comm: java Tainted: P           O    4.4.0-112-generic #135-Ubuntu
Jan 15 02:36:49 centrallogserver kernel: [120146.674585] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/19/2018
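
For reference, `order:4` in the failure line means the kernel asked for 2^4 = 16 physically contiguous 4 kB pages, i.e. one 64 kB block. The "Node 0 Normal" line of the log above lists `0*64kB` and only a single `128kB` block, so the zone has free memory but it is badly fragmented. A minimal sketch for checking this on a live system follows. The helper name `free_blocks_at_or_above` is hypothetical, and the sample lines are my transcription of the per-order counts from the log into the `/proc/buddyinfo` layout, not output captured on this server.

```shell
#!/bin/sh
# Print, per zone, how many free blocks of at least the given order exist.
# Expects input in /proc/buddyinfo layout:
#   Node 0, zone   Normal  c0 c1 c2 ... c10   (cN = free blocks of order N)
free_blocks_at_or_above() {
    awk -v order="$1" '
        $1 == "Node" && $3 == "zone" {
            total = 0
            # Field 5 holds order 0, field 6 holds order 1, and so on.
            for (i = 5 + order; i <= NF; i++) total += $i
            printf "%s %d\n", $4, total
        }'
}

# Sample data: per-order free counts transcribed from the log above.
free_blocks_at_or_above 4 <<'EOF'
Node 0, zone      DMA      0      0      0      1      1      1      1      0      1      1      3
Node 0, zone    DMA32    504   2593   2117   3322      1      2      2      2      0      0      0
Node 0, zone   Normal  17501  30839  12373    689      0      1      0      0      0      0      0
EOF
```

On the running machine the same function can be fed the real counters with `free_blocks_at_or_above 4 < /proc/buddyinfo`. A value at or near zero for the Normal zone would be consistent with the order-4 failures in the log.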

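As a possible workaround, I increased the minimum free memory (vm.min_free_kbytes) from 60MB to 256MB and set vfs_cache_pressure=50. I also reduced zfs_arc_max and zfs_dirty_data_max to 8GB and 128MB respectively, but the problem persists. Please suggest what system tuning could prevent these freezes. One possible approach I have seen is to disable overcommit, so that no more memory than the physical memory can be allocated.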
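
For anyone wanting to reproduce the tuning described above, it could be applied at runtime roughly as follows. This is a sketch, not the exact commands that were run on this server. The values come from the post, converted to the units each knob expects. It assumes ZFS on Linux, which exposes its module parameters under `/sys/module/zfs/parameters/`. Runtime changes like these do not survive a reboot.

```shell
# Values taken from the post: 256 MB, 50, 8 GB, 128 MB.
sudo sysctl -w vm.min_free_kbytes=262144      # 256 MB expressed in kB
sudo sysctl -w vm.vfs_cache_pressure=50

# ZFS module parameters take values in bytes.
echo 8589934592 | sudo tee /sys/module/zfs/parameters/zfs_arc_max         # 8 GB
echo 134217728  | sudo tee /sys/module/zfs/parameters/zfs_dirty_data_max  # 128 MB

# The overcommit idea from the post, left commented out because it was not tried:
# mode 2 switches the kernel to strict commit accounting for user-space memory.
# sudo sysctl -w vm.overcommit_memory=2
```

Whether strict overcommit would help here is unclear: it limits user-space allocations, while the trace shows a kernel `__kmalloc` called from `smb2_unlock_range` in the cifs module failing.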
