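I have a problem with some of my virtual machines. They are workstations for our designers, running CentOS 6.10. After a month of use, they seem to run out of memory for no apparent reason.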
> free -m
             total       used       free     shared    buffers     cached
Mem:         36148      35734        413          0         25        178
-/+ buffers/cache:      35530        617
Swap:         2304        117       2187
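There are no large processes running on it, and the amount of memory used for cache is negligible. After a lot of digging, I found that the memory is sitting in kernel slab, specifically unreclaimable slab (SUnreclaim).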
> more /proc/meminfo
MemTotal: 37015692 kB
MemFree: 427320 kB
Buffers: 26192 kB
Cached: 183376 kB
SwapCached: 10476 kB
Active: 49876 kB
Inactive: 191844 kB
Active(anon): 13464 kB
Inactive(anon): 18892 kB
Active(file): 36412 kB
Inactive(file): 172952 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2360316 kB
SwapFree: 2240324 kB
Dirty: 100 kB
Writeback: 0 kB
AnonPages: 28320 kB
Mapped: 17792 kB
Shmem: 112 kB
Slab: 36030684 kB
SReclaimable: 13704 kB
SUnreclaim: 36016980 kB
KernelStack: 5328 kB
PageTables: 19112 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 20868160 kB
Committed_AS: 808636 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 380368 kB
VmallocChunk: 34359346944 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 8044 kB
DirectMap2M: 37740544 kB
So out of 36G of RAM, 34-35G is held by the kernel and cannot be reclaimed.
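To double-check that figure, here is a minimal sketch that redoes the arithmetic from the /proc/meminfo values above. The helper name `parse_meminfo` and the `SAMPLE` string are my own illustration, not an existing tool, and the snippet assumes Python 3 run against a pasted copy of the output, so it does not have to run on the VM itself.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


# Values copied from the /proc/meminfo output above (hypothetical SAMPLE name).
SAMPLE = """\
MemTotal: 37015692 kB
MemFree: 427320 kB
Slab: 36030684 kB
SReclaimable: 13704 kB
SUnreclaim: 36016980 kB
"""

info = parse_meminfo(SAMPLE)

# Slab is reported as the sum of its reclaimable and unreclaimable parts.
slab_sum = info["SReclaimable"] + info["SUnreclaim"]

unreclaim_gib = info["SUnreclaim"] / (1024.0 * 1024.0)
unreclaim_share = 100.0 * info["SUnreclaim"] / info["MemTotal"]

print("Slab matches SReclaimable + SUnreclaim: %s" % (slab_sum == info["Slab"]))
print("SUnreclaim: %.1f GiB (%.1f%% of MemTotal)" % (unreclaim_gib, unreclaim_share))
# prints:
# Slab matches SReclaimable + SUnreclaim: True
# SUnreclaim: 34.3 GiB (97.3% of MemTotal)
```

To use it on a live machine, the same function could be fed the contents of /proc/meminfo instead of `SAMPLE`.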
> slabtop -o
Active / Total Objects (% used) : 10121078 / 10203079 (99.2%)
Active / Total Slabs (% used) : 4305167 / 4305178 (100.0%)
Active / Total Caches (% used) : 117 / 209 (56.0%)
Active / Total Size (% used) : 35978633.10K / 35991994.40K (100.0%)
Minimum / Average / Maximum Object : 0.02K / 3.53K / 4096.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
4188823 4187702 99% 0.06K 70997 59 283988K size-64
2694370 2694370 100% 8.00K 2694370 1 21554960K size-8192
1385644 1385644 100% 2.00K 692822 2 2771288K size-2048
668261 668261 100% 16.00K 668261 1 10692176K size-16384
423810 423458 99% 0.25K 28254 15 113016K size-256
374168 373771 99% 1.00K 93542 4 374168K size-1024
144400 138675 96% 0.50K 18050 8 72200K size-512
142080 90740 63% 0.12K 4736 30 18944K size-128
26412 26401 99% 4.00K 26412 1 105648K size-4096
24864 24000 96% 0.03K 222 112 888K size-32
15680 10729 68% 0.19K 784 20 3136K dentry
13949 13949 100% 0.10K 377 37 1508K buffer_head
13446 13436 99% 0.14K 498 27 1992K sysfs_dir_cache
13360 10073 75% 0.19K 668 20 2672K size-192
13015 11061 84% 0.20K 685 19 2740K vm_area_struct
11396 7556 66% 0.05K 148 77 592K anon_vma_chain
8692 8661 99% 0.07K 164 53 656K selinux_inode_security
8308 5456 65% 0.05K 124 67 496K anon_vma
6642 6547 98% 0.58K 1107 6 4428K inode_cache
5100 3788 74% 0.25K 340 15 1360K filp
4095 3885 94% 0.55K 585 7 2340K radix_tree_node
2160 2062 95% 0.98K 540 4 2160K ext4_inode_cache
1908 1872 98% 0.07K 36 53 144K Acpi-Operand
780 642 82% 0.19K 39 20 156K cred_jar
770 748 97% 0.77K 154 5 616K shmem_inode_cache
610 547 89% 0.69K 122 5 488K sock_inode_cache
552 503 91% 0.04K 6 92 24K Acpi-Namespace
510 416 81% 0.12K 17 30 68K pid
476 401 84% 0.11K 14 34 56K task_delay_info
462 453 98% 0.53K 66 7 264K idr_layer_cache
424 396 93% 0.88K 106 4 424K UNIX
392 360 91% 0.50K 49 8 196K task_xstate
354 350 98% 2.61K 118 3 944K task_struct
318 92 28% 0.07K 6 53 24K eventpoll_pwq
295 62 21% 0.06K 5 59 20K tcp_bind_bucket
290 196 67% 0.38K 29 10 116K ip_dst_cache
288 209 72% 0.12K 9 32 36K inotify_inode_mark_entry
259 234 90% 1.06K 37 7 296K signal_cache
256 159 62% 0.23K 16 16 64K cfq_queue
252 157 62% 0.13K 9 28 36K cfq_io_context
240 154 64% 0.08K 5 48 20K blkdev_ioc
240 92 38% 0.12K 8 30 32K eventpoll_epi
236 232 98% 0.06K 4 59 16K fs_cache
225 225 100% 2.06K 75 3 600K sighand_cache
208 208 100% 32.12K 208 1 13312K kmem_cache
202 2 0% 0.02K 1 202 4K jbd2_revoke_table
202 2 0% 0.02K 1 202 4K revoke_table
176 146 82% 0.69K 16 11 128K files_cache
150 129 86% 1.38K 30 5 240K mm_struct
150 99 66% 0.12K 5 30 20K scsi_sense_cache
144 102 70% 0.64K 24 6 96K proc_inode_cache
144 16 11% 0.02K 1 144 4K fsnotify_event_holder
144 6 4% 0.02K 1 144 4K fasync_cache
144 17 11% 0.02K 1 144 4K jbd2_journal_handle
141 82 58% 1.02K 47 3 188K nfs_inode_cache
140 140 100% 0.19K 7 20 28K virtio_scsi_cmd
135 108 80% 0.25K 9 15 36K scsi_cmd_cache
120 115 95% 0.25K 8 15 32K mnt_cache
112 7 6% 0.03K 1 112 4K dnotify_struct
112 16 14% 0.03K 1 112 4K inotify_event_private_data
112 2 1% 0.03K 1 112 4K sd_ext_cdb
92 31 33% 0.04K 1 92 4K khugepaged_mm_slot
88 80 90% 0.17K 4 22 16K file_lock_cache
80 80 100% 0.19K 4 20 16K bio-0
77 64 83% 0.34K 7 11 28K blkdev_requests
75 66 88% 0.25K 5 15 20K ndisc_cache
74 13 17% 0.10K 2 37 8K ext4_prealloc_space
68 68 100% 0.11K 2 34 8K jbd2_journal_head
60 60 100% 0.12K 2 30 8K nfs_page
59 1 1% 0.06K 1 59 4K inet_peer_cache
59 23 38% 0.06K 1 59 4K fib6_nodes
53 10 18% 0.07K 1 53 4K ip_fib_hash
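To make the table easier to read, here is a rough sketch that ranks the caches by CACHE SIZE. The names `parse_slabtop_rows`, `SAMPLE_ROWS` and `TOTAL_KB` are hypothetical, the rows are copied from the `slabtop -o` output above, and it assumes Python 3 and the eight-column row layout shown there. On this machine the largest entries are the generic size-N caches, which as far as I understand are the general-purpose kmalloc caches.

```python
def parse_slabtop_rows(text):
    """Return (cache_name, cache_size_kb) pairs from `slabtop -o` table rows."""
    rows = []
    for line in text.splitlines():
        cols = line.split()
        # A data row has 8 columns: OBJS ACTIVE USE OBJSIZE SLABS OBJ/SLAB CACHESIZE NAME.
        # Header and summary lines do not start with a number, so they are skipped.
        if len(cols) != 8 or not cols[0].isdigit() or not cols[6].endswith("K"):
            continue
        rows.append((cols[7], int(cols[6][:-1])))
    return rows


# A few rows copied from the slabtop output above (hypothetical SAMPLE_ROWS name).
SAMPLE_ROWS = """\
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
4188823 4187702 99% 0.06K 70997 59 283988K size-64
2694370 2694370 100% 8.00K 2694370 1 21554960K size-8192
1385644 1385644 100% 2.00K 692822 2 2771288K size-2048
668261 668261 100% 16.00K 668261 1 10692176K size-16384
26412 26401 99% 4.00K 26412 1 105648K size-4096
"""

# Total slab size in K, copied from the "Active / Total Size" summary line.
TOTAL_KB = 35991994.40

# Largest caches first.
ranked = sorted(parse_slabtop_rows(SAMPLE_ROWS), key=lambda row: row[1], reverse=True)

top3_kb = sum(size for _, size in ranked[:3])
top3_share = 100.0 * top3_kb / TOTAL_KB

for name, size_kb in ranked[:3]:
    print("%-12s %10d K" % (name, size_kb))
print("top 3 caches: %.1f%% of total slab size" % top3_share)
```

On a live system the same parser could be pointed at the full `slabtop -o` output; /proc/slabinfo has a different column layout and would need its own parsing.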
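A simple reboot fixes the problem, but rebooting every VM once a month is disruptive. Do you have any ideas on how to debug this and find out where all of this memory has gone?
Thanks, Nick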