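I am seeing sudden spikes in memory usage on a Linux machine (Rocky Linux 8.9) and am looking for guidance on how to debug them. Below I show the state of the system during normal operation and during a spike.
The script I run (in a loop) to collect debugging data: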
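#!/usr/bin/env bash
# Print a memory snapshot every 10 seconds until interrupted.
while true; do
    echo "timestamp: $(date)"
    echo "-----------"
    echo "Free:"
    free -h
    echo "-----------"
    echo "top 30 processes by RAM usage:"
    # head -n 30 keeps the header line plus the 29 largest processes
    ps -eo pid,comm,%mem --sort=-%mem | head -n 30
    echo "-----------"
    echo "tmpfs:"
    df -h | grep tmpfs
    echo "-----------"
    echo "pressure:"
    # PSI memory pressure (requires PSI to be enabled in the kernel)
    cat /proc/pressure/memory
    echo "-----------"
    echo "processes count"
    # the count includes the ps header line
    ps -eo pid | wc -l
    echo "-----------"
    echo "Slab:"
    # largest slab caches by cache size (slabtop needs root)
    slabtop -o -s c | head -n15
    echo "-----------"
    echo "meminfo:"
    cat /proc/meminfo
    echo "-----------"
    sleep 10
done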
Normal operating state:
-----------
timestamp: ...
-----------
Free:
total used free shared buff/cache available
Mem: 125Gi 10Gi 89Gi 673Mi 24Gi 112Gi
Swap: 0B 0B 0B
-----------
top 30 processes by RAM usage:
PID COMMAND %MEM
5839 celery 1.2
5711 celery 1.1
5838 celery 1.1
2738431 tofu 0.6
2594888 terraform-provi 0.2
2731662 terraform-provi 0.2
6037 agent 0.2
2593993 terraform 0.1
2654624 terraform 0.1
3166 dockerd 0.1
2717064 terraform 0.1
2718714 terraform-provi 0.1
2746777 terraform-provi 0.1
2746604 terraform-provi 0.1
2746672 terraform-provi 0.1
2746840 terraform-provi 0.1
2746727 terraform-provi 0.1
2655252 terraform-provi 0.1
874 systemd-journal 0.1
2655212 terraform-provi 0.1
2731044 terraform 0.1
2759128 opa 0.0
6046 promtail 0.0
2757823 opa 0.0
1144 firewalld 0.0
6020 telegraf 0.0
2759141 terraform 0.0
2361 otelopscol 0.0
6038 process-agent 0.0
-----------
tmpfs:
devtmpfs 63G 0 63G 0% /dev
tmpfs 63G 1.4M 63G 1% /dev/shm
tmpfs 63G 673M 63G 2% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
tmpfs 13G 0 13G 0% /run/user/1002
-----------
pressure:
some avg10=0.01 avg60=0.04 avg300=0.01 total=244286854
full avg10=0.01 avg60=0.04 avg300=0.01 total=145337516
-----------
processes count
490
-----------
Slab:
Active / Total Objects (% used) : 6416975 / 8694321 (73.8%)
Active / Total Slabs (% used) : 176631 / 176631 (100.0%)
Active / Total Caches (% used) : 168 / 237 (70.9%)
Active / Total Size (% used) : 1729357.80K / 2232416.67K (77.5%)
Minimum / Average / Maximum Object : 0.01K / 0.26K / 10.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
709140 643851 90% 1.06K 23638 30 756416K xfs_inode
1571346 471769 30% 0.19K 37413 42 299304K dentry
278112 257157 92% 1.00K 8691 32 278112K kmalloc-1k
314580 222788 70% 0.57K 11235 28 179760K radix_tree_node
687834 629395 91% 0.19K 16377 42 131016K xfs_ili
234432 111063 47% 0.50K 7326 32 117216K kmalloc-512
779392 701563 90% 0.06K 12178 64 48712K kmalloc-64
760064 697202 91% 0.06K 11876 64 47504K lsm_inode_cache
-----------
meminfo:
MemTotal: 131460012 kB
MemFree: 93953016 kB
MemAvailable: 118221500 kB
Buffers: 920 kB
Cached: 24721888 kB
SwapCached: 0 kB
Active: 19786296 kB
Inactive: 14707240 kB
Active(anon): 365780 kB
Inactive(anon): 10095120 kB
Active(file): 19420516 kB
Inactive(file): 4612120 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 2684 kB
Writeback: 0 kB
AnonPages: 8249688 kB
Mapped: 1123776 kB
Shmem: 690156 kB
KReclaimable: 1483052 kB
Slab: 2267680 kB
SReclaimable: 1483052 kB
SUnreclaim: 784628 kB
KernelStack: 25584 kB
PageTables: 45756 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 65730004 kB
Committed_AS: 17221104 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 149580 kB
VmallocChunk: 0 kB
Percpu: 31104 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1837056 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 5311288 kB
DirectMap2M: 125757440 kB
DirectMap1G: 5242880 kB
During the spike:
-----------
timestamp: ....
-----------
Free:
total used free shared buff/cache available
Mem: 125Gi 99Gi 1.0Gi 673Mi 24Gi 24Gi
Swap: 0B 0B 0B
-----------
top 30 processes by RAM usage:
PID COMMAND %MEM
5839 celery 1.2
5711 celery 1.1
5838 celery 1.1
2738431 tofu 0.5
2594888 terraform-provi 0.5
2731662 terraform-provi 0.2
6037 agent 0.2
2593993 terraform 0.1
2717064 terraform 0.1
2654624 terraform 0.1
3166 dockerd 0.1
2718714 terraform-provi 0.1
2746777 terraform-provi 0.1
2746604 terraform-provi 0.1
2746672 terraform-provi 0.1
2746840 terraform-provi 0.1
2746727 terraform-provi 0.1
2655252 terraform-provi 0.1
874 systemd-journal 0.1
2655212 terraform-provi 0.1
2731044 terraform 0.1
6046 promtail 0.0
1144 firewalld 0.0
6020 telegraf 0.0
2361 otelopscol 0.0
6038 process-agent 0.0
1184 tuned 0.0
2751882 opa 0.0
6040 trace-agent 0.0
-----------
tmpfs:
devtmpfs 63G 0 63G 0% /dev
tmpfs 63G 1.4M 63G 1% /dev/shm
tmpfs 63G 673M 63G 2% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
tmpfs 13G 0 13G 0% /run/user/1002
-----------
pressure:
some avg10=0.16 avg60=0.09 avg300=0.02 total=244227196
full avg10=0.16 avg60=0.09 avg300=0.02 total=145285775
-----------
processes count
496
-----------
Slab:
Active / Total Objects (% used) : 6381059 / 8697291 (73.4%)
Active / Total Slabs (% used) : 176611 / 176611 (100.0%)
Active / Total Caches (% used) : 168 / 237 (70.9%)
Active / Total Size (% used) : 1721743.55K / 2232651.09K (77.1%)
Minimum / Average / Maximum Object : 0.01K / 0.26K / 10.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
709380 644858 90% 1.06K 23646 30 756672K xfs_inode
1572774 462962 29% 0.19K 37447 42 299576K dentry
278336 256306 92% 1.00K 8698 32 278336K kmalloc-1k
314580 223186 70% 0.57K 11235 28 179760K radix_tree_node
687834 631037 91% 0.19K 16377 42 131016K xfs_ili
234432 112304 47% 0.50K 7326 32 117216K kmalloc-512
780544 704162 90% 0.06K 12196 64 48784K kmalloc-64
760640 692818 91% 0.06K 11885 64 47540K lsm_inode_cache
-----------
meminfo:
MemTotal: 131460012 kB
MemFree: 1098564 kB
MemAvailable: 25244524 kB
Buffers: 920 kB
Cached: 24607936 kB
SwapCached: 0 kB
Active: 19206988 kB
Inactive: 15545632 kB
Active(anon): 365772 kB
Inactive(anon): 10468576 kB
Active(file): 18841216 kB
Inactive(file): 5077056 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 13560 kB
Writeback: 0 kB
AnonPages: 8674264 kB
Mapped: 1116760 kB
Shmem: 690148 kB
KReclaimable: 1483084 kB
Slab: 2267544 kB
SReclaimable: 1483084 kB
SUnreclaim: 784460 kB
KernelStack: 24960 kB
PageTables: 45932 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 65730004 kB
Committed_AS: 18166532 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 149076 kB
VmallocChunk: 0 kB
Percpu: 31104 kB
HardwareCorrupted: 0 kB
AnonHugePages: 2279424 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 5311288 kB
DirectMap2M: 125757440 kB
DirectMap1G: 5242880 kB
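To compare the two meminfo captures above field by field, a small helper along these lines can be used. This is only a sketch: the function name `meminfo_diff`, the file arguments, and the default threshold of 102400 kB are illustrative assumptions, and it expects each capture to be saved to its own file in `/proc/meminfo` format.

```shell
# Sketch: print the /proc/meminfo fields whose value changed by at least
# a threshold (in kB) between two saved captures.
# Usage: meminfo_diff BEFORE_FILE AFTER_FILE [MIN_KB]
meminfo_diff() {
    local before="$1" after="$2" min_kb="${3:-102400}"
    awk -v min="$min_kb" '
        { sub(":", "", $1) }                 # "MemFree:" -> "MemFree"
        NR == FNR { old[$1] = $2; next }     # first file: remember values
        ($1 in old) {                        # second file: report the delta
            d = $2 - old[$1]
            if (d >= min || d <= -min) printf "%-16s %+d kB\n", $1, d
        }
    ' "$before" "$after"
}
```

For example, `meminfo_diff normal.txt spike.txt` (hypothetical file names) lists only the counters that moved by 100 MiB or more, which makes it easier to see which counters move along with MemFree.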
Question:
Which tools or techniques can I use to further diagnose this sudden increase in memory usage?
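Additional information:
The processes listed under "top 30 processes by RAM usage", such as terraform, ..., run inside Docker containers. All containers have limits, and the average number of containers is the same in the normal state and during the spike, so I don't think the Docker containers are the cause.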
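The claim about the containers can be checked against the kernel's own accounting. The sketch below sums `memory.usage_in_bytes` per container. The default path `/sys/fs/cgroup/memory/docker` is an assumption (cgroup v1 with Docker's cgroupfs driver); with the systemd cgroup driver or cgroup v2 the files live elsewhere, so the directory may need to be passed explicitly. The function name `cgroup_mem_usage` is hypothetical.

```shell
# Sketch: list memory usage per child cgroup and the total, in MiB.
# Usage: cgroup_mem_usage [CGROUP_DIR]
cgroup_mem_usage() {
    local root="${1:-/sys/fs/cgroup/memory/docker}"   # assumed default path
    local f bytes total=0
    for f in "$root"/*/memory.usage_in_bytes; do
        [ -r "$f" ] || continue                       # skip if nothing matched
        bytes=$(cat "$f")
        total=$((total + bytes))
        # the directory name is the container ID under the cgroupfs driver
        printf '%s\t%d MiB\n' "$(basename "$(dirname "$f")")" $((bytes / 1048576))
    done
    printf 'total\t%d MiB\n' $((total / 1048576))
}
```

If the total stays flat between the normal state and the spike, that would support the view that the containers are not responsible.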
Summary:
Using tools such as htop, ps, and /proc/meminfo, I have not been able to find what is consuming the memory, and I need help.
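To put a number on "cannot find what is consuming the memory", the common meminfo counters can be summed and subtracted from MemTotal. This is a rough sketch, not an exact kernel formula: the choice of counters (MemFree, Buffers, Cached, AnonPages, Slab, KernelStack, PageTables, Percpu) is an assumption, and the function name `meminfo_unaccounted_kb` is hypothetical.

```shell
# Sketch: estimate the memory (in kB) that the usual /proc/meminfo
# counters do not explain. Reads /proc/meminfo unless a file is given.
# Usage: meminfo_unaccounted_kb [MEMINFO_FILE]
meminfo_unaccounted_kb() {
    local file="${1:-/proc/meminfo}"
    awk '
        { sub(":", "", $1); v[$1] = $2 }     # store each counter by name
        END {
            n = split("MemFree Buffers Cached AnonPages Slab KernelStack PageTables Percpu", f, " ")
            for (i = 1; i <= n; i++) accounted += v[f[i]]
            printf "%d\n", v["MemTotal"] - accounted
        }
    ' "$file"
}
```

Applied to the two captures above, this gives 2164376 kB for the normal state and 94708788 kB (about 90 GiB) for the spike, so nearly all of the extra usage falls outside process memory, page cache, and slab.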
Thanks!