Out of memory, but swap available

My server runs out of memory even though swap space is still available.

Why?

I can reproduce it like this:

eat_20GB_RAM() {
  perl -e '$a="c"x10000000000;print "OK\n";sleep 10000';
}
export -f eat_20GB_RAM
parallel -j0 eat_20GB_RAM ::: {1..25} &

When that settles down (i.e. all processes have reached the sleep), I run a few more:

parallel --delay 5 -j0 eat_20GB_RAM ::: {1..25} &

When that settles down (i.e. all processes have reached the sleep), around 800 GB of RAM/swap is in use:

$ free -m
              total        used        free      shared  buff/cache   available
Mem:         515966      440676       74514           1         775       73392
Swap:       1256720      341124      915596

When I run a few more:

parallel --delay 15 -j0 eat_20GB_RAM ::: {1..50} &

I start getting:

Out of memory!

Even though there is clearly swap available.

$ free
              total        used        free      shared  buff/cache   available
Mem:      528349276   518336524     7675784       14128     2336968     7316984
Swap:    1286882284  1017746244   269136040

Why?

$ cat /proc/meminfo 
MemTotal:       528349276 kB
MemFree:         7647352 kB
MemAvailable:    7281164 kB
Buffers:           70616 kB
Cached:          1503044 kB
SwapCached:        10404 kB
Active:         476833404 kB
Inactive:       20837620 kB
Active(anon):   476445828 kB
Inactive(anon): 19673864 kB
Active(file):     387576 kB
Inactive(file):  1163756 kB
Unevictable:       18776 kB
Mlocked:           18776 kB
SwapTotal:      1286882284 kB
SwapFree:       269134804 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:      496106244 kB
Mapped:           190524 kB
Shmem:             14128 kB
KReclaimable:     753204 kB
Slab:           15772584 kB
SReclaimable:     753204 kB
SUnreclaim:     15019380 kB
KernelStack:       46640 kB
PageTables:      3081488 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    1551056920 kB
Committed_AS:   1549560424 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1682132 kB
VmallocChunk:          0 kB
Percpu:           202752 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:    12251620 kB
DirectMap2M:    522496000 kB
DirectMap1G:     3145728 kB

Answer 1

In /proc/meminfo you find:

CommitLimit:    1551056920 kB
Committed_AS:   1549560424 kB

So you have reached the commit limit.

If you have disabled memory overcommit (to avoid the OOM killer) with:

echo 2 > /proc/sys/vm/overcommit_memory

then the commit limit is computed as follows:

2   -   Don't overcommit. The total address space commit
        for the system is not permitted to exceed swap + a
        configurable amount (default is 50%) of physical RAM.
        Depending on the amount you use, in most situations
        this means a process will not be killed while accessing
        pages but will receive errors on memory allocation as
        appropriate.

(From: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting)

You can make the full physical RAM count towards the limit with:

echo 100 > /proc/sys/vm/overcommit_ratio

You will then run out of memory only when both physical RAM and swap have been reserved.

The name overcommit_ratio is a bit misleading in this case: you are not overcommitting anything.
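As a rough check, the rule quoted above can be reproduced with plain shell arithmetic. The sketch below works in pages, as the kernel does, and assumes 4 kB pages, no hugetlb pages, and vm.overcommit_kbytes left at 0. The function name commit_limit_kb is made up for this illustration.

```shell
# Sketch of the mode-2 commit limit: swap + ratio% of physical RAM.
# Assumptions: 4 kB pages, no hugetlb pages, vm.overcommit_kbytes = 0.
# Arguments: MemTotal in kB, SwapTotal in kB, overcommit_ratio in percent.
commit_limit_kb() {
  mem_total_kb=$1
  swap_total_kb=$2
  ratio=$3
  page_kb=4
  ram_pages=$(( mem_total_kb / page_kb ))
  swap_pages=$(( swap_total_kb / page_kb ))
  # Integer division, done per page, then converted back to kB
  echo $(( (ram_pages * ratio / 100 + swap_pages) * page_kb ))
}

# MemTotal and SwapTotal taken from the question's /proc/meminfo
commit_limit_kb 528349276 1286882284 50
commit_limit_kb 528349276 1286882284 100
```

With a ratio of 50 this prints 1551056920, the CommitLimit from the question. With a ratio of 100 it prints 1815231560, i.e. RAM plus swap.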

Even with overcommit_ratio set to 100 you may see out of memory before swap is exhausted. malloc.c:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Reserve <bytes> of memory without touching it, then sleep <sleep_sec> seconds. */
int main(int argc, char **argv) {
  long bytes, sleep_sec;
  if(argc != 3) {
    printf("Usage: malloc bytes sleep_sec\n");
    exit(1);
  }
  sscanf(argv[1],"%ld",&bytes);
  sscanf(argv[2],"%ld",&sleep_sec);
  printf("Bytes: %ld Sleep: %ld\n",bytes,sleep_sec);
  /* The memory is only reserved here; no page is ever written to */
  if(malloc(bytes)) {
    sleep(sleep_sec);
  } else {
    printf("Out of memory\n");
    exit(1);
  }
  return 0;
}

Compile with:

gcc -o malloc malloc.c

Run as (reserve 1 GB for 10 seconds):

./malloc 1073741824 10

If you run this, you may see OOM even though there is swap free:

# Plenty of ram+swap free before we start
$ free -m
              total        used        free      shared  buff/cache   available
Mem:         515966        2824      512361          16         780      511234
Swap:       1256720           0     1256720

# Reserve 1.8 TB
$ ./malloc 1800000000000 100 &
Bytes: 1800000000000 Sleep: 100

# It looks as if there is plenty of ram+swap free
$ free -m
              total        used        free      shared  buff/cache   available
Mem:         515966        2824      512361          16         780      511234
Swap:       1256720           0     1256720

# But there isn't: It is all reserved (just not used yet)
$ cat /proc/meminfo |grep omm
CommitLimit:    1815231560 kB
Committed_AS:   1761680484 kB

# Thus this fails (as you would expect)
$ ./malloc 180000000000 100
Bytes: 180000000000 Sleep: 100
Out of memory

So while free will often do the right thing in practice, looking at CommitLimit and Committed_AS seems to be more bullet-proof.
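One way to act on that is to compute the remaining headroom, CommitLimit minus Committed_AS, and compare a planned reservation against it. This is only a sketch: the helper names commit_headroom_kb and fits_in_headroom are made up here, and the comparison ignores per-process overhead and any reserves the kernel keeps back.

```shell
# Print CommitLimit - Committed_AS (in kB) from meminfo-formatted text on stdin.
commit_headroom_kb() {
  limit=0
  used=0
  while read -r key value unit; do
    case $key in
      CommitLimit:)  limit=$value ;;
      Committed_AS:) used=$value ;;
    esac
  done
  echo $(( limit - used ))
}

# Succeed if a request of $1 bytes fits into a headroom of $2 kB.
# Hypothetical helper: a plain comparison, rounded up to whole kB.
fits_in_headroom() {
  [ $(( ($1 + 1023) / 1024 )) -le "$2" ]
}

# Values from the transcript above; on a live Linux system use instead:
#   commit_headroom_kb < /proc/meminfo
headroom=$(printf 'CommitLimit:    1815231560 kB\nCommitted_AS:   1761680484 kB\n' | commit_headroom_kb)
echo "Headroom: $headroom kB"

if fits_in_headroom 180000000000 "$headroom"; then
  echo "180000000000 bytes would fit"
else
  echo "180000000000 bytes would not fit"
fi
```

For the transcript's numbers the headroom is 53551076 kB (about 51 GiB), so the 180000000000-byte request (175781250 kB) does not fit, which is consistent with the failed ./malloc call.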
