cgroups memory 16GB limit

Question

**Note: undeleted for posterity**

Your problem is here:

    # No alternate memory nodes if the system is not NUMA
    # On computenodes use all available cores
        cpuset {
            cpuset.mems="0";
            cpuset.cpus="0-47";
        }
    }

You are only ever using one memory node. You need to change `cpuset.mems` so that it lists all memory nodes.
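As a minimal sketch of how one might check this, the Python helper below expands the kernel's list format used by `cpuset.mems` and `cpuset.cpus` (for example `0-47`) and reports which online nodes a given `cpuset.mems` value leaves out. The function names and the two-node sample are illustrative assumptions, not taken from the question; on a real host the online nodes can be read from `/sys/devices/system/node/online`.

```python
def parse_id_list(text):
    """Expand a kernel-style list such as "0-3,8" into a set of ints."""
    ids = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(part))
    return ids


def missing_nodes(cpuset_mems, online_nodes):
    """Return the nodes that are online but not allowed by cpuset.mems."""
    return sorted(parse_id_list(online_nodes) - parse_id_list(cpuset_mems))


if __name__ == "__main__":
    # Hypothetical two-node host whose cgroup only allows node 0.
    print(missing_nodes("0", "0-1"))
```

An empty result would mean the cgroup is allowed to allocate from every online node.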

I also think the below applies, and unless you know about it you will still see the problem. So I am leaving it here for posterity.

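This issue basically comes down to the hardware being used. The kernel has a heuristic to determine the value of this switch, and the value alters how the kernel determines memory pressure on a NUMA system.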

zone_reclaim_mode:

Zone_reclaim_mode allows someone to set more or less aggressive approaches to
reclaim memory when a zone runs out of memory. If it is set to zero then no
zone reclaim occurs. Allocations will be satisfied from other zones / nodes
in the system.

This is value ORed together of

1   = Zone reclaim on
2   = Zone reclaim writes dirty pages out
4   = Zone reclaim swaps pages

zone_reclaim_mode is set during bootup to 1 if it is determined that pages
from remote zones will cause a measurable performance reduction. The
page allocator will then reclaim easily reusable pages (those page
cache pages that are currently not used) before allocating off node pages.

It may be beneficial to switch off zone reclaim if the system is
used for a file server and all of memory should be used for caching files
from disk. In that case the caching effect is more important than
data locality.

Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
reclaim will write out dirty pages if a zone fills up and so effectively
throttle the process. This may decrease the performance of a single process
since it cannot use all of system memory to buffer the outgoing writes
anymore but it preserves the memory on other nodes so that the performance
of other processes running on other nodes will not be affected.

Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations.
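
To make the bitmask concrete, here is a minimal Python sketch that decodes a `zone_reclaim_mode` value into the three flags listed above. The dictionary and function names are illustrative and not part of any kernel interface.

```python
# Bit values as given in the kernel documentation quoted above.
ZONE_RECLAIM_FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def describe_zone_reclaim_mode(value):
    """Return the list of behaviours enabled by a zone_reclaim_mode value."""
    if value == 0:
        return ["zone reclaim off"]
    return [name for bit, name in sorted(ZONE_RECLAIM_FLAGS.items()) if value & bit]


if __name__ == "__main__":
    for mode in (0, 1, 3, 7):
        print(mode, describe_zone_reclaim_mode(mode))
```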

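To give you some idea of what is happening here: memory is broken up into zones, which is specifically useful on NUMA systems where RAM is tied to specific CPUs. On these hosts memory locality can be an important factor in performance. If, for example, memory banks 1 and 2 are assigned to physical CPU 0, CPU 1 can access them, but at the cost of locking that RAM from CPU 0, which causes a performance degradation.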

On Linux, the zoning reflects the NUMA layout of the physical machine. Here each zone is 16GB in size.

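What is happening at the moment, with zone reclaim on, is that the kernel opts to reclaim within a full zone (16GB) by writing dirty pages to disk, evicting file cache and swapping out memory, rather than permit the process to allocate memory in another zone (which would impact performance on that CPU). This is why you notice swapping after 16GB.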

If you switch this value off, the kernel should stop aggressively reclaiming zone data and instead allocate from another node.

Try switching off `zone_reclaim_mode` by running `sysctl -w vm.zone_reclaim_mode=0` on your system and then re-running your test.
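
Before and after changing the setting, it can help to confirm what the kernel currently reports. The following is a small Python sketch that reads the value from `/proc/sys/vm/zone_reclaim_mode`, the procfs file behind the `vm.zone_reclaim_mode` sysctl. The function name is hypothetical, and the sketch assumes the file may be absent (for instance on kernels built without NUMA support), in which case it returns `None`.

```python
from pathlib import Path


def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Return the current zone_reclaim_mode as an int, or None if unavailable."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().strip())


if __name__ == "__main__":
    mode = read_zone_reclaim_mode()
    if mode is None:
        print("vm.zone_reclaim_mode is not exposed on this system")
    elif mode == 0:
        print("zone reclaim is off; allocations may spill to other nodes")
    else:
        print(f"zone reclaim is on (mode={mode})")
```

This only reads the value; changing it still requires root, for example through the `sysctl -w` command.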

Note that long-running, high-memory processes on a configuration like this, with `zone_reclaim_mode` off, will become increasingly expensive over time.

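If you allow lots of disparate processes, running on lots of different CPUs and all using lots of memory, to use any node with free pages, you can effectively reduce the performance of the host to something akin to having just 1 physical CPU.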

Answer 1