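I'm trying to diagnose some random segfaults on a headless server. One thing that seems odd is that they only seem to happen under memory pressure, and my swap usage never goes above 0.
How can I force my machine to swap so that I can make sure swap is working properly?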
orca ~ # free
             total       used       free     shared    buffers     cached
Mem:       1551140    1472392      78748          0     333920    1046368
-/+ buffers/cache:      92104    1459036
Swap:      1060280          0    1060280
orca ~ # swapon -s
Filename                                Type            Size    Used    Priority
/dev/sdb2                               partition       1060280 0       -1
Answer 1
Is this Linux? If so, you could try the following:
# sysctl vm.swappiness=100
(You may want to run sysctl vm.swappiness first to see the default value; on my system it is 10.)
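Then either run a program that uses a lot of RAM, or write a small application that does nothing but eat RAM. The following will do that (source: Experiments and fun with the Linux disk cache):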
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char** argv) {
    int max = -1;        // number of iterations; -1 means no limit
    int mb = 0;
    int multiplier = 1;  // allocate 1 MB every time unit. Increase this to e.g. 100 to allocate 100 MB every time unit.
    char* buffer;

    if (argc > 1)
        max = atoi(argv[1]);

    // Check the limit before calling malloc so no extra block is allocated once max is reached.
    while (mb != max && (buffer = malloc(multiplier * 1024 * 1024)) != NULL) {
        memset(buffer, 1, multiplier * 1024 * 1024);  // touch every page so it is really backed by RAM
        mb++;
        printf("Allocated %d MB\n", multiplier * mb);
        sleep(1);  // time unit: 1 second
    }

    return 0;
}
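I wrote the memset line to initialize each block with 1s rather than 0s, because otherwise the Linux virtual memory manager may be smart enough not to actually allocate any RAM. I added the sleep(1) to give you more time to watch the process as it gobbles up RAM and swap. Once there is no RAM or swap left to give to the program, the OOM killer should kill it. You can compile it with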
gcc filename.c -o memeater
where filename.c is the file you saved the program above in. You can then run it with ./memeater.
I wouldn't do this on a production machine.
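While memeater runs, it can help to watch swap usage from a second terminal instead of rerunning free by hand. The following is only a minimal sketch: swap_used_kb is a hypothetical helper name (not part of the original answer), and it assumes the usual SwapTotal and SwapFree fields in /proc/meminfo.

```shell
#!/usr/bin/env bash
# swap_used_kb [MEMINFO_FILE]
# Hypothetical helper: prints the amount of swap in use, in kB,
# computed as SwapTotal - SwapFree from a /proc/meminfo-style file.
swap_used_kb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END { print total - free }
    ' "$meminfo"
}
```

For example, after sourcing the function, running "while sleep 1; do swap_used_kb; done" prints the value once per second. If it stays at 0 while memeater keeps allocating, swap is not being used.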
Answer 2
To run the tests in this post, you need the following:
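For the first test, you need to make sure that your swap can be read from and written to under normal circumstances. You can do that by running the commands below. Don't forget to change amount_of_swap to the actual amount of swap that you have. You may also need to increase timeout if your swap is particularly slow or particularly large.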
$ amount_of_swap=2G
$ timeout=60
$ systemd-run --property="MemoryHigh=128M" -- \
stress-ng \
--timeout "$timeout" \
--vm 1 \
--vm-hang 0 \
--vm-method zero-one \
--vm-bytes "$amount_of_swap"
Running as unit: run-u7.service
$ # Wait for it to start using swap, then run:
$ free
               total        used        free      shared  buff/cache   available
Mem:          479432      345384       19136        3284      114912      117948
Swap:        2097148     1975096      122052
$ # Make sure that stress-ng exited successfully:
$ unit_name=run-u7.service # This might be different on your system. See systemd-run’s output.
$ journalctl --boot --unit="$unit_name"
Started /nix/store/fmsawx6292lg2mc96hj5gmql1mk973dz-stress-ng-0.17.01/bin/stress-ng --timeout 60 --vm 1 --vm-hang 0 --vm-method zero-one --vm-bytes 2G.
invoked with '/nix/store/fmsawx6292lg2mc96hj5gmql1mk973dz-stress-ng-0.17.01/bin/stress-ng --timeout 60 --vm 1 --vm-hang 0 --vm-method zero-one --vm-bytes 2G' by user 0 'root'
stress-ng: info: [2237] setting to a 1 min, 0 secs run per stressor
stress-ng: info: [2237] dispatching hogs: 1 vm
system: 'jasonyundt' Linux 6.8.1 #1-NixOS SMP PREEMPT_DYNAMIC Fri Mar 15 18:19:29 UTC 2024 x86_64
memory (MB): total 468.20, free 127.03, shared 3.21, buffer 1.59, swap 2048.00, free swap 2046.73
stress-ng: info: [2237] skipped: 0
stress-ng: info: [2237] passed: 1: vm (1)
stress-ng: info: [2237] failed: 0
stress-ng: info: [2237] metrics untrustworthy: 0
stress-ng: info: [2237] successful run completed in 1 min, 3.96 secs
run-u7.service: Deactivated successfully.
run-u7.service: Consumed 28.368s CPU time, no IP traffic.
The output of the free command will show whether swap was actually used.
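The "wait for it to start using swap" step above can also be scripted rather than done by eye. This is a rough sketch, not part of the original answer: wait_for_swap_use is a hypothetical function name, the threshold and number of tries are arbitrary parameters, and it assumes the SwapTotal and SwapFree fields of /proc/meminfo.

```shell
#!/usr/bin/env bash
# wait_for_swap_use MIN_KB [MEMINFO_FILE] [MAX_TRIES]
# Hypothetical helper: polls once per second until at least MIN_KB of swap
# is in use. Returns 0 on success, 1 if MAX_TRIES polls pass without that.
wait_for_swap_use() {
    local min_kb="$1"
    local meminfo="${2:-/proc/meminfo}"
    local tries="${3:-60}"
    local used
    while [ "$tries" -gt 0 ]; do
        # Swap in use = SwapTotal - SwapFree (both reported in kB).
        used="$(awk '$1 == "SwapTotal:" { t = $2 } $1 == "SwapFree:" { f = $2 } END { print t - f }' "$meminfo")"
        if [ "$used" -ge "$min_kb" ]; then
            echo "swap in use: ${used} kB"
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep 1
    done
    echo "swap use stayed below ${min_kb} kB" >&2
    return 1
}
```

For example, "wait_for_swap_use 102400 && free" waits until roughly 100 MiB of swap is in use and then prints the free output.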
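Most of the time, the previous test will be enough. Unfortunately, it's possible to create a swap area that the kernel can't use when it's about to run out of memory. Specifically, if less than min_free_kbytes of free memory remains, the kernel enters a minimum-memory emergency mode in which only PF_MEMALLOC allocations are allowed. If writing to the swap device or swap file requires non-PF_MEMALLOC memory allocations, then the system will break when too much RAM is in use.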
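To get a feel for how close a machine is to that limit, free memory can be compared with vm.min_free_kbytes. The sketch below is only a rough, system-wide approximation (the kernel applies its watermarks per memory zone, not to the MemFree total), and free_headroom_kb is a hypothetical helper name that is not part of the original answer.

```shell
#!/usr/bin/env bash
# free_headroom_kb [MEMINFO_FILE] [MIN_FREE_FILE]
# Hypothetical helper: prints MemFree minus vm.min_free_kbytes, in kB.
# A negative number means free memory is below the configured minimum.
free_headroom_kb() {
    local meminfo="${1:-/proc/meminfo}"
    local min_free_file="${2:-/proc/sys/vm/min_free_kbytes}"
    local mem_free
    local min_free
    mem_free="$(awk '$1 == "MemFree:" { print $2 }' "$meminfo")"
    min_free="$(cat "$min_free_file")"
    echo "$(( mem_free - min_free ))"
}
```

Called with no arguments, it reads the live values from /proc. A small or negative result suggests the system is near the situation described above.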
You can test whether hitting the min_free_kbytes limit will break your system like this:
#!/usr/bin/env bash
# Again, remember to potentially adjust amount_of_ram and timeout.
amount_of_ram=1G
timeout=60
original_min_free_kbytes="$(sysctl -n vm.min_free_kbytes)"
sudo -v
{
sleep "$(( timeout / 2 ))"
free
sudo sysctl vm.min_free_kbytes="$(( 128 * 1024 ))"
} &
stress-ng \
--timeout "$timeout" \
--vm 1 \
--vm-hang 0 \
--vm-method zero-one \
--vm-bytes "$amount_of_ram" &
wait
sudo sysctl vm.min_free_kbytes="$original_min_free_kbytes"
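If your system is fine, then the script will exit successfully. If your system requires non-PF_MEMALLOC memory allocations in order to swap, then this will happen: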
[ 1106.923468] INFO: task systemd:1 blocked for more than 122 seconds.
[ 1106.924018] Tainted: G W 6.8.1 #1-NixOS
[ 1106.924512] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.925344] INFO: task kthreadd:2 blocked for more than 122 seconds.
[ 1106.925876] Tainted: G W 6.8.1 #1-NixOS
[ 1106.926356] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.927188] INFO: task kworker/u2:0:11 blocked for more than 122 seconds.
[ 1106.927757] Tainted: G W 6.8.1 #1-NixOS
[ 1106.928234] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.929447] INFO: task kworker/u2:1:23 blocked for more than 122 seconds.
[ 1106.930018] Tainted: G W 6.8.1 #1-NixOS
[ 1106.930506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.931598] INFO: task kswapd0:37 blocked for more than 122 seconds.
[ 1106.932129] Tainted: G W 6.8.1 #1-NixOS
[ 1106.932619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.933396] INFO: task kworker/0:3:139 blocked for more than 122 seconds.
[ 1106.933968] Tainted: G W 6.8.1 #1-NixOS
[ 1106.934452] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.935430] INFO: task systemd-udevd:425 blocked for more than 122 seconds.
[ 1106.936051] Tainted: G W 6.8.1 #1-NixOS
[ 1106.936611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.937482] INFO: task systemd-oomd:578 blocked for more than 122 seconds.
[ 1106.938077] Tainted: G W 6.8.1 #1-NixOS
[ 1106.938582] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.939438] INFO: task systemd-timesyn:605 blocked for more than 122 seconds.
[ 1106.940063] Tainted: G W 6.8.1 #1-NixOS
[ 1106.940572] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.941436] INFO: task kworker/0:5:642 blocked for more than 122 seconds.
[ 1106.942028] Tainted: G W 6.8.1 #1-NixOS
[ 1106.942539] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
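If the machine survives long enough to save its kernel log, the hung-task reports can be summarized instead of read line by line. This is a small sketch that assumes the "INFO: task NAME:PID blocked for more than N seconds." format shown above; hung_tasks is a hypothetical helper name, not part of the original answer.

```shell
#!/usr/bin/env bash
# hung_tasks
# Hypothetical helper: reads kernel log text on stdin and prints the unique
# names of tasks that were reported as blocked by the hung-task detector.
hung_tasks() {
    # Capture everything between "task " and the final ":PID blocked ...".
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than [0-9][0-9]* seconds\..*/\1/p' |
        sort -u
}
```

For example, "dmesg | hung_tasks" summarizes the current boot, and "journalctl -k -b -1 | hung_tasks" does the same for the previous boot if the journal is persistent. No output means no hung-task reports were found in the input.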