How to test a swap partition


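I'm trying to diagnose some random segfaults on a headless server. One thing that seems odd is that they only seem to happen under memory pressure, and my swap usage never goes above 0.

How can I force my machine to swap so that I can make sure swap is working properly?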

orca ~ # free
             total       used       free     shared    buffers     cached
Mem:       1551140    1472392      78748          0     333920    1046368
-/+ buffers/cache:      92104    1459036
Swap:      1060280          0    1060280

orca ~ # swapon -s
Filename                                Type            Size    Used    Priority
/dev/sdb2                               partition       1060280 0       -1

Answer 1

Is this Linux? If so, you could try the following:

# sysctl vm.swappiness=100

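(You may want to run sysctl vm.swappiness first to see the default value; on my system it is 10.)

Then either run a program that uses a lot of RAM, or write a small application that does nothing but eat RAM. The following does exactly that (source: Experiments and fun with the Linux disk cache):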
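Because the new setting stays in effect until it is changed again or the machine reboots, it can help to record the old value and put it back after the test. The following is a minimal sketch, assuming root privileges and that writing to /proc/sys/vm/swappiness has the same effect as sysctl vm.swappiness=N. The function names and the SWAPPINESS_FILE override are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of any tool.
# SWAPPINESS_FILE can be pointed at a scratch file to try them without root.
SWAPPINESS_FILE="${SWAPPINESS_FILE:-/proc/sys/vm/swappiness}"

# Remember the value that is currently in effect.
save_swappiness() {
    saved_swappiness="$(cat "$SWAPPINESS_FILE")"
}

# Write a new value (needs root when the real /proc file is used).
set_swappiness() {
    printf '%s\n' "$1" > "$SWAPPINESS_FILE"
}

# Put back whatever save_swappiness recorded.
restore_swappiness() {
    set_swappiness "$saved_swappiness"
}
```

A typical sequence would be save_swappiness, then set_swappiness 100, then the memory test, then restore_swappiness.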

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include <unistd.h>


int main(int argc, char** argv) {
    int max = -1;
    int mb = 0;
    int multiplier = 1; // allocate 1 MB every time unit. Increase this to e.g.100 to allocate 100 MB every time unit.
    char* buffer;

    if(argc > 1)
        max = atoi(argv[1]);

    while((buffer=malloc(multiplier * 1024*1024)) != NULL && mb != max) {
        memset(buffer, 1, multiplier * 1024*1024);
        mb++;
        printf("Allocated %d MB\n", multiplier * mb);
        sleep(1); // time unit: 1 second
    }      
    return 0;
}

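The memset line initializes each block with 1 rather than 0, because otherwise the Linux virtual memory manager may be smart enough not to actually allocate any RAM. I added sleep(1) to give you more time to watch the process as it eats up RAM and swap. Once there is not enough RAM and swap left to give to the program, the OOM killer will kill it. You can compile it with: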

gcc filename.c -o memeater

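where filename.c is the file in which you saved the program above. You can then run it with ./memeater. An optional argument limits the number of allocations, for example ./memeater 500 stops after 500 allocations (500 MB with the default multiplier).

I would not do this on a production machine.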
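To confirm that swap usage actually rises while memeater runs, the SwapTotal and SwapFree fields of /proc/meminfo can be polled from a second terminal. This is a small sketch rather than part of the original answer; the function name swap_used_kb and its optional file argument are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Print the amount of swap in use, in kB, from a meminfo-style file.
# The optional file argument only exists so the function can be tried on
# sample data; by default it reads /proc/meminfo.
swap_used_kb() {
    local file="${1:-/proc/meminfo}"
    local total free
    total="$(awk '/^SwapTotal:/ { print $2 }' "$file")"
    free="$(awk '/^SwapFree:/ { print $2 }' "$file")"
    echo "$(( total - free ))"
}

# Example polling loop (commented out so that sourcing this file has no
# side effects):
#   while sleep 1; do swap_used_kb; done
```

If the printed number stays at 0 while memeater keeps allocating well past the amount of physical RAM, swap is not being used.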

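Answer 2

To run the tests in this answer, you will need the tools used in the commands below, including systemd-run and stress-ng.

For the first test, you need to make sure that the swap partition can be read from and written to under normal conditions. You can do that by running the commands below. Don't forget to change amount_of_swap to the amount of swap you actually have. You may also need to increase timeout if your swap is particularly slow or particularly large.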

$ amount_of_swap=2G
$ timeout=60
$ systemd-run --property="MemoryHigh=128M" -- \
    stress-ng \
        --timeout "$timeout" \
        --vm 1 \
        --vm-hang 0 \
        --vm-method zero-one \
        --vm-bytes "$amount_of_swap"
Running as unit: run-u7.service
$ # Wait for it to start using swap, then run:
$ free
               total        used        free      shared  buff/cache   available
Mem:          479432      345384       19136        3284      114912      117948
Swap:        2097148     1975096      122052
$ # Make sure that stress-ng exited successfully:
$ unit_name=run-u7.service  # This might be different on your system. See systemd-run’s output.
$ journalctl --boot --unit="$unit_name"
Started /nix/store/fmsawx6292lg2mc96hj5gmql1mk973dz-stress-ng-0.17.01/bin/stress-ng --timeout 60 --vm 1 --vm-hang 0 --vm-method zero-one --vm-bytes 2G.
invoked with '/nix/store/fmsawx6292lg2mc96hj5gmql1mk973dz-stress-ng-0.17.01/bin/stress-ng --timeout 60 --vm 1 --vm-hang 0 --vm-method zero-one --vm-bytes 2G' by user 0 'root'
stress-ng: info:  [2237] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [2237] dispatching hogs: 1 vm
system: 'jasonyundt' Linux 6.8.1 #1-NixOS SMP PREEMPT_DYNAMIC Fri Mar 15 18:19:29 UTC 2024 x86_64
memory (MB): total 468.20, free 127.03, shared 3.21, buffer 1.59, swap 2048.00, free swap 2046.73
stress-ng: info:  [2237] skipped: 0
stress-ng: info:  [2237] passed: 1: vm (1)
stress-ng: info:  [2237] failed: 0
stress-ng: info:  [2237] metrics untrustworthy: 0
stress-ng: info:  [2237] successful run completed in 1 min, 3.96 secs
run-u7.service: Deactivated successfully.
run-u7.service: Consumed 28.368s CPU time, no IP traffic.

The output of the free command shows whether swap is actually being used.
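free reports system-wide totals. To see how much of a particular process has been pushed out to swap, the VmSwap field in /proc/PID/status can be read instead. The sketch below assumes that field is present on your kernel; the function name vmswap_kb is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the VmSwap value (in kB) from a /proc/PID/status-style file.
vmswap_kb() {
    awk '/^VmSwap:/ { print $2 }' "$1"
}

# Example (commented out so that sourcing this file has no side effects):
# show the swapped-out size of every running stress-ng process.
#   for pid in $(pgrep stress-ng); do
#       echo "$pid: $(vmswap_kb "/proc/$pid/status") kB"
#   done
```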


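Most of the time, the previous test is enough. Unfortunately, it is possible to create swap space that the kernel cannot use when it is about to run out of memory. Specifically, if less than min_free_kbytes of free memory remains, the kernel enters a minimal-memory emergency mode in which only PF_MEMALLOC allocations are allowed. If writing to the swap device or swap file requires non-PF_MEMALLOC memory allocations, the system will hang once too much RAM is in use.

You can test whether hitting the min_free_kbytes limit breaks your system like this: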
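As a rough way to see how close a system is to that reserve, MemFree from /proc/meminfo can be compared with vm.min_free_kbytes. This is only an approximation, since the kernel applies its watermarks per memory zone rather than to the system-wide total. The function name free_headroom_kb and its optional file arguments are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Print MemFree minus the min_free_kbytes reserve, in kB.
# A value near or below zero suggests the system is at the reserve.
# Both arguments are optional and exist so the function can be tried on
# sample files instead of the live /proc entries.
free_headroom_kb() {
    local meminfo="${1:-/proc/meminfo}"
    local reserve_file="${2:-/proc/sys/vm/min_free_kbytes}"
    local mem_free reserve
    mem_free="$(awk '/^MemFree:/ { print $2 }' "$meminfo")"
    reserve="$(cat "$reserve_file")"
    echo "$(( mem_free - reserve ))"
}
```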

#!/usr/bin/env bash
# Again, remember to potentially adjust amount_of_ram and timeout.
amount_of_ram=1G
timeout=60

original_min_free_kbytes="$(sysctl -n vm.min_free_kbytes)"
sudo -v
{
    sleep "$(( timeout / 2 ))"
    free
    sudo sysctl vm.min_free_kbytes="$(( 128 * 1024 ))"
} &
stress-ng \
    --timeout "$timeout" \
    --vm 1 \
    --vm-hang 0 \
    --vm-method zero-one \
    --vm-bytes "$amount_of_ram" &
wait

sudo sysctl vm.min_free_kbytes="$original_min_free_kbytes"

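If your system is fine, the script exits successfully. If your system needs non-PF_MEMALLOC memory allocations in order to swap, this is what happens instead: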

[ 1106.923468] INFO: task systemd:1 blocked for more than 122 seconds.
[ 1106.924018]       Tainted: G        W          6.8.1 #1-NixOS
[ 1106.924512] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.925344] INFO: task kthreadd:2 blocked for more than 122 seconds.
[ 1106.925876]       Tainted: G        W          6.8.1 #1-NixOS
[ 1106.926356] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.927188] INFO: task kworker/u2:0:11 blocked for more than 122 seconds.
[ 1106.927757]       Tainted: G        W          6.8.1 #1-NixOS
[ 1106.928234] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.929447] INFO: task kworker/u2:1:23 blocked for more than 122 seconds.
[ 1106.930018]       Tainted: G        W          6.8.1 #1-NixOS
[ 1106.930506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.931598] INFO: task kswapd0:37 blocked for more than 122 seconds.
[ 1106.932129]       Tainted: G        W          6.8.1 #1-NixOS
[ 1106.932619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.933396] INFO: task kworker/0:3:139 blocked for more than 122 seconds.
[ 1106.933968]       Tainted: G        W          6.8.1 #1-NixOS
[ 1106.934452] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.935430] INFO: task systemd-udevd:425 blocked for more than 122 seconds.
[ 1106.936051]       Tainted: G        W          6.8.1 #1-NixOS
[ 1106.936611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.937482] INFO: task systemd-oomd:578 blocked for more than 122 seconds.
[ 1106.938077]       Tainted: G        W          6.8.1 #1-NixOS
[ 1106.938582] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.939438] INFO: task systemd-timesyn:605 blocked for more than 122 seconds.
[ 1106.940063]       Tainted: G        W          6.8.1 #1-NixOS
[ 1106.940572] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1106.941436] INFO: task kworker/0:5:642 blocked for more than 122 seconds.
[ 1106.942028]       Tainted: G        W          6.8.1 #1-NixOS
[ 1106.942539] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
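If the machine is unresponsive while this happens, the messages can be looked for afterwards in the kernel log. The following sketch counts hung-task reports in whatever log text is piped into it; the function name count_hung_tasks is made up for illustration, and the pattern is taken from the messages shown above.

```shell
#!/usr/bin/env bash
# Count "blocked for more than N seconds" reports in kernel log text
# read from standard input. grep -c exits non-zero when nothing matches,
# so "|| true" keeps the function usable under "set -e".
count_hung_tasks() {
    grep -c 'blocked for more than [0-9]* seconds' || true
}

# Examples (commented out so that sourcing this file has no side effects):
#   dmesg | count_hung_tasks
#   journalctl -k -b -1 | count_hung_tasks   # kernel log of the previous boot
```

A count of 0 after the test script has finished suggests that swapping kept working after min_free_kbytes was raised.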
