bash fork:重试:资源暂时不可用

bash fork:重试:资源暂时不可用

我正在尝试运行一个 shell 脚本,它将使用 shell 脚本创建进程。我收到资源暂时不可用错误。如何确定哪个限制(内存/进程/文件数)造成了此问题。以下是我的ulimit -a结果。

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 563959
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 10000000
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

答案1

对于评论中的情况,您没有在每个线程中使用太多内存,但您已经达到了 cgroup 限制。您会发现默认值约为 12288,但该值是可写的:

$ cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
12288
$ echo 15000 | sudo tee /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
15000
$ cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
15000

编辑:对于较新版本的 Ubuntu(即 24.04),位置已更改:

$ cat /sys/fs/cgroup/user.slice/user-1000.slice/pids.max
20668 

如果我使用我的“线程限制是什么”程序(发现这里) 检查之前:

$ ./thread-limit
Creating threads ...
100 threads so far ...
200 threads so far ...
...
12100 threads so far ...
12200 threads so far ...
Failed with return code 11 creating thread 12281 (Resource temporarily unavailable).
Malloc worked, hmmm

之后:

$ ./thread-limit
Creating threads ...
100 threads so far ...
200 threads so far ...
300 threads so far ...
...
14700 threads so far ...
14800 threads so far ...
14900 threads so far ...
Failed with return code 11 creating thread 14993 (Resource temporarily unavailable).
Malloc worked, hmmm

当然,上面的数字并不准确,因为“doug”用户还有其他几个线程在运行,比如我与服务器的 SSH 会话。检查一下:

$ cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current
8

编辑:对于较新版本的 Ubuntu(即 24.04),位置已更改:

$ cat /sys/fs/cgroup/user.slice/user-1000.slice/pids.current
12

所用程序:

/* compile with:   gcc -pthread -o thread-limit thread-limit.c */
/* originally from: http://www.volano.com/linuxnotes.html */

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <string.h>

#define MAX_THREADS 100000
#define PTHREAD_STACK_MIN 1*1024*1024*1024
int i;

void run(void) {
  sleep(60 * 60);
}

int main(int argc, char *argv[]) {
  int rc = 0;
  pthread_t thread[MAX_THREADS];
  pthread_attr_t thread_attr;

  pthread_attr_init(&thread_attr);
  pthread_attr_setstacksize(&thread_attr, PTHREAD_STACK_MIN);

  printf("Creating threads ...\n");
  for (i = 0; i < MAX_THREADS && rc == 0; i++) {
    rc = pthread_create(&(thread[i]), &thread_attr, (void *) &run, NULL);
    if (rc == 0) {
      pthread_detach(thread[i]);
      if ((i + 1) % 100 == 0)
    printf("%i threads so far ...\n", i + 1);
    }
    else
    {
      printf("Failed with return code %i creating thread %i (%s).\n",
         rc, i + 1, strerror(rc));

      // can we allocate memory?
      char *block = NULL;
      block = malloc(65545);
      if(block == NULL)
        printf("Malloc failed too :( \n");
      else
        printf("Malloc worked, hmmm\n");
    }
  }
sleep(60*60); // ctrl+c to exit; makes it easier to see mem use
  exit(0);
}

也可以看看这里

编辑于 2020 年 5 月:对于较新版本的 Ubuntu,默认最大 PID 号现在为 4194304,因此无需对其进行调整。

现在,如果您有足够的内存,下一个限制将由默认的最大 PID 编号定义,即 32768,但也是可写的。显然,为了同时拥有超过 32768 个进程、任务或线程,它们的 PID 必须允许更高:

$ cat /proc/sys/kernel/pid_max
32768
$ echo 80000 | sudo tee /proc/sys/kernel/pid_max
80000
$ cat /proc/sys/kernel/pid_max
80000

请注意,我们故意选择大于 2**16 的数字,以查看是否真的允许。现在,将 cgroup 最大值设置为 70000:

$ echo 70000 | sudo tee /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
70000
$ cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
70000

此时,请注意,即使资源仍然可用,上面列出的程序似乎也有大约 32768 个线程的限制,因此请使用另一种方法。我的测试服务器有 16 GB 的内存,似乎在大约 62344 个任务时耗尽了其他资源,尽管似乎仍有可用内存。

$ cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current
62344

顶部:

top - 13:48:26 up 21:08,  4 users,  load average: 281.52, 134.90, 70.93
Tasks: 62535 total, 201 running, 62334 sleeping,   0 stopped,   0 zombie
%Cpu0  : 96.6 us,  2.4 sy,  0.0 ni,  1.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 95.7 us,  2.4 sy,  0.0 ni,  1.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 95.1 us,  3.1 sy,  0.0 ni,  1.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 93.5 us,  4.6 sy,  0.0 ni,  1.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  : 94.8 us,  3.4 sy,  0.0 ni,  1.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 95.5 us,  2.6 sy,  0.0 ni,  1.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 94.7 us,  3.5 sy,  0.0 ni,  1.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 93.8 us,  4.5 sy,  0.0 ni,  1.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 15999116 total,   758684 free, 10344908 used,  4895524 buff/cache
KiB Swap: 16472060 total, 16470396 free,     1664 used.  4031160 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
37884 doug      20   0  108052  68920   3104 R   5.7  0.4   1:23.08 top
24075 doug      20   0    4360    652    576 S   0.4  0.0   0:00.31 consume
26006 doug      20   0    4360    796    720 S   0.4  0.0   0:00.09 consume
30062 doug      20   0    4360    732    656 S   0.4  0.0   0:00.17 consume
21009 doug      20   0    4360    748    672 S   0.3  0.0   0:00.26 consume

看来我最终达到了用户进程和计时器(信号)数量默认的 ulimit 设置:

$ ulimit -i
62340
doug@s15:~$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 62340
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 62340
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

如果我提高这些限制,就我而言,我是通过以下方式实现的/etc/security/limits.conf

# /etc/security/limits.conf
#
# S18 specific edits. 2019.12.24
#       also for a ridiculous number of threads test.
#
#Each line describes a limit for a user in the form:
#
#<domain>        <type>  <item>  <value>
#
#Where:
#<domain> can be:
#        - a user name
#        - a group name, with @group syntax
#        - the wildcard *, for default entry
#        - the wildcard %, can be also used with %group syntax,
#                 for maxlogin limit
#        - NOTE: group and wildcard limits are not applied to root.
#          To apply a limit to the root user, <domain> must be
#          the literal username root.
#
#<type> can have the two values:
#        - "soft" for enforcing the soft limits
#        - "hard" for enforcing hard limits
#
#<item> can be one of the following:
#        - core - limits the core file size (KB)
#        - data - max data size (KB)
#        - fsize - maximum filesize (KB)
#        - memlock - max locked-in-memory address space (KB)
#        - nofile - max number of open file descriptors
* - nofile 32768
#        - rss - max resident set size (KB)
#        - stack - max stack size (KB)
#        - cpu - max CPU time (MIN)
#        - nproc - max number of processes
* - nproc 200000
#        - as - address space limit (KB)
#        - maxlogins - max number of logins for this user
#        - maxsyslogins - max number of logins on the system
#        - priority - the priority to run user process with
#        - locks - max number of file locks the user can hold
#        - sigpending - max number of pending signals
* - sigpending 200000
#        - msgqueue - max memory used by POSIX message queues (bytes)
#        - nice - max nice priority allowed to raise to values: [-20, 19]
#        - rtprio - max realtime priority
#        - chroot - change root to directory (Debian-specific)
#
#<domain>      <type>  <item>         <value>
#

#*               soft    core            0
#root            hard    core            100000
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#ftp             -       chroot          /ftp
#@student        -       maxlogins       4

# End of file

在无法分叉之前,我可以达到 126020 个线程。这次的限制是(请记住,在测试开始之前,此服务器上大约有 150 个 root 拥有的线程):

cat /proc/sys/kernel/threads-max
126189

好的,现在调整该参数:

echo 99999999 | sudo tee /proc/sys/kernel/threads-max
99999999

在我的 16 GB 的服务器开始交换内存并出现故障之前,我可以达到大约 132,000 个线程。

$ cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current
132016

注意:在这些条件下,运行 top 会给系统带来很大的额外负载,所以我没有运行它。但是内存:

doug@s18:~/config/etc/security$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15859       15509         270           1          79         137
Swap:          2047           4        2043

在某些时候你会遇到麻烦,但系统停滞的速度之快绝对令人惊叹。一旦我的系统开始交换,它就会完全停滞,并且我遇到了许多以下错误:

Feb 17 16:13:02 s15 kernel: [  967.907305] INFO: task waiter:119371 blocked for more than 120 seconds.
Feb 17 16:13:02 s15 kernel: [  967.907335]       Not tainted 4.10.0-rc8-stock #194
Feb 17 16:13:02 s15 kernel: [  967.907357] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

我的平均负载膨胀到约 29000。但我把电脑放了一个小时,它就自己解决了。我把线程的旋转间隔错开 200 微秒,这似乎也有帮助。

相关内容