# cat /etc/sysctl.conf
fs.aio-max-nr=99999999
fs.file-max=99999999
kernel.pid_max=4194304
kernel.threads-max=99999999
kernel.sem=32768 1073741824 2000 32768
kernel.shmmni=32768
kernel.msgmni=32768
kernel.msgmax=65536
kernel.msgmnb=65536
vm.max_map_count=1048576
# cat /etc/security/limits.conf
* soft core unlimited
* hard core unlimited
* soft data unlimited
* hard data unlimited
* soft fsize unlimited
* hard fsize unlimited
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 1048576
* hard nofile 1048576
* soft rss unlimited
* hard rss unlimited
* soft stack unlimited
* hard stack unlimited
* soft cpu unlimited
* hard cpu unlimited
* soft nproc unlimited
* hard nproc unlimited
* soft as unlimited
* hard as unlimited
* soft maxlogins unlimited
* hard maxlogins unlimited
* soft maxsyslogins unlimited
* hard maxsyslogins unlimited
* soft locks unlimited
* hard locks unlimited
* soft sigpending unlimited
* hard sigpending unlimited
* soft msgqueue unlimited
* hard msgqueue unlimited
# cat /etc/systemd/logind.conf
[Login]
UserTasksMax=infinity
# free -g
              total        used        free      shared  buff/cache   available
Mem:            117           5          44          62          67          48
Swap:            15           8           7
# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       194G  121G   74G  63% /
# cat /proc/meminfo
MemTotal: 123665416 kB
MemFree: 90979152 kB
MemAvailable: 95376636 kB
Buffers: 72260 kB
Cached: 25964076 kB
SwapCached: 0 kB
Active: 8706568 kB
Inactive: 22983044 kB
Active(anon): 7568968 kB
Inactive(anon): 18871224 kB
Active(file): 1137600 kB
Inactive(file): 4111820 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 16777212 kB
SwapFree: 16777212 kB
Dirty: 20 kB
Writeback: 0 kB
AnonPages: 5653128 kB
Mapped: 185100 kB
Shmem: 20786924 kB
KReclaimable: 281732 kB
Slab: 541000 kB
SReclaimable: 281732 kB
SUnreclaim: 259268 kB
KernelStack: 34384 kB
PageTables: 93216 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 78609920 kB
Committed_AS: 63750908 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 46584 kB
VmallocChunk: 0 kB
Percpu: 18944 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 183484 kB
DirectMap2M: 5058560 kB
DirectMap1G: 122683392 kB
And for the user account used to run the scripts:
$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
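The effective limits can also be read straight from /proc rather than from the config files; a minimal cross-check, assuming a Linux /proc and procps:

# Limits attached to the current shell (what fork() actually checks):
cat /proc/$$/limits

# Kernel-wide ceilings that can also make fork() fail:
cat /proc/sys/kernel/threads-max /proc/sys/kernel/pid_max

# Total processes and threads currently charged to this user:
ps -L -u "$USER" --no-headers | wc -l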
And yet:
./somescript.sh: fork: retry: Resource temporarily unavailable
The server is under moderate load (load average currently around 20) and runs many scripts that fork heavily (i.e., $(somecode) command substitutions inside many of the scripts). The server (a Google Cloud instance) has 16 cores and 128 GB of RAM, plus a 100 GB tmpfs drive and 16 GB of swap. The message appears even when CPU, memory, and swap usage are all below 50%.

It is hard to believe that any of these already-high limits are actually being hit, so I suspect some other setting is at play.

What else can be tuned to avoid this fork: retry: Resource temporarily unavailable problem?
Answer 1
After more debugging I finally found the answer. It seems worth posting here, since others may run into the same situation. It may also turn out to be a bug in Ubuntu (TBD).
My scripts set the following in various places (inside the scripts themselves):

ulimit -u 20000 2>/dev/null

Depending on the script/situation, that number varied from 2000 to 40000.
So what seems to happen is this: once a number of processes somehow "reaches" the maximum total number of open files (1048576), which seems easy to do with only a limited number of scripts once each is multiplied by its respective ulimit setting, forks start to fail. The result was that at most around 2000-2200 threads would ever get started.
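Locating all of these calls is a simple grep across the script tree (the directory below is a placeholder), and /proc shows which limit a running script actually ended up with (the PID is hypothetical):

# Find every per-script override of the process limit:
grep -rn 'ulimit -u' /path/to/scripts/

# Check the effective limit of one running script:
grep 'Max processes' /proc/12345/limits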
I removed all of the ulimit -u settings, and the fork: retry: Resource temporarily unavailable problem no longer occurs, nor do any other related fork errors.
htop now also shows far more than 2000-2200 threads:
Tasks: 2349, 22334 thr, 318 kthr; 32 running
My machine has now become overloaded/unresponsive instead, but that is a different problem (the server is probably swapping), and a much more pleasant one than this fork issue :)
(As an interesting side note and for reference, https://stackoverflow.com/questions/30757919/the-limit-of-ulimit-hn describes how to raise the maximum number of open files above 1048576.)
Setting up a test for this should be easy: a bash script that forks itself recursively, with a ulimit -u ${some_large_value} set inside each forked thread. A bounded sketch of such a test follows below.
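A minimal sketch of that test, assuming Linux with bash; the script name, the limit, and the fork count are hypothetical, and both numbers are kept deliberately low so the reproduction stays bounded instead of turning into a fork bomb:

#!/usr/bin/env bash
# forktest.sh (hypothetical name): reproduce
#   fork: retry: Resource temporarily unavailable
# RLIMIT_NPROC (ulimit -u) is compared against the user's TOTAL number of
# processes and threads, so a value below that total makes fork() fail
# with EAGAIN, which bash reports with exactly the message above.

# Hypothetical low limit; compare it with the user's current total:
#   ps -L -u "$USER" --no-headers | wc -l
ulimit -u 300 2>/dev/null

for ((i = 1; i <= 500; i++)); do
    # Every background subshell is one more process charged to the user.
    # As soon as the user's process+thread count crosses the limit set
    # above, the fork retry message starts appearing.
    ( sleep 20 ) &
done
wait   # reap the children so the shell exits cleanly

Run as a user who already has a few hundred processes and threads, the error shows up almost immediately, which mirrors how the per-script ulimit -u calls behaved on a server already running 22000+ threads.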