Running thousands of curl background processes in parallel in a bash script

I am running thousands of curl background processes in parallel in the following bash script:

START=$(date +%s)
for i in {1..100000}
do       
    curl -s "http://some_url_here/"$i  > $i.txt&
    END=$(date +%s)
    DIFF=$(( $END - $START ))
    echo "It took $DIFF seconds"
done

I have a 49 GB Core i7-920 dedicated server (not virtual).

I track memory consumption and CPU with the top command, and both stay far from the limits.

I use ps aux | grep curl | wc -l to count the number of curl processes currently running. This number rises quickly to 2-4 thousand and then starts to decrease continuously.

If I add simple parsing by piping curl into awk (curl | awk > output), the curl process count only rises to 1-2 thousand and then drops to 20-30...

Why does the number of processes drop so dramatically? Where are the boundaries of this architecture?
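For reference, the per-user and network limits that most often cap the number of concurrent curl processes can be inspected directly; a minimal sketch of the checks (Linux, values vary per system):

ulimit -n                                    # max open file descriptors per process (each curl needs sockets plus the output file)
ulimit -u                                    # max processes per user (every backgrounded curl counts)
cat /proc/sys/net/ipv4/ip_local_port_range   # ephemeral port range available for outgoing connections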

Answer 1

Sticking strictly to the question:

mycurl() {
    START=$(date +%s)
    curl -s "http://some_url_here/"$1  > $1.txt
    END=$(date +%s)
    DIFF=$(( $END - $START ))
    echo "It took $DIFF seconds"
}
export -f mycurl                   # export the function so the shells spawned by parallel can run it

seq 100000 | parallel -j0 mycurl   # -j0: run as many jobs in parallel as possible

Shorter, if you do not need the boilerplate text around the timings:

seq 100000 | parallel -j0 --joblog log curl -s http://some_url_here/{} ">" {}.txt
cut -f 4 log    # field 4 of the joblog is each job's run time in seconds

If you want to run thousands of these in parallel, you will hit some limits (such as file handles). Raising ulimit -n or /etc/security/limits.conf may help.
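A minimal sketch of checking and raising the open-file limit; the value 65536 and the user name loadtest in the limits.conf example are only placeholders:

ulimit -n          # current soft limit on open file descriptors
ulimit -Hn         # hard limit; the soft limit cannot be raised above this without root
ulimit -n 65536    # raise the soft limit for the current shell session

# To persist higher limits, add lines like these to /etc/security/limits.conf:
#   loadtest  soft  nofile  65536
#   loadtest  hard  nofile  65536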

Answer 2

for i in {1..100000}

There are only 65536 ports. Throttle this.

for n in {1..100000..1000}; do   # start 100 fetch loops, one per block of 1000 ids
        for i in `eval echo {$n..$((n+999))}`; do
                echo "club $i..."
                curl -s "http://some_url_here/"$i  > $i.txt
        done &                   # run each block in the background
done
wait                             # wait for all 100 background loops to finish
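To see how much pressure the loop actually puts on that port space, one can watch the ephemeral port range and the sockets lingering in TIME_WAIT while it runs; a rough sketch (Linux):

cat /proc/sys/net/ipv4/ip_local_port_range    # ports available for outgoing connections
ss -tn state time-wait | tail -n +2 | wc -l   # sockets still holding a local port in TIME_WAIT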

(edit: removed the severely outdated assertion about OS limits and added the missing wait)

Answer 3

Try this shell script for batched, concurrent curl requests:

cat curlpool.sh

#!/bin/bash
# usage: ./curlpool.sh "-d '{ \"id\": 1, \"jsonrpc\": \"2.0\",  \"method\": \"cfx_getParamsFromVote\", \"params\": []}' -H \"Content-Type: application/json\" -X POST http://127.0.0.1:22537"

target=${1:-http://example.com}
cmd="curl $target"
concurrency=${2:-20}

while true # loop forever, until ctrl+c pressed.
do
    for i in $(seq $concurrency) # perform the inner command $concurrency times.
    do
        eval $cmd & # send out a curl request; the & backgrounds it so we don't wait for the response.
        # or use `eval $cmd > /dev/null &` if you don't want to see the output.
    done

    wait # after all requests are sent out, wait for their processes to finish before the next iteration.
done
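A hypothetical invocation (the URL and the concurrency of 50 are placeholders): fire batches of 50 concurrent GET requests at a local endpoint until interrupted with Ctrl+C:

./curlpool.sh "http://127.0.0.1:8080/ping" 50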
