I am running thousands of curl background processes in parallel with the following bash script
START=$(date +%s)
for i in {1..100000}
do
    curl -s "http://some_url_here/"$i > $i.txt &
    END=$(date +%s)
    DIFF=$(( $END - $START ))
    echo "It took $DIFF seconds"
done
I have a 49 GB Core i7-920 dedicated server (not virtualized).
I track memory consumption and CPU with the top command, and both stay far from their limits.
I use ps aux | grep curl | wc -l to count the number of curl processes currently running. This number quickly climbs to 2-4 thousand and then starts to decrease steadily.
If I add simple parsing by piping curl through awk (curl | awk > output), the curl process count only reaches 1-2 thousand and then drops to 20-30...
Why does the process count drop so sharply? Where are the limits of this architecture?
Answer 1
Strictly following the question:
mycurl() {
    START=$(date +%s)
    curl -s "http://some_url_here/"$1 > $1.txt
    END=$(date +%s)
    DIFF=$(( $END - $START ))
    echo "It took $DIFF seconds"
}
export -f mycurl
seq 100000 | parallel -j0 mycurl
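(-j0 tells parallel to start as many jobs at once as the system allows. If you would rather set an explicit cap, a variant along these lines should work, where 500 is just an illustrative value:
seq 100000 | parallel -j 500 mycurl)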
Shorter, if you do not need the boilerplate text around the timing:
seq 100000 | parallel -j0 --joblog log curl -s http://some_url_here/{} ">" {}.txt
cut -f 4 log
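# field 4 of the --joblog file is each job's runtime in seconds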
If you want to run thousands of jobs in parallel, you will hit some limits (such as the number of open file handles). Raising ulimit -n or the values in /etc/security/limits.conf may help.
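For example, a minimal sketch of checking and raising the open-file limit on a typical Linux box (the numbers and the user name are illustrative only):
ulimit -n          # show the current per-process open-file limit
ulimit -n 65535    # raise it for this shell session (cannot exceed the hard limit)
# to make it persistent, add lines like these to /etc/security/limits.conf:
# someuser  soft  nofile  65535
# someuser  hard  nofile  65535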
Answer 2
for i in {1..100000}
There are only 65536 ports. Throttle this.
for n in {1..100000..1000}; do  # start 100 fetch loops
    for i in `eval echo {$n..$((n+999))}`; do
        echo "club $i..."
        curl -s "http://some_url_here/"$i > $i.txt
    done &
    wait
done
(edit: removed the severely dated assertion about OS limits and added the missing wait)
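As an aside, 65536 is the size of the whole TCP port space; what actually constrains outgoing connections is the kernel's ephemeral port range plus sockets lingering in TIME_WAIT. A quick way to inspect both on a typical Linux box (assuming ss is installed):
cat /proc/sys/net/ipv4/ip_local_port_range    # ephemeral port range used for outgoing connections
ss -tan state time-wait | wc -l               # rough count of sockets still in TIME_WAIT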
Answer 3
Try this shell script for batches of concurrent curl requests:
cat curlpool.sh
#!/bin/bash
# usage: ./curlpool.sh "-d '{ \"id\": 1, \"jsonrpc\": \"2.0\", \"method\": \"cfx_getParamsFromVote\", \"params\": []}' -H \"Content-Type: application/json\" -X POST http://127.0.0.1:22537"
target=${1:-http://example.com}
cmd="curl $target"
concurrency=${2:-20}
while true  # loop forever, until Ctrl+C is pressed
do
    for i in $(seq $concurrency)  # launch $concurrency requests per batch
    do
        eval $cmd &  # send out a curl request; the & means we do not wait for the response
        # or use `eval $cmd > /dev/null &` if you don't want to see the output
    done
    wait  # after all requests are sent out, wait for their processes to finish before the next iteration
done
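For example, assuming you save the script as curlpool.sh and make it executable, a run against a hypothetical local endpoint with 50 concurrent requests per batch could look like this (the URL and the concurrency value are illustrative; press Ctrl+C to stop the endless loop):
chmod +x curlpool.sh
./curlpool.sh "http://127.0.0.1:8080/test" 50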