OK, so I have a bash function that I apply to several folders:
function task(){
do_thing1
do_thing2
do_thing3
...
}
I want to run this function in parallel. So far I have been using a little forking trick:
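N=4  # number of cores
for temp_subj in "${raw_dir}"/MRST*
do
  # every N jobs, wait for the running batch to finish before starting more
  ((i=i%N)); ((i++==0)) && wait
  task "$temp_subj" &
done
wait  # wait for the last batch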
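The forking trick above can be wrapped in a reusable function. This is a minimal sketch: `run_in_batches` is a hypothetical helper name (not part of bash or GNU Parallel), and the counter is updated with `i=$(( ... ))` so the function also works under `set -e`. Like the original loop, it waits for a whole batch, so one slow job holds up the next N.

```shell
#!/usr/bin/env bash
# Sketch of the forking trick: start jobs in the background and, after every
# n jobs, wait for the whole batch before starting the next one.
# run_in_batches is a hypothetical helper, not part of bash or GNU Parallel.
run_in_batches() {
  local n=$1 cmd=$2 item
  local i=0
  shift 2
  for item in "$@"; do
    i=$(( i % n ))              # i cycles through 0..n-1
    (( i++ == 0 )) && wait      # at the start of each batch, wait for the previous one
    "$cmd" "$item" &            # run one job in the background
  done
  wait                          # wait for the last (possibly partial) batch
}

# Usage, assuming task is defined as in the question:
# run_in_batches 4 task "${raw_dir}"/MRST*
```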
And it works well. But I decided to use something "cleaner" and switched to GNU Parallel:
ls -d ${raw_dir}/MRST* | parallel task {}
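The problem is that it runs everything in parallel, including the do_thing steps inside my task function. It inevitably crashes, because those steps must be executed serially. I have tried modifying the call to parallel in many ways, but nothing seems to work. Any ideas?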
Answer 1
I think your problem lies in do_thingX:
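# A dummy step: announce, sleep 1 second, announce again
do_thing() { echo Doing "$@"; sleep 1; echo Did "$@"; }
export -f do_thing
do_thing1() { do_thing 1 "$@"; }
do_thing2() { do_thing 2 "$@"; }
do_thing3() { do_thing 3 "$@"; }
# Yes you can name functions ... - it is a bit unconventional, but it works
...() { do_thing ... "$@"; }
# GNU Parallel runs each job in a new shell, which only sees exported functions
export -f do_thing1
export -f do_thing2
export -f do_thing3
export -f ...
# The steps inside task run one after another; only the tasks run in parallel
function task(){
  do_thing1 "$1"
  do_thing2 "$1"
  do_thing3 "$1"
  ... "$1"
}
export -f task
# This should take 4 seconds for a single input
# (ls -d lists the MRST* folders themselves rather than their contents)
ls -d "${raw_dir}"/MRST* | time parallel task {}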
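The `export -f` lines are what make this work: GNU Parallel starts each job in a new shell, and a child bash only sees functions that were exported. The effect can be reproduced without GNU Parallel by using `bash -c` as the child shell. This is a small sketch; `demo_step` and `demo_task` are hypothetical stand-ins for do_thingX and task.

```shell
#!/usr/bin/env bash
# Shows that a child bash (like the shell GNU Parallel spawns per job)
# only sees functions that were exported with `export -f`.
# demo_step and demo_task are hypothetical stand-ins for do_thingX and task.
demo_step() { echo "step $1 for $2"; }
demo_task() { demo_step 1 "$1"; demo_step 2 "$1"; }   # steps run serially

# Not exported yet: the child shell cannot find demo_task.
if bash -c 'demo_task demo' 2>/dev/null; then
  echo "demo_task visible before export"
else
  echo "demo_task not visible before export"
fi

# Export the function and every function it calls, then try again.
export -f demo_step demo_task
bash -c 'demo_task demo'
```

Exporting only `demo_task` would not be enough, since the child shell also has to find `demo_step` when the task calls it.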
Or the parallel you are using is not GNU Parallel. Check that it is GNU Parallel:
$ parallel --version
GNU parallel 20201122
Copyright (C) 2007-2020 Ole Tange, http://ole.tange.dk and Free Software
Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
GNU parallel comes with no warranty.
Web site: https://www.gnu.org/software/parallel
When using programs that use GNU Parallel to process data for publication
please cite as described in 'parallel --citation'.
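In a script, the same check can be automated by looking at the first line of `parallel --version`, which for GNU Parallel starts with "GNU parallel" as in the output above. This is a minimal sketch; `is_gnu_parallel` is a hypothetical helper name, and it only assumes that a non-GNU `parallel` (moreutils, for example, ships a different program with that name) does not print that line.

```shell
#!/usr/bin/env bash
# Succeeds only if the given command (default: parallel) exists and
# identifies itself as GNU Parallel on the first line of --version.
# is_gnu_parallel is a hypothetical helper name.
is_gnu_parallel() {
  local cmd=${1:-parallel}
  command -v "$cmd" >/dev/null 2>&1 || return 1
  "$cmd" --version 2>/dev/null | head -n 1 | grep -q '^GNU parallel'
}

if is_gnu_parallel; then
  echo "GNU Parallel found"
else
  echo "parallel is missing or is not GNU Parallel"
fi
```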