How do I make a script wait for a nested script to finish its background processes?

I am building a pipeline for a set of data, and the main part looks like this:

#! /bin/bash

time bwa mem -o bwa/mem/Stettler -M -t 96 -R "@RG\tID:Test\tSM:Stettler\tLB:TestLib\tPL:ILLUMINA" /storage/ppl/wentao/bwa_Index/genome.fa $1 $2
wait
echo "finished mem"
samtools view -Sb -@ 96 -o samtools/Stettler.bam bwa/mem/Stettler
wait
echo  "got stettler"
wait
time samtools sort -@ 96 -O bam -o samtools/sort/approachAsortedstettler.bam samtools/Stettler.bam
wait
echo "sorted"

time samtools index samtools/sort/approachAsortedstettler.bam
wait
echo "finished indexing"

time gatk MarkDuplicates -I samtools/sort/approachAsortedstettler.bam -O GATK/MarkDuplicates/ApproachAsortedstettler.bam -M GATK/MarkDuplicates/metrics/ApproachB
wait
echo "Marked Duplicates"
time samtools index GATK/MarkDuplicates/ApproachAsortedstettler.bam
wait
echo "indexed again ++++++++++++++++++++++++++++++++++++++++"
time bash scripts/Parallelhaplo.sh
wait
echo "Parallelhaplo"

time bash scripts/MergerHAplo.sh
wait
echo "merged"
time vcftools --vcf GATK/MergedSample_gather.raw.vcf --min-meanDP  $3 --recode --out vcftools/MergedGATKdp2.vcf
wait
echo "deep checked"
time gatk IndexFeatureFile --feature-file vcftools/MergedGATKdp2.vcf.recode.vcf
wait
echo "IFF"
time gatk SelectVariants -R /storage/ppl/wentao/GATK_R_index/genome.fa --variant vcftools/MergedGATKdp2.vcf.recode.vcf --concordance vcftools/Mergedmpileupdp2.vcf.recode.vcf -O GATK/SelectVariants/Common$
wait
echo "finished"

The script called Parallelhaplo looks like this:

#!/bin/bash
#parallel call SNPs with chromosomes by GATK

for i in 1 2 3 4 5 6 7;do for o in A B D;do for u in _part1 _part2;do (gatk HaplotypeCaller -R /storage/ppl/wentao/GATK_R_index/genome.fa -I GATK/MarkDuplicates/ApproachAsortedstettler.bam -L chr$i$o$u -O GATK/HaplotypeCaller/HaploSample.chr$i$o$u.raw.vcf &);done;done ; done

gatk HaplotypeCaller -R /storage/ppl/wentao/GATK_R_index/genome.fa -I GATK/MarkDuplicates/ApproachBsortedstettler.bam -L chrUn -O GATK/HaplotypeCaller/HaploSample.chrUn.raw.vcf&

wait

echo "parallel call finished"

wait

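However, when I run the script, what usually happens is that Parallelhaplo starts, but for some reason the wait in either script does not wait for it to finish. The pipeline moves on to the next step, which fails with an error because it cannot find the files it needs. What can I do?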

Answer 1

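The problem is that you are sending the gatk processes to the background inside a subshell: ( gatk ... & ). Each background process is a child of that subshell, not of the script's shell, so wait does not see it and will not wait for it. From help wait: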

wait: wait [-fn] [id ...]
    Wait for job completion and return exit status.

    Waits for each process identified by an ID, which may be a process ID or a
    job specification, and reports its termination status.  If ID is not
    given, waits for all currently active child processes, and the return
    status is zero.  If ID is a job specification, waits for all processes
    in that job's pipeline.
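To see the difference without running gatk, here is a minimal sketch that uses sleep 2 as a stand-in for a long-running job. The variable names subshell_elapsed and direct_elapsed are only for illustration.

```shell
#!/bin/bash
# Case 1: the job is backgrounded inside a subshell, like ( gatk ... & ).
# The subshell exits at once, so the script's shell has no child to wait for.
start=$SECONDS
( sleep 2 & )
wait
subshell_elapsed=$((SECONDS - start))
echo "subshell case: wait returned after ${subshell_elapsed}s"

# Case 2: the job is backgrounded by the script's own shell.
# It is a child of this shell, so wait blocks until it exits.
start=$SECONDS
sleep 2 &
wait
direct_elapsed=$((SECONDS - start))
echo "direct case: wait returned after ${direct_elapsed}s"
```

In the first case wait should return almost immediately, while in the second it should take about two seconds, which is the same behavior the pipeline shows with the gatk jobs.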

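If you change it to background the whole subshell instead, i.e. ( gatk ... ) &, or, better, drop the subshell altogether since it isn't doing anything useful here, it will work as expected: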

for i in 1 2 3 4 5 6 7; do
  for o in A B D; do
    for u in _part1 _part2; do
      gatk HaplotypeCaller \
           -R /storage/ppl/wentao/GATK_R_index/genome.fa \
           -I GATK/MarkDuplicates/ApproachAsortedstettler.bam \
           -L chr$i$o$u \
           -O GATK/HaplotypeCaller/HaploSample.chr$i$o$u.raw.vcf &
    done
  done
done 
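The loop above still needs the wait from the original script after it. If you also want to know whether any of the jobs failed, one option is to record each PID from $! and wait on them one by one, since wait with an ID returns that process's exit status. The following is only a sketch under assumptions: call_region is a hypothetical stand-in for the real gatk HaplotypeCaller command, it fails on purpose for one region to show the check, and the region list is shortened to keep the example quick.

```shell
#!/bin/bash
# Hypothetical stand-in for the gatk HaplotypeCaller call. Replace the body
# with the real command. It fails for one region purely to demonstrate the check.
call_region() {
  local region=$1
  sleep 1
  [ "$region" != "chr2B_part1" ]
}

pids=()
regions=()
for i in 1 2; do
  for o in A B; do
    for u in _part1 _part2; do
      call_region "chr$i$o$u" &   # backgrounded by the script's own shell
      pids+=("$!")                # remember the PID of the job just started
      regions+=("chr$i$o$u")
    done
  done
done

# Wait for each job individually and count the ones that exited non-zero.
failed=0
for n in "${!pids[@]}"; do
  if ! wait "${pids[$n]}"; then
    echo "job for ${regions[$n]} failed" >&2
    failed=$((failed + 1))
  fi
done
echo "parallel call finished, $failed job(s) failed"
```

With this in place, the script could end with something like exit 1 when failed is non-zero, so the calling pipeline can stop before the merge step instead of failing later on missing files.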
