我在 bash 里面有一个脚本,它定义为使用 perl 和 python 作为需要分析我的数据的子程序。
脚本执行正常,直到步骤 3 和步骤 4,然后崩溃command not found
。我尝试了chmod u+x
那些 .pl .py 脚本;我尝试明确定义 perl 和 python 的路径,但没有任何效果。脚本与我正在执行的 .sh 脚本位于同一目录中。我遗漏了一些东西,但我不知道是什么。
这是我在步骤 3 和步骤 4 中遇到的错误:
这是步骤 3 和 4 中需要使用的 bash 脚本和 .pl 和 .py 脚本。
#!/bin/bash
if [ $# -lt 1 ]
then
echo "Please type $0 -h for helps"
exit 65
fi
FORMAT="fastq"
REF_PATH="/home/maja2004lts/16sPIP/"
MODE="fast"
THREAD=8
Version="0.1.1"
step="step1"
THREAD=8
while getopts ":i:r:f:s:p:m:t:hv" opt
do
case "${opt}" in
i) NGS=${OPTARG}
HELP=0
;;
r) NGS_R2=${OPTARG}
#echo "${OPTARG}"
;;
f) FORMAT=${OPTARG}
;;
h) HELP=1
#echo "HELP IS $HELP"
;;
s) STEP=${OPTARG}
;;
v) VERIFICATION=1
;;
p) REF_PATH=${OPTARG} #reference DB path
;;
m) MODE=${OPTARG} #fast or sensitive
;;
t) THREAD=${OPTARG}
;;
?) echo "Option -${opt} requires an argument. Please type $0 -h for helps" >&2
exit 1
;;
esac
done
if [ $HELP -eq 1 ]
then
cat <<USAGE
16sPIP version ${Version}
This program will run the 16sPIP pipeline with the supplied parameters.
Command Line Switches:
-h Show this help & ignore all other switches
-i Specify NGS file or pair-end forward reads for processing
-r Specify pair-end reverse reads for processing
-f Specify NGS file format (fastq/fasta/bam/sam/sff) [fastq]
16sPIP will support more NGS file formats in the future
-p Specify the PATH for database (DB)
16sPIP will search the reference DB under the Path provided.
fast_16S_nucl_DB
sensitive_nucl_DB
-v Verification mode
-s jump to that step [1]
-m Specify the analysis mode: fast or sensitive [fast]
-t number of threads [1]
Usage:
$0 -i <NGSfile> -f <fastq/fasta/bam/sam/sff> -p <reference_path>
$0 -i <forward> -r <reverse> -f <fastq> -p <reference_path> -m <fast> -s <step>
USAGE
exit
fi
if [ ! -f $NGS ]
then
echo "$NGS file doesnot exist. Please check it"
exit 65
fi
if [ $NGS_R2 -a "${FORMAT}" != "fastq" ]
then
echo "$NGS_R2 must be in fastq format"
exit
fi
if [ "$MODE"x != "fast"x -a "$MODE"x != "sensitive"x ]
then
echo "Specify the analysis mode: fast or sensitive [fast]"
exit
fi
if [ "${FORMAT}" = "sam" -o "${FORMAT}" = "bam" ]
then
picard-tools SamToFastq INPUT=$NGS FASTQ=${NGS}.tmp
mv ${NGS}.tmp $NGS
elif [ "${FORMAT}" = "sff" ]
then
python ${REF_PATH}/bin/sff_extract $NGS >${NGS}.fastq
mv ${NGS}.fastq $NGS
fi
if [ "$step" != "" ]
then
goto ${step}
fi
step1:
echo "Step 1: Quality control "
if [ "${FORMAT}" = "fastq" -o "${FORMAT}" = "sam" -o "${FORMAT}" = "bam" -o "${FORMAT}" = "sff" ]
then
if [ ${NGS_R2} -a -f ${NGS_R2} ]
then
/usr/bin/perl ${REF_PATH}/bin/TrimmingReads.pl -i ${NGS} -irev ${NGS_R2} -q 20 -n 50
else
/usr/bin/perl ${REF_PATH}/bin/TrimmingReads.pl -i $NGS -q 20 -n 50
fi
elif [ "${FORMAT}" = "fasta" ]
then
/usr/bin/perl ${REF_PATH}/bin/TrimmingReads.pl -i $NGS -n 50
else
echo "Specify NGS file format (fastq/fasta/bam/sam/sff) [fastq]"
exit
fi
if [ $NGS_R2 -a -f $NGS_R2 ]
then
goto step2
else
goto step3
fi
step2:
echo "Step 2: Merge double-ended reads"
${REF_PATH}/bin/pear -f ${NGS}_trimmed -r ${NGS_R2}_trimmed -o $NGS -q 20 -t 50
mv ${NGS}.assembled.fastq ${NGS}_trimmed
rm ${NGS_R2}_trimmed
step3:
echo "Step 3: sequence filtering"
if [ "$FORMAT" = "fasta" ]
then
perl ${REF_PATH}/bin/FilterReads.pl ${NGS}_trimmed fasta 10
else
perl ${REF_PATH}/bin/FilterReads.pl ${NGS}_trimmed fastq 10
fi
step4:
echo "Step 4: Results report"
if [ "$FORMAT" = "fasta" ]
then
python ${REF_PATH}/bin/basicStatistics.py ${NGS}_trimmed_filter fasta ${NGS}.basic_stat.txt
else
python ${REF_PATH}/bin/basicStatistics.py ${NGS}_trimmed_filter fastq ${NGS}.basic_stat.txt
fi
if [ "$MODE" = "fast" ]
then
bwa mem -t ${THREAD} ${REF_PATH}/db/155pathogens.fa ${NGS}_trimmed_filter > ${NGS}.sam
/usr/bin/perl ${REF_PATH}/bin/pathogenSamMatch.pl $NGS.sam ${NGS}.pathon.match.txt
cat ${NGS}.pathon.match.txt >> ${NGS}.basic_stat.txt
rm ${NGS}.pathon.match.txt
/usr/bin/perl ${REF_PATH}/bin/SamSingleResult.pl -i $NGS.sam -s 99 -l species -o $NGS.pathogen
/usr/bin/perl ${REF_PATH}/bin/PathogenAnnotation.pl -i $NGS.pathogen -l ${REF_PATH}/db/155pathogens.list -o $NGS.pathogen.list
/usr/bin/perl ${REF_PATH}/bin/FastReport.pl -l $NGS.pathogen.list -s ${NGS}.basic_stat.txt -o ${NGS}.pathogen.prediction.report
elif [ "$MODE" = "sensitive" ]
then
bwa mem -t ${THREAD} ${REF_PATH}/db/16S-complete.fa ${NGS}_trimmed_filter > $NGS.com.sam &
echo $! >Job_id
bwa mem -t ${THREAD} ${REF_PATH}/db/155pathogens.fa ${NGS}_trimmed_filter > $NGS.sam
/usr/bin/perl ${REF_PATH}/bin/SamSingleResult.pl -i $NGS.sam -l species -s 99 -o $NGS.pathogen
/usr/bin/perl ${REF_PATH}/bin/PathogenAnnotation.pl -i $NGS.pathogen -l ${REF_PATH}/db/155pathogens.list -o $NGS.pathogen.list
if [ "$FORMAT" = "fasta" ]
then
/usr/bin/perl ${REF_PATH}/bin/extractSequence.pl -i ${NGS}_trimmed_filter -o ${NGS} -l $NGS.pathogen.list -f fasta
else
/usr/bin/perl ${REF_PATH}/bin/extractSequence.pl -i ${NGS}_trimmed_filter -o ${NGS} -l $NGS.pathogen.list -f fastq
fi
for i in `ls ${NGS}.*.pathon.fa`
do
${REF_PATH}/bin/blastn -query $i -out $i.blast -db ${REF_PATH}/db/16S-complete -outfmt 6 -evalue 1E-20 -num_threads ${THREAD} &
echo $! >>Job_id
done
/usr/bin/perl ${REF_PATH}/bin/pathogenSamMatch.pl $NGS.sam ${NGS}.pathon.match.txt
cat ${NGS}.pathon.match.txt >> ${NGS}.basic_stat.txt
while [ 1 ]
do
tag=1
for i in `cat Job_id`
do
ID=`ps |grep " $i "`
if [ "$ID" != "" ]
then
tag=0
sleep 10
fi
done
if [ $tag == 1 ]
then
break
fi
done
/usr/bin/perl ${REF_PATH}/bin/SamSingleResult.pl -i $NGS.com.sam -s 99 -l species
/usr/bin/perl ${REF_PATH}/bin/SamSingleResult.pl -i $NGS.com.sam -s 97 -l genus
/usr/bin/perl ${REF_PATH}/bin/SamSingleResult.pl -i $NGS.com.sam -s 95 -l family
/usr/bin/perl ${REF_PATH}/bin/completeAnnotation.pl $NGS.com.sam.species ${REF_PATH}/db/16S-complete.list species $NGS.com.sam.species.list
/usr/bin/perl ${REF_PATH}/bin/completeAnnotation.pl $NGS.com.sam.genus ${REF_PATH}/db/16S-complete.list genus $NGS.com.sam.genus.list
/usr/bin/perl ${REF_PATH}/bin/completeAnnotation.pl $NGS.com.sam.family ${REF_PATH}/db/16S-complete.list family $NGS.com.sam.family.list
cat ${NGS}.*.pathon.fa.blast >$NGS.tmp.blast
rm ${NGS}.*.pathon.fa*
/usr/bin/perl ${REF_PATH}/bin/completeAnnotation.pl $NGS.tmp.blast ${REF_PATH}/db/16S-complete.list species $NGS.tmp.blast.list.species
/usr/bin/perl ${REF_PATH}/bin/sensitiveReport.pl -l $NGS.pathogen.list -t ${NGS}.basic_stat.txt -g $NGS.com.sam.genus.list -f $NGS.com.sam.family.list -b $NGS.tmp.blast.list.species -s $NGS.com.sam.species.list -o ${NGS}.pathogen.prediction.report
fi
enscript -p ${NGS}.pathogen.prediction.report.ps ${NGS}.pathogen.prediction.report
ps2pdf ${NGS}.pathogen.prediction.report.ps ${NGS}.pathogen.prediction.report.pdf
--------------perl script-----------
#!/usr/bin/perl
#
# FilterReads.pl
# Read filter procedures were based on the read ambiguous base (N) content.
#
# FilterReads.pl <seq> <fasta|fastq> <N_percentage>
#
### Authors : JiaojiaoMiao <[email protected]>
use strict;
use warnings;
if($#ARGV<1){
die "usage: $0 <seq> <fasta|fastq> <N_percentage>\n";
}
my $f=$ARGV[0];
my $ncutoff;
my $format=$ARGV[1];
if(defined($ARGV[2])){
$ncutoff=$ARGV[2];
}
else{
$ncutoff=10;
}
my $o=$f."_filter";
open(F,"<$f")or die "can not open file: $f\n";
open(O,">$o")or die "can not open file: $o\n";
if($format=~/fastq/i){
while(my $l=<F>){
my $l2=<F>;
my $l3=<F>;
my $l4=<F>;
my $length=length($l2);
my $ncount=$l2=~tr/N|n/N/;
if($length-1 == 0){
next;
}
if($ncount/($length-1)*100<=$ncutoff){
print O "$l$l2$l3$l4";
}
}
}
else{
my $line=<F>;
my $ncount=0;
my $seqlen=0;
my $seq="";
while(my $l=<F>){
if($l=~/^>/){
if($seqlen != 0){
if($ncount/$seqlen*100 <=$ncutoff){
print O "$line$seq\n";
}
}
$line=$l;
$ncount=0;
$seqlen=0;
$seq="";
}
else{
$l=~s/\s//g;
$seqlen += length($l);
$ncount += $l=~tr/N|n/N/;
$seq .= $l;
}
}
if($seqlen != 0){
if($ncount/$seqlen*100 <= $ncutoff){
print O "$line$seq\n";
}
}
}
close F;
close O;
-----------------------python script---------------------
#!/usr/bin/python
#-*- coding:utf-8 -*-
#
#
import re,os,sys
if len(sys.argv) < 3:
print "usage: basicStatistics.py <parent_file> <fasta|fastq> <outFile>"
sys.exit(-1)
file=sys.argv[1]
format=sys.argv[2]
outfile=sys.argv[3]
file_size=float(os.path.getsize(file))/1024/1024
#print ("%.3fM\n")%file_size
fh=open(file)
lines=fh.readlines()
if format == "fastq" :
read=[line[:-1] for line in lines[1::4]]
reads_num=len(read)
seq="".join(read).upper()
base_num=len(seq)
gc_num=seq.count("C")+seq.count("G")
gc_num_p=float(gc_num)/base_num*100
elif format == "fasta" :
read=[line[:-1] for line in lines[1::2]]
reads_num=len(read)
seq="".join(read).upper()
base_num=len(seq)
gc_num=seq.count("C")+seq.count("G")
gc_num_p=float(gc_num)/base_num*100
else:
print "usage: basicStatistics.py <parent_file> <fasta|fastq>"
sys.exit(-1)
#print ("%s\t%s\n%s\n%s\t%.3f\n")%(read,reads_num,seq,base_num,gc_num_p)
out=open(outfile,'w')
out.write("SampleFile: "+file+"\n")
out.write("Sum Data: %.3fM\n"%file_size)
out.write("Read Num: %i\n"%reads_num)
out.write("Read Len: %i\n"%(base_num/reads_num))
out.write("GC: %.3f\n"%gc_num_p)
fh.close()
out.close()
答案1
bash 中没有goto
命令,因此脚本中包含此命令的行不正确,应将其删除。
此外,行(我猜应该是标签)step1:
、step2:
和不正确,这些行实际上导致了错误消息。在您发布的屏幕截图上,消息很清晰:step3:
step4:
16sPIP.sh: line 168: step3:: command not found
和
16sPIP.sh: line 177: step4:: command not found
这显然表明“未找到”的命令是step3:
和step4:
。(您一定也收到过类似的step1:
和消息step2:
,但它们已经滚出屏幕,您可能没有注意到它们)。
与您的说法相反,脚本不会因以下消息而“崩溃”:bash 只是忽略未找到的命令并从脚本的下一行继续执行,这可以通过以下事实清楚地表明:文本(由脚本中的行后命令Step 4: Results report
打印)出现在屏幕上并且 Python 脚本开始执行(但抛出语法错误)。echo
step4:
basicStatistics.py
您必须重新设计脚本以摆脱goto
命令和标签,因为它们在 bash 中不存在,并用适当的控制结构替换它们。
您的脚本还存在另一个问题。如果变量为空,则if
包含“裸”变量的命令(如下面的命令)将因语法错误而失败。$NGS_R2
if [ $NGS_R2 -a "${FORMAT}" != "fastq" ]
if [ ${NGS_R2} -a -f ${NGS_R2} ]
if [ $NGS_R2 -a -f $NGS_R2 ]
我假设您想测试$NGS_R2
变量是否为空。在这种情况下,您应该使用-n
运算符,如下所示:
if [ -n "$NGS_R2" -a "${FORMAT}" != "fastq" ]
if [ -n "${NGS_R2}" -a -f "${NGS_R2}" ]
if [ -n "$NGS_R2" -a -f "$NGS_R2" ]
(您在步骤 4 中执行的 Python 脚本中似乎也有一个语法错误basicStatistics.py
- 屏幕截图上显示了一条错误消息 - 但这是一个完全不同的问题。)