Extract a delimiter-separated element while keeping the loop variable

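I am a bash beginner, and I am trying to loop over a list of phrases. My goals are to:

A) split each phrase on . and extract the first element of the split

B) still have the original phrase available

My pseudocode/attempt looks like this: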

    while read x
    do
        eval "whole_phrase=$x" # store the whole phrase to another variable
        eval "first_element=echo $x | cut -d';' -f1" # extract the first element after splitting
        myprogram -i ../$first_element -o ../$whole_phrase
    done < ListOfDotSeparatedPhrases.txt

ListOfDotSeparatedPhrases.txt looks like this:

18T3L.fastqAligned.sortedByCoord.out.bam
35T10R.fastqAligned.sortedByCoord.out.bam
18T6L.fastqAligned.sortedByCoord.out.bam
40T4LAligned.sortedByCoord.out.bam
22T10L.fastqAligned.sortedByCoord.out.bam
38T7L.fastqAligned.sortedByCoord.out.bam

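I have been searching online for the best way to do this, without success. Any ideas? I'm sure it isn't actually very difficult!

Answer 1

How about letting read do the splitting for you, by setting the field separator to . for that command?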

    while IFS=. read -r first_element remainder; do
      echo myprogram -i "../$first_element" -o "../${first_element}.${remainder}"
    done < ListOfDotSeparatedPhrases.txt
    myprogram -i ../18T3L -o ../18T3L.fastqAligned.sortedByCoord.out.bam
    myprogram -i ../35T10R -o ../35T10R.fastqAligned.sortedByCoord.out.bam
    myprogram -i ../18T6L -o ../18T6L.fastqAligned.sortedByCoord.out.bam
    myprogram -i ../40T4LAligned -o ../40T4LAligned.sortedByCoord.out.bam
    myprogram -i ../22T10L -o ../22T10L.fastqAligned.sortedByCoord.out.bam
    myprogram -i ../38T7L -o ../38T7L.fastqAligned.sortedByCoord.out.bam

From man bash:

read [-ers] [-a aname] [-d delim] [-i text] [-n nchars] [-N nchars] [-p
       prompt] [-t timeout] [-u fd] [name ...]
              One line is read from the  standard  input,  or  from  the  file
              descriptor  fd  supplied  as an argument to the -u option, split
              into words as described above  under  Word  Splitting,  and  the
              first word is assigned to the first name, the second word to the
              second name, and so on.  If there are more words than names, the
              remaining words and their intervening delimiters are assigned to
              the last name.  If there are fewer words  read  from  the  input
              stream  than  names, the remaining names are assigned empty val‐
              ues.  The characters in IFS are used  to  split  the  line  into
              words  using  the  same  rules  the  shell  uses  for  expansion
              (described above under Word Splitting).
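
The key sentence in that excerpt is the one about "more words than names": the last name receives everything left over, delimiters included. A minimal sketch of that behaviour on a single line, assuming one of the sample file names from the question (the here-document is used only to keep the snippet self-contained):

```shell
# Split one sample line on '.', giving read only two names.
line='18T3L.fastqAligned.sortedByCoord.out.bam'

# IFS=. applies to this read command only; the shell's IFS is untouched afterwards.
IFS=. read -r first_element remainder <<EOF
$line
EOF

# first_element holds the text before the first dot;
# remainder holds the rest, with its dots preserved.
printf 'first=%s\nrest=%s\n' "$first_element" "$remainder"
```

Because the remaining dots are kept in the last variable, the original phrase can be rebuilt as "${first_element}.${remainder}", which is what the loop above relies on.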


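Alternatively (and this is actually simpler, and more portable), read and keep the whole line, then use shell parameter expansion to produce the first element by removing everything after it:

    while read -r x; do
      myprogram -i "../${x%%.*}" -o "../$x"
    done < ListOfDotSeparatedPhrases.txt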
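
To see what the %%.* expansion does in isolation, here is a small sketch. The variable names without_ext, no_dot and unchanged are made up for illustration, and the value nodots is a hypothetical input that is not in the question's file:

```shell
x='18T3L.fastqAligned.sortedByCoord.out.bam'

# %% removes the longest suffix matching '.*', so everything from the first dot goes.
first_element=${x%%.*}

# A single % removes the shortest matching suffix, so only the final '.bam' goes.
without_ext=${x%.*}

# If the value contains no dot, the pattern does not match and the value is unchanged.
no_dot='nodots'
unchanged=${no_dot%%.*}

printf '%s\n' "$first_element" "$without_ext" "$unchanged"
```

The contrast between %% and % is the usual thing to check when the names contain several dots, as they do here.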

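Answer 2

Given:

    eval "whole_phrase=$x" # store the whole phrase to another variable

it would be better to write:

    whole_phrase="$x"

And given:

    eval "first_element=echo $x | cut -d';' -f1" # extract the first element after splitting

there are many ways to extract the first element.

Since your delimiter is the period character (.), you can pass the phrase to awk with . as the field separator and ask it to print only the first field:

    first_element="$(awk -F. '{print $1}' <<< "$x")"

Or, since in this particular case you only want the first element, it is easy to have sed delete the first . character and everything after it:

    first_element="$(sed -e 's/\..*//' <<< "$x")"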
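
As a quick sanity check, the sketch below runs the awk and sed variants on one of the sample names, alongside the cut command from the question with its delimiter changed from ';' to '.'. The names via_awk, via_sed and via_cut are illustrative only, and printf with a pipe stands in for the <<< here-string so the snippet also runs in shells that lack here-strings:

```shell
x='40T4LAligned.sortedByCoord.out.bam'

# awk: use '.' as the field separator and print the first field.
via_awk=$(printf '%s\n' "$x" | awk -F. '{print $1}')

# sed: delete the first literal dot and everything after it.
via_sed=$(printf '%s\n' "$x" | sed -e 's/\..*//')

# cut: the question's approach, with the delimiter corrected to '.'.
via_cut=$(printf '%s\n' "$x" | cut -d. -f1)

printf '%s\n' "$via_awk" "$via_sed" "$via_cut"
```

All three should yield the same first element for this input; each one starts an external process per line, which is the cost compared with a pure-shell approach.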

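Finally, consider that as long as you do not modify the variable x that is read from the file, you already have the whole_phrase value. In fact, you can use that variable name directly in the while loop:

    while read -r whole_phrase
    do
        # -r keeps read from interpreting backslashes in the input
        first_element="$(awk -F. '{print $1}' <<< "$whole_phrase")"
        myprogram -i "../$first_element" -o "../$whole_phrase"
    done < ListOfDotSeparatedPhrases.txt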
