Use IDs to look up the ID header and add it as the second column in file B

I have a file A that contains accession numbers, family names, and virus names, and a file B that contains IDs and sequences.

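I want to use the accession number embedded in each ID in B to look up the family name and virus name in A, and insert them as the second column in B.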

Example

File A

NC_001348 PEPS Herpesviridae Human herpesvirus 3, complete genome.txt
NC_001350 PEPS Herpesviridae Saimiriine herpesvirus 2 complete genome.txt
NC_001491 PEPS Herpesviridae Equid herpesvirus 1, complete genome.txt
NC_001798 PEPS Herpesviridae Human herpesvirus 2 strain HG52, complete genome.txt
NC_001806 PEPS Herpesviridae Human herpesvirus 1 strain 17, complete genome.txt
NC_001826 PEPS Herpesviridae Murine herpesvirus 68 strain WUMS, complete genome.txt
NC_001844 PEPS Herpesviridae Equid herpesvirus 4, complete genome.txt
NC_001847 PEPS Herpesviridae Bovine herpesvirus 1, complete genome.txt
NC_001987 PEPS Herpesviridae Ateline herpesvirus 3 complete genome.txt
NC_002229 PEPS Herpesviridae Gallid herpesvirus 2, complete genome.txt

File B

NC_001348_71671_71760_KY215944.1    GCGCGGCTGGTGATGCAATGCGTGACCAGCTACTGGCGCAACTCGCGCTGCGCCGCCTTTGTGAACAGCTTCCCCATGGTGATGTACATC
NC_001350_89668_89757_HQ221963.1    CTTTCAGGATTTTCTGGCAGTTTTGCTGTCAAGAATGACATGATCTGGTGATGCCATATCTCAATATACAGCGCAGTGCTCACTGGTCTG
NC_001491_126502_126591_AF480884.1  AACGTGTCGGTGCGCACGGCCGTCAGGGCGAAGCCCGGGTGGATGTGGGCCTTGGTCTGCAGCACCAGCGACACCGGCGAGATCTTGTAC
NC_001798_97563_97652_AY714813.1    CGCAGGTGCCCGAAGACGTCGCAGACGGCCGCCCGCAGGGCCATGCACTGCATGGAGCCCGTGGTGCCGCCCGGCCCCCGGTCCAGGTGC
NC_001806_196955_197044_FJ483970.2  TCATCGATCTCAGTCTGTCGGCCGCTCCACGGCTCTGACTGGACTTTCCAAAGTACATACTGCAGTCAGAGCTGTCGAGCGGTTAACAGA

Expected output

NC_001348_71671_71760_KY215944.1    Herpesviridae Human herpesvirus 3, complete genome  GCGCGGCTGGTGATGCAATGCGTGACCAGCTACTGGCGCAACTCGCGCTGCGCCGCCTTTGTGAACAGCTTCCCCATGGTGATGTACATC
NC_001350_89668_89757_HQ221963.1    Herpesviridae Saimiriine herpesvirus 2 complete genome  CTTTCAGGATTTTCTGGCAGTTTTGCTGTCAAGAATGACATGATCTGGTGATGCCATATCTCAATATACAGCGCAGTGCTCACTGGTCTG
NC_001491_126502_126591_AF480884.1  Herpesviridae Equid herpesvirus 1, complete genome  AACGTGTCGGTGCGCACGGCCGTCAGGGCGAAGCCCGGGTGGATGTGGGCCTTGGTCTGCAGCACCAGCGACACCGGCGAGATCTTGTAC
NC_001798_97563_97652_AY714813.1    Herpesviridae Human herpesvirus 2 strain HG52, complete genome  CGCAGGTGCCCGAAGACGTCGCAGACGGCCGCCCGCAGGGCCATGCACTGCATGGAGCCCGTGGTGCCGCCCGGCCCCCGGTCCAGGTGC
NC_001806_196955_197044_FJ483970.2  Herpesviridae Human herpesvirus 1 strain 17, complete genome    TCATCGATCTCAGTCTGTCGGCCGCTCCACGGCTCTGACTGGACTTTCCAAAGTACATACTGCAGTCAGAGCTGTCGAGCGGTTAACAGA

Answer 1

Command (file1 is file A and file2 is file B; note that it pairs line i of file2 with line i of file1, so it assumes both files list the records in the same order):

# Count the lines in file2, then process the two files line by line.
c=$(awk 'END{print NR}' file2)
for ((i=1; i<=c; i++)); do
    # Line i of file1 with its first two fields blanked: family and virus name.
    j=$(awk -v i="$i" 'NR==i{$1=$2=""; print $0}' file1)
    # Line i of file2: move the sequence to column 3 and put the name in column 2.
    awk -v i="$i" -v j="$j" 'NR==i{$3=$2; $2=j; print $0}' file2
done | sed 's/complete genome\.txt/complete genome/g'

Output
NC_001348_71671_71760_KY215944.1   Herpesviridae Human herpesvirus 3, complete genome GCGCGGCTGGTGATGCAATGCGTGACCAGCTACTGGCGCAACTCGCGCTGCGCCGCCTTTGTGAACAGCTTCCCCATGGTGATGTACATC
NC_001350_89668_89757_HQ221963.1   Herpesviridae Saimiriine herpesvirus 2 complete genome CTTTCAGGATTTTCTGGCAGTTTTGCTGTCAAGAATGACATGATCTGGTGATGCCATATCTCAATATACAGCGCAGTGCTCACTGGTCTG
NC_001491_126502_126591_AF480884.1   Herpesviridae Equid herpesvirus 1, complete genome AACGTGTCGGTGCGCACGGCCGTCAGGGCGAAGCCCGGGTGGATGTGGGCCTTGGTCTGCAGCACCAGCGACACCGGCGAGATCTTGTAC
NC_001798_97563_97652_AY714813.1   Herpesviridae Human herpesvirus 2 strain HG52, complete genome CGCAGGTGCCCGAAGACGTCGCAGACGGCCGCCCGCAGGGCCATGCACTGCATGGAGCCCGTGGTGCCGCCCGGCCCCCGGTCCAGGTGC
NC_001806_196955_197044_FJ483970.2   Herpesviridae Human herpesvirus 1 strain 17, complete genome TCATCGATCTCAGTCTGTCGGCCGCTCCACGGCTCTGACTGGACTTTCCAAAGTACATACTGCAGTCAGAGCTGTCGAGCGGTTAACAGA
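
As an alternative sketch (not part of the answer above), the lookup can be keyed on the accession number itself rather than on line position, so the two files do not need to be in the same order. It assumes that the accession is the first two underscore-separated parts of the ID in file B (for example NC_001348), that the second column of file A can be dropped, and that tab-separated output is acceptable. The function name annotate_by_accession and the "NA" placeholder for IDs with no match are illustrative choices, not something from the question.

```shell
#!/usr/bin/env bash
# annotate_by_accession FILE_A FILE_B   (hypothetical helper name)
# Prints: ID <tab> family and virus name <tab> sequence
annotate_by_accession() {
    awk '
        # First file (A): store "family + virus name" keyed by accession.
        NR == FNR {
            key = $1
            $1 = $2 = ""                # blank the accession and the second field
            desc = $0
            sub(/^ +/, "", desc)        # remove the spaces left by the blanked fields
            sub(/\r$/, "", desc)        # tolerate Windows line endings
            sub(/\.txt$/, "", desc)     # remove the trailing ".txt"
            name[key] = desc
            next
        }
        # Second file (B): rebuild the accession from the ID and look it up.
        {
            split($1, part, "_")
            key = part[1] "_" part[2]   # assumption: accession looks like NC_001348
            desc = (key in name) ? name[key] : "NA"
            print $1 "\t" desc "\t" $2
        }
    ' "$1" "$2"
}
```

With the sample data it could be called as annotate_by_accession fileA fileB > fileB.annotated. Because file A is read once into an array, this also avoids starting two awk processes per line.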
