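I've been banging my head against this for too long. I've tried grep, join, and awk, but I can't get the arguments right. I need a command that does this correctly.
I have two text files.
cat file1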
@ABC:11:ABC:1:1111:1111:1111
@ABC:22:ABC:1:1111:4444:4444
cat file2
@ABC:11:ABC:1:1111:1111:1111 1:N:0:TCCCGCGC+AGGCGGGG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:11:ABC:1:1111:2222:2222 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:3333:3333 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:4444:4444 1:N:0:TCCCGCGC+AGGCGGGG
TTTTTTTTTTTTTGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
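I want to do two things:
Output 1) Based on file1, extract every line that contains one of its strings, plus the additional lines that follow it in the same record (the sequence, "+", and quality lines).
Output 2) Based on file1, extract every record that does NOT contain one of its strings, again with its additional lines. It should only try to match against lines that start with @.
Example of output 1):
cat output1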
@ABC:11:ABC:1:1111:1111:1111 1:N:0:TCCCGCGC+AGGCGGGG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:4444:4444 1:N:0:TCCCGCGC+AGGCGGGG
TTTTTTTTTTTTTGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Example of output 2):
cat output2
@ABC:11:ABC:1:1111:2222:2222 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:3333:3333 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
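(No Perl, please.)
Answer 1
What you show and ask for is grepping a given set of reads out of a FASTQ file. I strongly suggest not reinventing the wheel and using an existing tool for this, such as seqkit grep.
Nevertheless, here is a "bash only" variant:
Four consecutive lines belong to one read. So we can put all four on a single line, separated by tabs, grep for the IDs, and then convert the tabs back into newlines.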
$ cat file2.fq|paste - - - -|grep -f file1.txt|tr "\t" "\n"
Or, for your second output, we just use grep's invert option (-v):
$ cat file2.fq|paste - - - -|grep -v -f file1.txt|tr "\t" "\n"
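Note that the grep variants above match an ID anywhere on the joined line, including the sequence and quality fields, and also match IDs that are prefixes of longer IDs. If matching strictly on the @ header line matters, something like the following awk sketch might work. It is not part of the original answer: the function name filter_reads and its keep/drop argument are made up for illustration, and it assumes a non-empty ID file and exactly four lines per read (no wrapped sequences).

```shell
# filter_reads MODE IDS FASTQ   (hypothetical helper, not a standard tool)
#   MODE  : "keep" prints reads whose ID is listed, "drop" prints the others
#   IDS   : one read ID per line, including the leading "@" (assumed non-empty)
#   FASTQ : exactly 4 lines per read (assumption: no wrapped sequence lines)
filter_reads() {
  awk -v mode="$1" '
    # First file: remember every ID (first whitespace-separated field).
    NR == FNR { ids[$1] = 1; next }
    # Second file: decide once per record, on the header line only.
    FNR % 4 == 1 {
      hit  = ($1 in ids)                     # exact match on the ID field
      show = (mode == "keep") ? hit : !hit
    }
    # Print the header and the three lines that belong to it.
    show
  ' "$2" "$3"
}
```

With the file names used above, filter_reads keep file1.txt file2.fq > output1 would produce the first output and filter_reads drop file1.txt file2.fq > output2 the second. Because the decision is made only on every fourth line, a quality string that happens to start with @ is never treated as a header.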