比较两个文本文件,提取 file2 的匹配行以及其他行

比较两个文本文件,提取 file2 的匹配行以及其他行

已经嘲笑这个太久了,尝试了 grep、join、awk,但我无法获得正确的参数。我需要正确地执行命令。

我有两个文本文件。

猫文件1

@ABC:11:ABC:1:1111:1111:1111
@ABC:22:ABC:1:1111:4444:4444


猫文件2

@ABC:11:ABC:1:1111:1111:1111 1:N:0:TCCCGCGC+AGGCGGGG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:11:ABC:1:1111:2222:2222 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:3333:3333 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:4444:4444 1:N:0:TCCCGCGC+AGGCGGGG
TTTTTTTTTTTTTGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF


我想做两件事:

输出1)基于 file1,提取包含字符串以及两个附加字符串的所有行。
输出2)基于 file1,提取所有符合以下条件的行:不要包含字符串加上两个附加行 - 但它应该只尝试匹配以@开头的行..



输出示例 1):

猫输出1

@ABC:11:ABC:1:1111:1111:1111 1:N:0:TCCCGCGC+AGGCGGGG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:4444:4444 1:N:0:TCCCGCGC+AGGCGGGG
TTTTTTTTTTTTTGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF


输出示例 2)

猫输出2

@ABC:11:ABC:1:1111:2222:2222 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ABC:22:ABC:1:1111:3333:3333 1:N:0:TCCCGCGC+AGGCGGGG
AGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGGAGGCGGGG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF



(请不要使用 Perl )

答案1

您所显示和要求的是 grep fastq 文件中给定的一组读取。我强烈建议不要重新发明轮子并使用现有的工具,例如seqkit grep为了它。

尽管如此,这里还是“仅 bash”的变体:

4 个连续行属于一次读取。因此,我们可以将它们全部放在一行中,由制表符分隔,grep 查找 id 并将制表符转换回新行。

$ cat file2.fq|paste - - - -|grep -f file1.txt|tr "\t" "\n"

或者对于您的第二个输出,我们只需使用 invert 参数grep

$ cat file2.fq|paste - - - -|grep -v -f file1.txt|tr "\t" "\n

相关内容