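I have a tab-delimited file like the one below. The first column is the read name, and the third column says whether the hit is Bacteria or Eukaryota. Each read may have many entries (hits). I want to extract the reads whose first hit (the first line for that read) is Bacteria.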
A00643:620:HFM7YDSX5:1:1101:9064:18223 LN590686.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:9064:18223 LN590686.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:9064:18223 LN590686.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:9064:18223 LN590686.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:9064:18223 LN590686.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:9064:18223 LN590686.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:9064:18223 LT700188.1 Bacteria
A00643:620:HFM7YDSX5:1:1101:9064:18223 LN598496.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:9064:18223 LN597789.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:9064:18223 LN596327.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:18258:19492 AL139347.10 Eukaryota
A00643:620:HFM7YDSX5:1:1101:31385:19554 LN600047.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:31385:19554 LN594833.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:31385:19554 LN590681.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:31385:19554 LN590681.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:31385:19554 LN590681.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:31385:19554 LN590681.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:31385:19554 LN590681.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:31385:19554 LN590681.1 Eukaryota
A00643:620:HFM7YDSX5:1:1101:31385:19554 LN590673.1 Eukaryota
Many thanks, A.
Answer 1
If I understand correctly:
awk -F '\t' '!seen[$1]++ && $3 == "Bacteria"' < your-file
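will print each line whose first field has not been seen before and whose third field is Bacteria. In other words, for every read it looks only at the first line and prints that line if it is a Bacteria hit.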
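To check the behaviour, here is a small sketch on made-up data. The read names `read1`, `read2`, `read3` and the variable names are hypothetical, chosen only for illustration. The second command is a possible variant, not part of the original answer: it assumes you want every hit of a read whose first hit is Bacteria, rather than just that first line.

```shell
# Hypothetical tab-separated sample: read1 starts with Eukaryota,
# read2 starts with Bacteria, read3 has a single Eukaryota hit.
sample=$(printf 'read1\tLN590686.1\tEukaryota\nread1\tLT700188.1\tBacteria\nread2\tLT700188.1\tBacteria\nread2\tLN598496.1\tEukaryota\nread3\tAL139347.10\tEukaryota\n')

# As in the answer: print only the first line of each read, if it is Bacteria.
first_hits=$(printf '%s\n' "$sample" |
  awk -F '\t' '!seen[$1]++ && $3 == "Bacteria"')

# Variant (assumption): remember the kingdom of each read's first hit,
# then print every line of the reads whose first hit was Bacteria.
all_hits=$(printf '%s\n' "$sample" |
  awk -F '\t' '!($1 in first) { first[$1] = $3 } first[$1] == "Bacteria"')

printf 'first hits only:\n%s\n' "$first_hits"
printf 'all hits of those reads:\n%s\n' "$all_hits"
```

With this sample, the first command should print only the Bacteria line of `read2`, and the variant should print both `read2` lines. The Bacteria hit of `read1` is skipped in both cases because it is not that read's first line.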