I have a file that looks like this:
1>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS= (626 aa)
37.3% identity
>>tr|N1NG13|N1NG13_ARAHY Seed storage protein Ara h1 OS= (626 aa)
37.3% identity
>>tr|Q6PSU6|Q6PSU6_ARAHY Conarachin (Fragment) OS=Arachi (303 aa)
29.4% identity
>>tr|Q6PSU3|Q6PSU3_ARAHY Conarachin (Fragment) OS=Arachi (580 aa)
29.4% identity
>>tr|A5Z1Q5|A5Z1Q5_ARADU Ara d 6 OS=Arachis duranensis O (145 aa)
23.7% identity
>>sp|P43237|ALL11_ARAHY Allergen Ara h 1, clone P17 OS=A (614 aa)
29.4% identity
>>tr|A8VT50|A8VT50_ARADU Conglutin OS=Arachis duranensis (160 aa)
44.8% identity
>>tr|A1YQB2|A1YQB2_BOVIN Alpha lactabumin (Fragment) OS= (52 aa)
50.0% identity
>>tr|A5Z1Q8|A5Z1Q8_ARADU Ara d 2.01 OS=Arachis duranensi (160 aa)
44.8% identity
>>tr|A8VT44|A8VT44_ARADU Conglutin OS=Arachis duranensis (160 aa)
44.8% identity
>>tr|A8VT41|A8VT41_ARADU Conglutin OS=Arachis duranensis (160 aa)
44.8% identity
>>tr|N1NEW2|N1NEW2_ARADU Seed storage protein Ara h1 OS= (614 aa)
29.4% identity
>>tr|B3IXL2|B3IXL2_ARAHY Main allergen Ara h1 OS=Arachis (614 aa)
29.4% identity
>>tr|A8VT50|A8VT50_ARADU Conglutin OS=Arachis duranensis (160 aa)
2>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS= (626 aa)
37.3% identity
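For every identity line of 35% or more, I want to retrieve that line together with the line above it, keeping the >>> header lines. The expected output looks like this: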
1>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS= (626 aa)
37.3% identity
>>tr|N1NG13|N1NG13_ARAHY Seed storage protein Ara h1 OS= (626 aa)
37.3% identity
>>tr|A8VT50|A8VT50_ARADU Conglutin OS=Arachis duranensis (160 aa)
44.8% identity
>>tr|A1YQB2|A1YQB2_BOVIN Alpha lactabumin (Fragment) OS= (52 aa)
50.0% identity
>>tr|A5Z1Q8|A5Z1Q8_ARADU Ara d 2.01 OS=Arachis duranensi (160 aa)
44.8% identity
>>tr|A8VT44|A8VT44_ARADU Conglutin OS=Arachis duranensis (160 aa)
44.8% identity
>>tr|A8VT41|A8VT41_ARADU Conglutin OS=Arachis duranensis (160 aa)
44.8% identity
2>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS= (626 aa)
37.3% identity
I tried the following, but without any luck so far:
grep -B1 "35\+.*" -e '>>>' file > output_file
Any help is appreciated! Thank you!
Answer 1
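I would avoid trying to do numeric comparisons with regular expressions. Also, since -B is a global option, you will inevitably pick up the line before each >>> match as well.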
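To illustrate both points, here is a minimal sketch on a made-up two-record sample. The sample data and the variable names sample, ge35 and out are assumptions for illustration, not taken from the question's file, and the sketch assumes a grep that supports -B (such as GNU or BSD grep). A ">= 35" regex has to enumerate digit ranges by hand, and -B1 drags in the 29.4% line simply because it sits directly above the 2>>> header.

```shell
# Hypothetical two-record sample in the same layout as the question's file.
sample='1>>>query A
>>tr|hit1
29.4% identity
2>>>query B
>>tr|hit2
37.3% identity'

# A regex for ">= 35" must spell out the digit ranges: 35-39, 40-99, 100.
ge35='^(3[5-9]|[4-9][0-9]|100)(\.[0-9]+)?% identity'

# -B1 adds one line of leading context to every match,
# no matter which -e pattern produced the match.
out=$(printf '%s\n' "$sample" | grep -E -B1 -e '>>>' -e "$ge35")
printf '%s\n' "$out"
```

The 29.4% identity line shows up in the output even though it is below the threshold, because it is the context line of the 2>>> match.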
You could do something like this in awk:
$ awk '/>>>/ {print; next} /^>>/ {last = $0; next} $1+0 >= 35 {print last; print}' file
1>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS= (626 aa)
37.3% identity
>>tr|N1NG13|N1NG13_ARAHY Seed storage protein Ara h1 OS= (626 aa)
37.3% identity
>>tr|A8VT50|A8VT50_ARADU Conglutin OS=Arachis duranensis (160 aa)
44.8% identity
>>tr|A1YQB2|A1YQB2_BOVIN Alpha lactabumin (Fragment) OS= (52 aa)
50.0% identity
>>tr|A5Z1Q8|A5Z1Q8_ARADU Ara d 2.01 OS=Arachis duranensi (160 aa)
44.8% identity
>>tr|A8VT44|A8VT44_ARADU Conglutin OS=Arachis duranensis (160 aa)
44.8% identity
>>tr|A8VT41|A8VT41_ARADU Conglutin OS=Arachis duranensis (160 aa)
44.8% identity
2>>>PROKKA_00001 Transcriptional regulator PadR-like family protein - 137 aa
>>tr|E5G076|E5G076_ARAHY Ara h 1 allergen OS=Arachis hyp (619 aa)
37.3% identity
>>sp|P43238|ALL12_ARAHY Allergen Ara h 1, clone P41B OS= (626 aa)
37.3% identity
The conversion of the percentage string with $1 + 0 appears to be supported by at least gawk and mawk.
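A quick way to check this on your own awk is the sketch below. The function name to_number is hypothetical, chosen only for illustration. Adding 0 forces a numeric context, so awk converts the leading numeric part of the field and ignores the trailing %; a field with no leading number becomes 0, which is why the >> lines never pass the >= 35 test.

```shell
# to_number is a hypothetical helper: print the numeric value
# that awk assigns to the first field of each input line.
to_number() {
    awk '{ print $1 + 0 }'
}

# One identity line and one hit line, in the layout of the question's file.
# Prints 37.3, then 0.
printf '%s\n' '37.3% identity' '>>tr|E5G076|E5G076_ARAHY Ara h 1' | to_number
```

If your awk prints something else for the first line, stripping the % explicitly before comparing would be the safer route.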