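I want to remove gaps (-). If more than 10 consecutive gaps occur at the same positions in all >Tem sequences, remove all of those gaps, and also remove the residues or gaps at the same positions in the query. For example, if a gap is present in the first template but not in the second, that gap is not removed.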
Example input file:
>Tem1.pdb
------------------------------------------------------------
--------------------------------GETLGEKWKKKLNQLSRKEFDLYKKSGI
TEVDRTEAKEGLKRGETT-HHAVSRGSAKLQWFVERNMVIPEGRVIDLGCGRGGWSYYCA
>Tem2.pdb
------------------------------------------------------------
--------------------------------GRTLGEQWKEKLNAMSREEFFKYRREAI
IEVDRTEARRARRENNIVGGHPVSRGSAKLRWLVEKGFVSPIGKVIDLGCGRGGWSYYAA
>Query_seq
PKFEKQLGQVMLLVLCAGQLLLMRTTWAFCEVLTLATGPILTLWEGNPGRFWNTTIAVST
ANIFRGSYLAGAGLAFSLIKNAQTPRRGTGTTGETLGEKWKRQLNSLDRKEFEEYKRSGI
LEVDRTEAKSALKDGSKI-KHAVSRGSSKIRWIVERGMVKPKGKVVDLGCGRGGWSYYMA
The output file should look like this:
>Temp1
--------------------------------GETLGEKWKKKLNQLSRKEFDLYKKSGI
TEVDRTEAKEGLKRGETT-HHAVSRGSAKLQWFVERNMVIPEGRVIDLGCGRGGWSYYCA
>Temp2
--------------------------------GRTLGEQWKEKLNAMSREEFFKYRREAI
IEVDRTEARRARRENNIVGGHPVSRGSAKLRWLVEKGFVSPIGKVIDLGCGRGGWSYYAA
>Query_se
ANIFRGSYLAGAGLAFSLIKNAQTPRRGTGTTGETLGEKWKRQLNSLDRKEFEEYKRSGI
LEVDRTEAKSALKDGSKI-KHAVSRGSSKIRWIVERGMVKPKGKVVDLGCGRGGWSYYMA
Answer 1
You can do this in Python:
import re

# Each sequence is stored as its list of 60-character alignment lines
tem1 = ["------------------------------------------------------------",
        "--------------------------------GETLGEKWKKKLNQLSRKEFDLYKKSGI",
        "TEVDRTEAKEGLKRGETT-HHAVSRGSAKLQWFVERNMVIPEGRVIDLGCGRGGWSYYCA"]
tem2 = ["------------------------------------------------------------",
        "--------------------------------GRTLGEQWKEKLNAMSREEFFKYRREAI",
        "IEVDRTEARRARRENNIVGGHPVSRGSAKLRWLVEKGFVSPIGKVIDLGCGRGGWSYYAA"]
query = ["PKFEKQLGQVMLLVLCAGQLLLMRTTWAFCEVLTLATGPILTLWEGNPGRFWNTTIAVST",
         "ANIFRGSYLAGAGLAFSLIKNAQTPRRGTGTTGETLGEKWKRQLNSLDRKEFEEYKRSGI",
         "LEVDRTEAKSALKDGSKI-KHAVSRGSSKIRWIVERGMVKPKGKVVDLGCGRGGWSYYMA"]

# Keep line i unless it consists only of gaps in *both* templates.
# The indices are collected first: calling pop() while looping over
# range() shifts the remaining lines, so the line that moves into the
# popped slot would never be checked.
keep = [i for i in range(len(tem1))
        if not (re.fullmatch("-+", tem1[i]) and re.fullmatch("-+", tem2[i]))]
tem1 = [tem1[i] for i in keep]
tem2 = [tem2[i] for i in keep]
query = [query[i] for i in keep]

print(tem1, tem2, query)
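The loop above only drops whole 60-character lines that are entirely gaps in both templates. The rule as worded in the question is column-wise: a column is removed when it lies inside a run of more than 10 consecutive columns that are gaps in every template. Below is a minimal sketch of that literal reading, assuming each sequence's lines have already been joined into one string and that all strings have the same length; the function name remove_shared_gap_runs is hypothetical.

```python
from typing import List, Tuple


def remove_shared_gap_runs(templates: List[str], query: str,
                           min_run: int = 11) -> Tuple[List[str], str]:
    """Drop every column inside a run of at least `min_run` consecutive
    columns where all templates have a gap ('-').

    min_run=11 encodes "more than 10". Assumes all inputs are aligned,
    equal-length strings (the 60-character lines already joined).
    """
    length = len(query)
    if any(len(t) != length for t in templates):
        raise ValueError("all aligned sequences must have the same length")

    # True where every template has a gap in this column
    shared = [all(t[i] == "-" for t in templates) for i in range(length)]

    # Mark columns that belong to a long enough shared-gap run
    drop = [False] * length
    i = 0
    while i < length:
        if shared[i]:
            j = i
            while j < length and shared[j]:
                j += 1
            if j - i >= min_run:  # the run covers columns i .. j-1
                for k in range(i, j):
                    drop[k] = True
            i = j
        else:
            i += 1

    # Remove the same columns from the templates and the query
    keep = [c for c in range(length) if not drop[c]]
    new_templates = ["".join(t[c] for c in keep) for t in templates]
    new_query = "".join(query[c] for c in keep)
    return new_templates, new_query
```

Applied to the sample input with the lines joined, this reading also removes the 32 leading gap columns of the second line, so the query would start at GETLGEKWKRQ rather than ANIFRGSY. It is therefore stricter than the sample output above, which drops only the all-gap first line. A gap present in only one template, such as the one in the third line of Tem1, is kept.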
Now all you need to do is parse the input file and format the output file.
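For the parsing and formatting step, here is a minimal sketch. It assumes each record starts with a '>' header line and that every record has the same number of sequence lines. It treats any record whose name starts with "Tem" as a template (a hypothetical convention based on the sample names) and applies the same whole-line rule as the loop above. The helper names read_blocks, drop_all_gap_lines, write_blocks and process_file are illustrative, not from the original answer.

```python
import re
from typing import Dict, List, Optional


def read_blocks(text: str) -> Dict[str, List[str]]:
    """Parse '>name' headers followed by sequence lines into {name: [lines]}."""
    records: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:  # lines before the first header are ignored
            records[name].append(line)
    return records


def drop_all_gap_lines(records: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Remove line i from every record when line i is all gaps in every template.

    Assumption: names starting with 'Tem' are templates; the other records
    (here the query) are trimmed at the same line indices.
    """
    templates = [n for n in records if n.startswith("Tem")]
    # Assumes equal line counts; lines beyond the shortest record are cut off
    n_lines = min((len(lines) for lines in records.values()), default=0)
    keep = [
        i for i in range(n_lines)
        if not (templates
                and all(re.fullmatch("-+", records[t][i]) for t in templates))
    ]
    return {n: [lines[i] for i in keep] for n, lines in records.items()}


def write_blocks(records: Dict[str, List[str]]) -> str:
    """Format records back into '>name' blocks, one sequence line per line."""
    out: List[str] = []
    for name, lines in records.items():
        out.append(">" + name)
        out.extend(lines)
    return "\n".join(out) + "\n"


def process_file(in_path: str, out_path: str) -> None:
    """Read in_path, drop the shared all-gap lines, write the result to out_path."""
    with open(in_path) as src:
        records = read_blocks(src.read())
    with open(out_path, "w") as dst:
        dst.write(write_blocks(drop_all_gap_lines(records)))
```

Headers are written back unchanged (for example Tem1.pdb), whereas the sample output shows them as Temp1; any renaming would be an extra step. To apply the column-wise rule instead, the lines of each record can be joined with "".join(lines) before filtering and re-wrapped into 60-character lines afterwards.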