Editing names in a phylogenetic tree in Newick format

I have a phylogenetic tree in Newick format, and I want to remove some parts of the taxon names:

1_[genus_specie_1]_characters:0.2654682758,(((((((((((((((2_[genus_specie_2]_characters:0.0379334280,54_[genus_specie_2]_characters:0.0605802067)/1/100:0.0121248674,(3_[genus_specie_3]_characters:0.0206432295,4_[genus_specie_4]_characters:0.0141250479)/1/100:0.0647820408)/1/100:0.0235327264,30_[genus_specie_5]_characters

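For example, I want to keep only the part inside the square brackets, dropping the numeric prefix, the brackets themselves, and the trailing _characters suffix: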

genus_specie_1:0.2654682758,(((((((((((((((genus_specie_2:0.0379334280,genus_specie_2:0.0605802067)/1/100:0.0121248674,(genus_specie_3:0.0206432295,genus_specie_4:0.0141250479)/1/100:0.0647820408)/1/100:0.0235327264,genus_specie_5

I tried a Perl one-liner, but it only deletes the square bracket characters:

perl -i -pe 'y/[]//d' file.nwk

I also tried the following sed command:

sed 's/[[:alnum:]_]*\[\([[:alnum:]_]*\)\][[:alnum:]_]*/\1/g' 

but it does not work.

Answer 1

A Perl regular expression works well here:

$ initial='1_[genus_specie_1]_characters:0.2654682758,(((((((((((((((2_[genus_specie_2]_characters:0.0379334280,54_[genus_specie_2]_characters:0.0605802067)/1/100:0.0121248674,(3_[genus_specie_3]_characters:0.0206432295,4_[genus_specie_4]_characters:0.0141250479)/1/100:0.0647820408)/1/100:0.0235327264,30_[genus_specie_5]_characters'
$ expected='genus_specie_1:0.2654682758,(((((((((((((((genus_specie_2:0.0379334280,genus_specie_2:0.0605802067)/1/100:0.0121248674,(genus_specie_3:0.0206432295,genus_specie_4:0.0141250479)/1/100:0.0647820408)/1/100:0.0235327264,genus_specie_5'

$ result=$( perl -pe 's/\d+_\[(.+?)\]_.*?(?=:|$)/$1/g' <<<"$initial" )

$ [[ $result = $expected ]] && echo yes
yes

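This uses non-greedy quantifiers (.+? and .*?) and a lookahead ((?=:|$)). The non-greedy .+? captures the name up to the first closing bracket, and .*? then consumes the suffix only as far as the next colon or the end of the line. Because the lookahead does not consume anything, the colon and the branch length after it are left in place.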
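
For comparison, the same edit can be sketched with sed -E. POSIX extended regular expressions have neither non-greedy quantifiers nor lookahead, so negated bracket expressions stand in for them. This is an alternative to the Perl above, not part of the original answer. The short tree string is a made-up example, and the sketch assumes, as the Perl does, that every label is followed by a colon or ends the line.

```shell
# Made-up sample tree (hypothetical labels, not taken from the question's data).
tree='1_[genus_a]_chars:0.1,(2_[genus_b]_chars:0.2,3_[genus_c]_chars:0.3)/1/100:0.4,4_[genus_d]_chars'

# [0-9]+_      numeric prefix and underscore
# \[([^]]+)]   bracketed name; [^]]+ stops at the first "]" (stands in for .+?)
# _[^:]*       suffix up to, but not including, the next ":" or the end of the line
result=$(printf '%s\n' "$tree" | sed -E 's/[0-9]+_\[([^]]+)]_[^:]*/\1/g')

printf '%s\n' "$result"
# genus_a:0.1,(genus_b:0.2,genus_c:0.3)/1/100:0.4,genus_d
```

Since [^:]* cannot match a colon, the match ends just before the branch length, which has the same effect as the lookahead in the Perl version. If some labels carry no branch length, the suffix class would need to exclude commas and parentheses as well.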
