Input text:
chrX_143483005-chr6_103649292,chrX_143483110-chr6_103649131 chrX_143483110-chr_6103649147 chrX_143483004-chr6_103649293,chrX_143483110-chr6_103649291,chrX_143483110-chr6_103649053
chrX_143483110-chr_6103649147 chrX_143483005-chr6_103649292,chrX_143483110-chr6_103649131 0
0 chrX_143483005-chr6_103649292,chrX_143483110-chr6_103649131 chrX_143482988-chr6_103649147,chrX_143483004-chr6_103649293,chrX_143483110-chr6_103649291,chrX_143483110-chr6_103649053
chrX_143483005-chr6_103649292,chrX_143483110-chr6_103649131 0 chrX_143483110-chr_6103649147
0 chrX_143483005-chr6_103649292,chrX_143483110-chr6_103649131 chrX_143482988-chr6_103649147,chrX_143483004-chr6_103649293,chrX_143483110-chr6_103649291,chrX_143483110-chr6_103649053
Expected output:
chrX_143483005-chr6_103649292 chrX_143483110-chr_6103649147 chrX_143483004-chr6_103649293
chrX_143483110-chr_6103649147 chrX_143483005-chr6_103649292 0
0 chrX_143483005-chr6_103649292 chrX_143482988-chr6_103649147
chrX_143483005-chr6_103649292 0 chrX_143483110-chr_6103649147
0 chrX_143483005-chr6_103649292 chrX_143482988-chr6_103649147
What I tried:
## No. of Columns in each line.
awk '{print NF}' tt.txt
3
3
3
3
3
## operation to delete the co-ordinates affiliated with comma.
sed -e 's/\,chr[A-Z0-9]\_[0-9]-chr[A-Z0-9]\_[0-9]*.//g' tt.txt
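Basically, I want to delete the coordinates that follow each "," and keep only the left-hand (first) coordinate.
Notes:
1. The number of columns must stay the same as in the input.
2. The comma-separated coordinates are not tied to a fixed column; they can appear in any column.
3. The chromosome can be any of 1-19, X, or Y.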
Answer 1
Simple enough:
$ sed -E 's/,[^ ]+//g' in
chrX_143483005-chr6_103649292 chrX_143483110-chr_6103649147 chrX_143483004-chr6_103649293
chrX_143483110-chr_6103649147 chrX_143483005-chr6_103649292 0
0 chrX_143483005-chr6_103649292 chrX_143482988-chr6_103649147
chrX_143483005-chr6_103649292 0 chrX_143483110-chr_6103649147
0 chrX_143483005-chr6_103649292 chrX_143482988-chr6_103649147
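The (extended) regular expression /,[^ ]+/ matches a comma followed by one or more non-space characters, so each match runs from the comma to the end of that column.
The sed command s replaces every match of its first argument (here, the expression above) with its second argument (here, the empty string); the g option tells s to replace all matches found on a line, not just the first one.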
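For comparison, here is a minimal awk sketch of the same idea that works field by field, which makes the "same number of columns" requirement explicit. It assumes the columns are separated by whitespace; the function name strip_extra_coords is hypothetical and not part of the original answer. Note that awk rebuilds any modified line with single spaces between fields, so unusual spacing in the input is not preserved.

```shell
# Hypothetical helper: in every whitespace-separated field, drop everything
# from the first comma onward, keeping only the first coordinate.
strip_extra_coords() {
    awk '{
        for (i = 1; i <= NF; i++)
            sub(/,.*/, "", $i)   # no comma in the field: sub() changes nothing
    } 1' "$@"                    # the trailing 1 prints every (possibly rebuilt) line
}

# Small demonstration on one made-up line with three columns.
printf '%s\n' 'a_1-b_2,a_3-b_4 0 c_5-d_6,c_7-d_8,c_9-d_10' | strip_extra_coords
```

With the file from the question it could be called as strip_extra_coords tt.txt, and piping the result through awk '{print NF}' should still report 3 for every line.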