I have this input:
scaffold10x_1 AUGUSTUS gene 72040 72306 0.67 - . g4
scaffold10x_1 AUGUSTUS transcript 72040 72306 0.67 - . g4.t1
scaffold10x_1 AUGUSTUS stop_codon 72040 72042 . - 0 transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1 AUGUSTUS CDS 72040 72306 0.67 - 0 transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1 AUGUSTUS exon 72040 72306 . - . transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1 AUGUSTUS start_codon 72304 72306 . - 0 transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1 AUGUSTUS gene 72500 72970 0.99 - . g5
scaffold10x_1 AUGUSTUS transcript 72500 72970 0.99 - . g5.t1
scaffold10x_1 AUGUSTUS stop_codon 72500 72502 . - 0 transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1 AUGUSTUS CDS 72500 72970 0.99 - 0 transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1 AUGUSTUS exon 72500 72970 . - . transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1 AUGUSTUS start_codon 72968 72970 . - 0 transcript_id "g5.t1"; gene_id "g5";
I would like to get this output:
scaffold10x_1 AUGUSTUS gene 72040 72306 0.67 - . transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1 AUGUSTUS transcript 72040 72306 0.67 - . transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1 AUGUSTUS stop_codon 72040 72042 . - 0 transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1 AUGUSTUS CDS 72040 72306 0.67 - 0 transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1 AUGUSTUS exon 72040 72306 . - . transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1 AUGUSTUS start_codon 72304 72306 . - 0 transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1 AUGUSTUS gene 72500 72970 0.99 - . transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1 AUGUSTUS transcript 72500 72970 0.99 - . transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1 AUGUSTUS stop_codon 72500 72502 . - 0 transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1 AUGUSTUS CDS 72500 72970 0.99 - 0 transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1 AUGUSTUS exon 72500 72970 . - . transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1 AUGUSTUS start_codon 72968 72970 . - 0 transcript_id "g5.t1"; gene_id "g5";
How can I use the sed command on Linux to get the desired output? Thanks, Kai
Answer 1
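It is usually best to explain the replacement logic and a few edge cases, so that nobody has to guess. You didn't, so I have to make some assumptions:
- You want the replacement whenever the last column of the line is g followed by any (possibly multi-digit) number and an optional .t1 (always 1).
- Whether or not the line already has a .t1, transcript_id should get the .t1 suffix and gene_id should not.
- The column separator is a space.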
In that case, the following script should work. Otherwise, you will need to adjust it:
sed -E 's/ (g[0-9]*)(\.t1)?$/ transcript_id "\1.t1"; gene_id "\1";/' yourfile
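A quick way to check the substitution is to feed it one line of each kind from the question: a gene line, a transcript line, and a line that already carries the attributes. The sketch below wraps the same substitution (with transcript_id spelled as in the desired output) in a function. The name fix_attrs is made up for this example, and it relies on the space-separator assumption listed above.

```shell
# fix_attrs is a hypothetical wrapper name, used only for this sketch.
# It reads the files given as arguments, or stdin if there are none.
fix_attrs() {
  sed -E 's/ (g[0-9]*)(\.t1)?$/ transcript_id "\1.t1"; gene_id "\1";/' "$@"
}

# One gene line, one transcript line, and one line that should pass through unchanged.
printf '%s\n' \
  'scaffold10x_1 AUGUSTUS gene 72040 72306 0.67 - . g4' \
  'scaffold10x_1 AUGUSTUS transcript 72040 72306 0.67 - . g4.t1' \
  'scaffold10x_1 AUGUSTUS CDS 72040 72306 0.67 - 0 transcript_id "g4.t1"; gene_id "g4";' |
  fix_attrs
```

If the first two lines come out with the transcript_id and gene_id attributes and the third is untouched, the same function can be run on the real file, for example fix_attrs yourfile > fixed.gtf (the output file name is only an example).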