How to change the content of a column

I have this input:

scaffold10x_1   AUGUSTUS    gene    72040   72306   0.67    -   .   g4
scaffold10x_1   AUGUSTUS    transcript  72040   72306   0.67    -   .   g4.t1
scaffold10x_1   AUGUSTUS    stop_codon  72040   72042   .   -   0   transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1   AUGUSTUS    CDS 72040   72306   0.67    -   0   transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1   AUGUSTUS    exon    72040   72306   .   -   .   transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1   AUGUSTUS    start_codon 72304   72306   .   -   0   transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1   AUGUSTUS    gene    72500   72970   0.99    -   .   g5
scaffold10x_1   AUGUSTUS    transcript  72500   72970   0.99    -   .   g5.t1
scaffold10x_1   AUGUSTUS    stop_codon  72500   72502   .   -   0   transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1   AUGUSTUS    CDS 72500   72970   0.99    -   0   transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1   AUGUSTUS    exon    72500   72970   .   -   .   transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1   AUGUSTUS    start_codon 72968   72970   .   -   0   transcript_id "g5.t1"; gene_id "g5";

I would like to get this output:

scaffold10x_1   AUGUSTUS    gene    72040   72306   0.67    -   .   transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1   AUGUSTUS    transcript  72040   72306   0.67    -   .   transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1   AUGUSTUS    stop_codon  72040   72042   .   -   0   transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1   AUGUSTUS    CDS 72040   72306   0.67    -   0   transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1   AUGUSTUS    exon    72040   72306   .   -   .   transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1   AUGUSTUS    start_codon 72304   72306   .   -   0   transcript_id "g4.t1"; gene_id "g4";
scaffold10x_1   AUGUSTUS    gene    72500   72970   0.99    -   .   transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1   AUGUSTUS    transcript  72500   72970   0.99    -   .   transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1   AUGUSTUS    stop_codon  72500   72502   .   -   0   transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1   AUGUSTUS    CDS 72500   72970   0.99    -   0   transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1   AUGUSTUS    exon    72500   72970   .   -   .   transcript_id "g5.t1"; gene_id "g5";
scaffold10x_1   AUGUSTUS    start_codon 72968   72970   .   -   0   transcript_id "g5.t1"; gene_id "g5";

How can I use the sed command on Linux to get the desired output? Thanks, Kay

Answer 1

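It is usually best to explain the replacement logic and any edge cases so that nobody has to guess. You did not, so I have to make some assumptions: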

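  • You want the replacement whenever the last column of the line is the letter g followed by a number (possibly multi-digit) and an optional .t1 suffix (the suffix is always 1)
  • Whether or not the line already has a .t1, the transcript_id value should end in .t1 and the gene_id value should not
  • The column separator is a space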

In that case the following script should work. Otherwise you will need to adapt it:

sed -E 's/ (g[0-9]*)(\.t1)?$/ transcript_id "\1.t1"; gene_id "\1";/' yourfile
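
If the file turns out to be tab-separated rather than space-separated (GTF files usually are, but the whitespace in the question does not make this clear), the literal space in the pattern will not match. The sketch below is one possible variant under that assumption: it accepts either a space or a tab before the ID and puts the same separator back. The function name fix_last_column is made up for illustration, and g[0-9]+ is used instead of g[0-9]* so that at least one digit is required.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of the original answer).
# Rewrites the last column only when it is a bare ID such as g4 or g4.t1;
# lines that already end in transcript_id/gene_id attributes do not match.
fix_last_column() {
    # \1 = the separator (space or tab) in front of the ID, kept as-is
    # \2 = the gene ID without any .t1 suffix
    sed -E 's/([[:space:]])(g[0-9]+)(\.t1)?$/\1transcript_id "\2.t1"; gene_id "\2";/' "$@"
}

# Two sample lines modelled on the input above, written with tabs (assumption).
printf 'scaffold10x_1\tAUGUSTUS\tgene\t72040\t72306\t0.67\t-\t.\tg4\n' | fix_last_column
printf 'scaffold10x_1\tAUGUSTUS\ttranscript\t72040\t72306\t0.67\t-\t.\tg4.t1\n' | fix_last_column
```

Called without arguments the function reads standard input, as above; called as fix_last_column yourfile it reads the file directly. Both sample lines should come out with the same last column, transcript_id "g4.t1"; gene_id "g4";.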
