How to add characters to lines that are missing them


The first few lines of my data look like this:

scaffold10x_1   AUGUSTUS    gene    3591    3908    0.61    -   .   g1
scaffold10x_1   AUGUSTUS    transcript  3591    3908    0.61    -   .   g1.t1
scaffold10x_1   AUGUSTUS    stop_codon  3591    3593    .   -   0   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    CDS 3591    3908    0.61    -   0   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    exon    3591    3908    .   -   .   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    start_codon 3906    3908    .   -   0   transcript_id "g1.t1"; gene_id "g1";

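I need to add "; to the last column of the lines that are missing them. I have been using grep -v transcript_id canada.gtf | grep -v "^#" to identify the lines that lack them. Can I do this with a Linux command?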

Answer 1

A sed approach:

sed 's/[^[:space:]]\+[^;[:space:]]$/"&";/' file

Output:

scaffold10x_1   AUGUSTUS    gene    3591    3908    0.61    -   .   "g1";
scaffold10x_1   AUGUSTUS    transcript  3591    3908    0.61    -   .   "g1.t1";
scaffold10x_1   AUGUSTUS    stop_codon  3591    3593    .   -   0   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    CDS 3591    3908    0.61    -   0   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    exon    3591    3908    .   -   .   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    start_codon 3906    3908    .   -   0   transcript_id "g1.t1"; gene_id "g1";
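
The grep filter from the question can also be expressed as sed addresses, so that only lines without transcript_id (and without a leading #) are touched. This is a minimal sketch, not part of the original answer: the temporary directory and the three sample lines are made up for illustration, and the file name canada.gtf is taken from the question.

```shell
# Hypothetical sample data in a scratch directory (tab-separated, as in GTF).
workdir=$(mktemp -d)
printf '# comment line\n' > "$workdir/canada.gtf"
printf 'scaffold10x_1\tAUGUSTUS\tgene\t3591\t3908\t0.61\t-\t.\tg1\n' >> "$workdir/canada.gtf"
printf 'scaffold10x_1\tAUGUSTUS\texon\t3591\t3908\t.\t-\t.\ttranscript_id "g1.t1"; gene_id "g1";\n' >> "$workdir/canada.gtf"

# 1st expression: comment lines branch to the end of the script unchanged.
# 2nd expression: on lines WITHOUT transcript_id, wrap the last
# non-whitespace run in quotes and append a semicolon.
result=$(sed -e '/^#/b' \
             -e '/transcript_id/!s/[^[:space:]][^[:space:]]*$/"&";/' \
             "$workdir/canada.gtf")
printf '%s\n' "$result"
```

Because the substitution is guarded by the transcript_id address, lines that already carry quoted attributes are never rewritten, whatever their last field looks like.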

Answer 2

This sed command ensures that every line ends with a single semicolon and that the last word on every line is quoted:

sed -e 's/"\?\([a-z0-9.]\+\)"\?;*$/"\1";/' canada.gtf

Here is the output of that command:

scaffold10x_1   AUGUSTUS    gene    3591    3908    0.61    -   .   "g1";
scaffold10x_1   AUGUSTUS    transcript  3591    3908    0.61    -   .   "g1.t1";
scaffold10x_1   AUGUSTUS    stop_codon  3591    3593    .   -   0   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    CDS 3591    3908    0.61    -   0   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    exon    3591    3908    .   -   .   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    start_codon 3906    3908    .   -   0   transcript_id "g1.t1"; gene_id "g1";

If you want to modify the file in place, you can use the -i flag:

sed -i -e 's/"\?\([a-z0-9.]\+\)"\?;*$/"\1";/' canada.gtf
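
Since -i overwrites the file, it may be worth keeping a copy. The sketch below attaches a suffix to -i so that sed saves the original as canada.gtf.bak before editing. It assumes GNU sed (the \? and \+ operators in the pattern are GNU extensions), and the scratch directory and single sample line are hypothetical.

```shell
# Hypothetical one-line sample file in a scratch directory.
workdir=$(mktemp -d)
printf 'scaffold10x_1\tAUGUSTUS\tgene\t3591\t3908\t0.61\t-\t.\tg1\n' > "$workdir/canada.gtf"

# -i.bak edits in place and keeps the untouched original as canada.gtf.bak.
sed -i.bak -e 's/"\?\([a-z0-9.]\+\)"\?;*$/"\1";/' "$workdir/canada.gtf"

# Show what changed; diff exits non-zero when the files differ, hence "|| true".
diff "$workdir/canada.gtf.bak" "$workdir/canada.gtf" || true
```

If the result is not what you expected, moving canada.gtf.bak back over canada.gtf restores the original.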

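If you only want to make sure that every line ends with "; (and you do not want a matching " at the start of the last word on the line), you can use the following command: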

sed -e 's/"\?;\?$/";/' canada.gtf

Here is the output of that command:

scaffold10x_1   AUGUSTUS    gene    3591    3908    0.61    -   .   g1";
scaffold10x_1   AUGUSTUS    transcript  3591    3908    0.61    -   .   g1.t1";
scaffold10x_1   AUGUSTUS    stop_codon  3591    3593    .   -   0   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    CDS 3591    3908    0.61    -   0   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    exon    3591    3908    .   -   .   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    start_codon 3906    3908    .   -   0   transcript_id "g1.t1"; gene_id "g1";
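
Whichever command you use, a quick check in the spirit of the grep from the question can confirm that no non-comment line is left without the trailing ";. This is only a sketch: the file name fixed.gtf and its two lines are invented for illustration, with one line deliberately left unfixed.

```shell
# Hypothetical file: the first line is already fixed, the second is not.
workdir=$(mktemp -d)
printf 'scaffold10x_1\tAUGUSTUS\tgene\t3591\t3908\t0.61\t-\t.\t"g1";\n' > "$workdir/fixed.gtf"
printf 'scaffold10x_1\tAUGUSTUS\ttranscript\t3591\t3908\t0.61\t-\t.\tg1.t1\n' >> "$workdir/fixed.gtf"

# Drop comment lines, then count the lines that do NOT end in ";
# grep -c exits non-zero when the count is 0, hence "|| true".
missing=$(grep -v '^#' "$workdir/fixed.gtf" | grep -vc '";$' || true)
echo "lines still missing the trailing \";: $missing"
```

A count of 0 means every non-comment line now ends the way the later tools expect.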

Answer 3

@Kay NewEdge 达拉莫拉

The one-liner below produces the desired result. The pattern is anchored to the end of the line, so it only quotes a last field that ends in a letter followed by a digit, and lines that already end in a semicolon are left alone.

Code:

sed 's/[^[:space:]]*[a-z][0-9]$/"&";/' example.txt

Output:

scaffold10x_1   AUGUSTUS    gene    3591    3908    0.61    -   .   "g1";
scaffold10x_1   AUGUSTUS    transcript  3591    3908    0.61    -   .   "g1.t1";
scaffold10x_1   AUGUSTUS    stop_codon  3591    3593    .   -   0   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    CDS 3591    3908    0.61    -   0   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    exon    3591    3908    .   -   .   transcript_id "g1.t1"; gene_id "g1";
scaffold10x_1   AUGUSTUS    start_codon 3906    3908    .   -   0   transcript_id "g1.t1"; gene_id "g1";
