Split one column into multiple columns at a specific string, with an irregular number of items between occurrences

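I have a file with a single column, and I want to split that column into multiple columns at each occurrence of a specific string (chr). The number of items between the first and second occurrence, between the second and third, and in general between any two consecutive occurrences, is irregular.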

The input looks like this:

chr10:127293562-127293909
BRUNOL4(Hs/Mm)
CPEB4(Hs/Mm)
CUG-BP(Hs/Mm)
DAZAP1(Hs/Mm)
ENOX1(Hs/Mm)
FMR1(Hs/Mm)
chr11:49214073-49214804
BRUNOL4(Hs/Mm)
BRUNOL5(Hs/Mm)
CPEB2(Hs/Mm)
CPEB4(Hs/Mm)
CUG-BP(Hs/Mm)
HNRNPC(Hs/Mm)
HNRNPCL1(Hs/Mm)
HNRNPH1(Hs/Mm)
HuR(Hs/Mm)
MBNL1(Hs/Mm)
NOVA1(Hs/Mm)
chr11:49854587-49855127
A1CF(Hs/Mm)
BRUNOL4(Hs/Mm)

The output should look like this:

chr10:127293562-127293909  chr11:49214073-49214804  chr11:49854587-49855127
BRUNOL4(Hs/Mm)             BRUNOL4(Hs/Mm)           A1CF(Hs/Mm)
CPEB4(Hs/Mm)               BRUNOL5(Hs/Mm)           BRUNOL4(Hs/Mm)
CUG-BP(Hs/Mm)              CPEB2(Hs/Mm)
DAZAP1(Hs/Mm)              CPEB4(Hs/Mm)    
ENOX1(Hs/Mm)               CUG-BP(Hs/Mm)
FMR1(Hs/Mm)                HNRNPC(Hs/Mm)
                           HNRNPCL1(Hs/Mm)
                           HNRNPH1(Hs/Mm)
                           HuR(Hs/Mm)
                           MBNL1(Hs/Mm)
                           NOVA1(Hs/Mm)

Answer 1

$ csplit -zsf file -n 1 ip.txt /^chr/ {*} ; paste file* | column -nt
chr10:127293562-127293909  chr11:49214073-49214804  chr11:49854587-49855127
BRUNOL4(Hs/Mm)             BRUNOL4(Hs/Mm)           A1CF(Hs/Mm)
CPEB4(Hs/Mm)               BRUNOL5(Hs/Mm)           BRUNOL4(Hs/Mm)
CUG-BP(Hs/Mm)              CPEB2(Hs/Mm)             
DAZAP1(Hs/Mm)              CPEB4(Hs/Mm)             
ENOX1(Hs/Mm)               CUG-BP(Hs/Mm)            
FMR1(Hs/Mm)                HNRNPC(Hs/Mm)            
                           HNRNPCL1(Hs/Mm)          
                           HNRNPH1(Hs/Mm)           
                           HuR(Hs/Mm)               
                           MBNL1(Hs/Mm)             
                           NOVA1(Hs/Mm)             
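  • csplit splits a file into pieces based on a pattern
    • -z removes empty output files (needed here because the pattern matches the very first line)
    • -s suppresses the log output
    • -f file -n 1 makes the output file names start with file, followed by a one-digit suffix
    • ip.txt is the input file and /^chr/ is the pattern to split on
    • {*} repeats the split as many times as possible
  • paste then joins the split files column-wise
  • column -nt formats the paste output as a table; -n, a GNU extension of column, disables the default behavior of merging adjacent delimiters, so empty cells keep their place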

Answer 2

Without any pipes:

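#!/usr/bin/env perl

use strict; use warnings;

my $c = -1; my @cols;

# Start a new column at every line beginning with "chr".
while (<>) {
    chomp;
    $c++ if /^chr/;
    next if $c < 0;            # skip anything before the first "chr" line
    push @{ $cols[$c] }, $_;
}

# The longest column determines the number of output rows.
my $rows = 0;
for my $col (@cols) { $rows = @$col if @$col > $rows }

for my $i (0 .. $rows - 1) {
    # Pad short columns with empty strings so the rows stay aligned.
    my $line = join ' ', map { sprintf '%-30s', $_->[$i] // '' } @cols;
    $line =~ s/\s+$//;
    print "$line\n";
}

Output

chr10:127293562-127293909      chr11:49214073-49214804        chr11:49854587-49855127
BRUNOL4(Hs/Mm)                 BRUNOL4(Hs/Mm)                 A1CF(Hs/Mm)
CPEB4(Hs/Mm)                   BRUNOL5(Hs/Mm)                 BRUNOL4(Hs/Mm)
CUG-BP(Hs/Mm)                  CPEB2(Hs/Mm)
DAZAP1(Hs/Mm)                  CPEB4(Hs/Mm)
ENOX1(Hs/Mm)                   CUG-BP(Hs/Mm)
FMR1(Hs/Mm)                    HNRNPC(Hs/Mm)
                               HNRNPCL1(Hs/Mm)
                               HNRNPH1(Hs/Mm)
                               HuR(Hs/Mm)
                               MBNL1(Hs/Mm)
                               NOVA1(Hs/Mm)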
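
The same single-pass idea (collect the lines of each chr block, then print row by row) can be sketched with awk for any number of blocks. This is an illustration rather than part of the answer above: chr_to_columns is a hypothetical name, and the output is tab-separated instead of padded to a fixed width.

```shell
# chr_to_columns FILE...: hypothetical helper using plain POSIX awk.
chr_to_columns() {
    awk '
    /^chr/ { ncol++ }                      # a "chr" line opens a new column
    ncol {                                 # ignore lines before the first "chr"
        cell[ncol, ++len[ncol]] = $0
        if (len[ncol] > nrow) nrow = len[ncol]
    }
    END {
        for (r = 1; r <= nrow; r++) {
            line = ""
            for (c = 1; c <= ncol; c++)    # missing cells print as empty fields
                line = line (c > 1 ? "\t" : "") cell[c, r]
            print line
        }
    }' "$@"
}
```

Running chr_to_columns ip.txt prints one tab-separated row per line; because every row has the same number of fields, the result is easy to load into a spreadsheet or process further.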
