I want to split a file into separate files based on the information in its first line. For example, I have:
Input:
1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 30 30 30 30
0 2 2 0 2 0 2 0 2 0 2 2 0 0 2 2 2 0 1 1 1 2 0 2 0 0 0 2 0 0 2 0 2
0 2 1 0 1 0 1 1 1 0 2 2 0 0 2 2 2 0 0 0 0 2 0 2 0 0 1 2 0 0 2 0 2
0 2 1 0 1 0 1 1 1 0 2 2 0 0 2 2 2 0 0 0 0 2 0 2 0 0 1 2 0 0 2 0 2
Desired output:
output1.txt
02202020
02101011
02101011
output2.txt
2022002
1022002
1022002
output3.txt
220111
220000
220000
output4.txt
202000200202
202001200202
202001200202
output30.txt
0202
0202
0202
Answer 1
$ awk '
NR == 1 {
    for (i=1; i<=NF; i++) {
        output[i] = "output" $i ".txt"
        files[output[i]] = 1
    }
    next
}
{
    for (i=1; i<=NF; i++) printf "%s", $i > output[i]
    for (file in files) print "" > file
}
' input.filename
$ for f in output*.txt; do echo $f; cat $f; done
output1.txt
02202020
02101011
02101011
output2.txt
2022002
1022002
1022002
output3.txt
220111
220000
220000
output30.txt
00202
00202
00202
output4.txt
2020002
2020012
2020012
Note that the header line has 32 fields while the other lines have 33. That needs to be fixed first.
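To find such a mismatch before running the script, a quick per-line field count (assuming the input is in input.filename, as above) shows which line is off:

```shell
# Print the field count of every line; the header count should match
# the data lines.
awk '{ print NR ": " NF " fields" }' input.filename
```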
Answer 2
A Perl script. Set the file name ($in) in place of genome.txt, or pass the name as an argument. Name the script counter.pl, give it executable permissions, then run it:
chmod 755 counter.pl
./counter.pl
or
chmod 755 counter.pl
./counter.pl genome.txt
counter.pl:
#!/usr/bin/perl
use strict;
use warnings;

my $in = $ARGV[0] || 'genome.txt'; # input file name
open (my $F, '<', $in) or die "Cannot open input file $!";

my $n = 0;
my %fd = (); # file handles keyed by column label
my @fd = (); # column label for each field position

while (<$F>) {
    # trim
    s/^\s+//;
    s/\s+$//;
    next if (!$_); # Skip empty lines
    my @x = split(/\s+/, $_);
    # 1st line, open files
    if ( ! $n++) {
        my $fd = 0;
        for (@x) {
            open ($fd{$_}, '>', "output$_.txt")
                or die ("Cannot open file $!")
                if (!exists($fd{$_}));
            $fd[$fd++] = $_;
        }
    }
    else { # Write data
        die ("Should have " . ($#fd+1) . " entries on line $n")
            if ($#x != $#fd);
        for (0 .. $#x) {
            print {$fd{$fd[$_]}} ($x[$_]);
        }
        print {$fd{$_}} ("\n") for (keys %fd);
    }
}
close $fd{$_} for (keys %fd);
close $F;
# the end
Fixed: the word count per line (sometimes 32, sometimes 33 in the example). This version accommodates any number of columns, but all data lines must have the same word count; it dies with an error (and the line number) if the counts differ, or if a file cannot be opened. Just adjust the file name ($in) as needed.
Input file (with the extra trailing 0 removed):
1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 30 30 30 30
0 2 2 0 2 0 2 0 2 0 2 2 0 0 2 2 2 0 1 1 1 2 0 2 0 0 0 2 0 2 0 2
0 2 1 0 1 0 1 1 1 0 2 2 0 0 2 2 2 0 0 0 0 2 0 2 0 0 1 2 0 2 0 2
0 2 1 0 1 0 1 1 1 0 2 2 0 0 2 2 2 0 0 0 0 2 0 2 0 0 1 2 0 2 0 2
output1.txt
02202020
02101011
02101011
output2.txt
2022002
1022002
1022002
output30.txt
0202
0202
0202
output3.txt
220111
220000
220000
output4.txt
2020002
2020012
2020012
Answer 3
OK, also for fun - a pure Bash version (as requested), relying heavily on the builtin read to send words into arrays and save them to files. The files get nicely formatted names, output001.txt .... output030.txt. Tested with the data file as modified by @ringO. Untested, but on very large files it may be more time- and resource-efficient than the others.
Data:
1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 30 30 30 30
0 2 2 0 2 0 2 0 2 0 2 2 0 0 2 2 2 0 1 1 1 2 0 2 0 0 0 2 0 2 0 2
0 2 1 0 1 0 1 1 1 0 2 2 0 0 2 2 2 0 0 0 0 2 0 2 0 0 1 2 0 2 0 2
0 2 1 0 1 0 1 1 1 0 2 2 0 0 2 2 2 0 0 0 0 2 0 2 0 0 1 2 0 2 0 2
Source:
#!/usr/bin/env bash
# genome : to sort genome data sets according to patterns of the first (header)
# line of the file. Data must be space delimited. No dependencies.
#
# Usage:
#
#     ./genome "data.txt"

# global arrays
sc=( )  # array of set element counts
sn=( )  # array of set id numbers

# output_file "set id"
# change the output pattern and digit output width as required - default
# pattern is output.txt and digit width of three : output000.txt
output_file(){
    # format concept: pattern000.txt
    local op='output.txt'  # output pattern
    local ow=3             # output width: 3 => 000
    printf "%s%0${ow}d.%s" "${op%%.*}" "$1" "${op##*.}"
}

# define_sets "input.txt"
# identify sets - get elements count and sets id numbers from file
# header.
define_sets(){
    # declare and initialize
    local a an b c n
    read -r c < "$1"
    read -r a b <<< "$c"
    n=0; sn=( $a )
    # recurse header, identify sets
    until [[ -z $b ]]
    do
        n=$((n+1))
        an=$a
        read -r a b <<< "$b"
        [[ $an == $a ]] || { sn+=( $a ); sc+=( $n ); n=0; }
    done
    n=$((n+1))
    sc+=( $n )
}

# reset_files
# optional function, clears file data, otherwise data is appended to existing
# output files.
reset_files(){
    for s in ${sn[@]}
    do
        > "$(output_file "$s")"
    done
}

# extract_data "input.txt"
# use defined sets to extract data from the input file and send it to required
# output files. Uses nested 'while read' to bypass file header as data is saved.
extract_data(){
    local a c n s fn da=( )
    while read -a da
    do
        while read -a da
        do
            a=0 n=0
            for s in ${sc[@]}
            do
                c="$(echo "${da[@]:$a:$s}")"                    # words => string
                echo "${c// /}" >> "$(output_file "${sn[$n]}")" # save
                n=$((n+1))
                a=$((a+s))
            done
        done
    done < "$1"
}

define_sets "$1"  # get data set structure from header
reset_files       # optional, clears and resets files
extract_data "$1" # get data from input file and save
# end file
Data output:
$ cat output001.txt
02202020
02101011
02101011
$ cat output002.txt
2022002
1022002
1022002
$ cat output003.txt
220111
220000
220000
$ cat output004.txt
2020002
2020012
2020012
$ cat output030.txt
0202
0202
0202
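The zero-padded names above come from the %0<width>d format that output_file passes to printf; a minimal illustration of the same formatting:

```shell
# printf reuses its format string for each extra argument,
# zero-padding each number to a width of 3.
printf 'output%03d.txt\n' 1 30
# prints:
# output001.txt
# output030.txt
```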
Answer 4
Just for fun, yet another solution:
awk '{ for (i=1; i<=NF;i++){
if (NR==1) { file[i]=$i }
if (NR!=1) { f="output" file[i] ".txt";
g="output" file[i+1] ".txt";
printf("%s%s",$i,f==g?OFS:ORS)>>f;
close(f);
}
}
}' file
If you need the fields without a separator, change ?OFS: to ?"": .
The default file that receives unpaired values is output.txt. That file receives values if the column count of the first line does not match that of a later line; if everything is correct, it should be empty. If it still exists after the script runs, something is wrong somewhere.
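A hypothetical post-run check (not part of the answer) that turns the leftover output.txt into an explicit warning:

```shell
# [ -s FILE ] is true only if FILE exists and is non-empty,
# so a non-empty output.txt signals a header/data column mismatch.
if [ -s output.txt ]; then
    echo "column mismatch: unpaired values landed in output.txt"
else
    echo "ok: header and data columns match"
fi
```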