I need to process these text files. (Fields are comma-separated.)
$ cat File1.seed
389,0,
390,1,
391,0,
392,0,
393,0,SEED
394,0,
395,1,
$ cat File2.seed
223,0,
224,1,
225,0,
226,1,
227,0,SEED
228,1,
$ cat File3.seed
55,0,
56,0,SEED
57,1,
58,0,
59,1,
60,0,
The desired output is:
389,0,,223,0,,,,,0
390,1,,224,1,,,,,2
391,0,,225,0,,,,,0
392,0,,226,1,,55,0,,1
393,0,SEED,227,0,SEED,56,0,SEED,0
394,0,,228,1,,57,1,,2
395,1,,,,,58,0,,1
,,,,,,59,1,,1
,,,,,,60,0,,0
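As you can see, the files are aligned on the "SEED" pattern, and then the second column of every file is summed horizontally, with the result appended as the last column. Here SEED is on line 5 of File1.seed and File2.seed but on line 2 of File3.seed, so the rows of File3.seed start three rows lower than the others.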
Answer 1
Here is a working solution using perl:
Create a file, for example mergeseeds.pl:
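#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);
use constant COLUMNS => 3;    # fields per input file
use constant SUM_COL => 2;    # 1-based field to sum within each file

# Read a file and return a reference to its chomped lines.
sub readfile($)
{
    my $f = shift;
    open(my $fh, '<', $f) or die "cannot open $f: $!\n";
    my @lines = <$fh>;
    close($fh);
    chomp @lines;
    return \@lines;
}

# Return the 0-based index of the first line containing SEED,
# or the number of lines if there is none.
sub findseed($)
{
    my $arr = shift;
    my $line = 0;
    for (@$arr)
    {
        return $line if (/SEED/);
        $line++;
    }
    return $line;
}

# Sum field SUM_COL of every file's block in the merged line,
# then print the line with the sum appended.
sub process($$$)
{
    my ($colnum, $numfiles, $line) = @_;
    my $sum = 0;
    # The ",END" sentinel keeps split from dropping trailing empty fields.
    my @nums = split(/,/, $line . ",END");
    while ($colnum < scalar(@nums) - 1)
    {
        $sum += $nums[$colnum - 1] || 0;    # empty padding fields count as 0
        $colnum += COLUMNS;
    }
    print $line . "," . $sum . "\n";
}

# Shift the next line off the array, or return the filler if it is exhausted.
sub popvalue($;@)
{
    my ($arr, @filler) = @_;
    return @$arr ? (shift @$arr) : (@filler);
}

# Prepend $pad copies of $filler to the array.
sub pad_array($$$)
{
    my ($arr, $pad, $filler) = @_;
    while ($pad--)
    {
        unshift @$arr, $filler;
    }
}

# Pad each file with its own number of empty rows (consumes @$pads).
sub pad_arrays($$)
{
    my ($files, $pads) = @_;
    for (@{$files})
    {
        pad_array($_, shift @$pads, ",,");
    }
}

# Align all files on their SEED line and print the merged rows.
sub merge_files(@)
{
    my @files = @_;
    my $numfiles = scalar @files;
    my @seedsfound = map { findseed($_); } @files;
    my $maxseed = max(@seedsfound);
    # Files whose SEED comes earlier need empty rows on top.
    my @padcounts = map { ($maxseed - $_); } @seedsfound;
    pad_arrays(\@files, \@padcounts);
    my $maxlines = max(map { scalar @$_; } @files);
    my $line = 0;
    while ($line < $maxlines)
    {
        my @items = map { popvalue($_, ",,"); } (@files);
        process(SUM_COL, $numfiles, join(",", @items));
        $line++;
    }
}

sub read_files(@)
{
    my @filenames = @_;
    my @files = map { readfile($_); } @filenames;
    return @files;
}

sub usage($)
{
    my ($msg) = (@_);
    print STDERR "usage: $0 <Filename>...\n";
    print STDERR $msg . "\n";
    exit 1;
}

sub main(@)
{
    usage("ERROR: no input files given") unless @_;
    my @fnames = ();
    for my $f (@_)
    {
        if (! -f $f)
        {
            usage("ERROR: not a file: $f");
        }
        push(@fnames, $f);
    }
    merge_files(read_files(@fnames));
}
main(@ARGV);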
Then invoke it from the command line:
perl mergeseeds.pl File*.seed
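
To make the alignment step easier to follow, here is a minimal sketch of the same idea working on in-memory lists instead of files. It is not part of mergeseeds.pl: the helper name align_on_seed is hypothetical, and it assumes every row has exactly three comma-separated fields, as in the sample data from the question. Instead of physically padding the arrays, it computes how many empty rows each list needs on top and indexes into the lists with that offset.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max first);

# Hypothetical helper (assumption, not from the script above): takes array
# refs of "id,flag,tag" rows, aligns them on the row containing SEED, and
# returns the merged rows with the sum of the second fields appended.
sub align_on_seed {
    my @files = @_;
    return () unless @files;

    # 0-based index of the SEED row in each list (list length if absent).
    my @seed = map {
        my $rows = $_;
        my $i = first { $rows->[$_] =~ /SEED/ } 0 .. $#$rows;
        defined $i ? $i : scalar @$rows;
    } @files;

    # Empty rows each list needs on top so all SEED rows share one index.
    my $top = max(@seed);
    my @pad = map { $top - $_ } @seed;
    my $height = max(map { scalar(@{ $files[$_] }) + $pad[$_] } 0 .. $#files);

    my @out;
    for my $r (0 .. $height - 1) {
        my $sum = 0;
        my @cells;
        for my $f (0 .. $#files) {
            my $i = $r - $pad[$f];
            my $row = ($i >= 0 && $i < @{ $files[$f] }) ? $files[$f][$i] : ',,';
            # A limit of -1 keeps trailing empty fields.
            my @fields = split /,/, $row, -1;
            $sum += $fields[1] if defined $fields[1] && length $fields[1];
            push @cells, $row;
        }
        push @out, join(',', @cells, $sum);
    }
    return @out;
}

# Sample data copied from the question.
my @file1 = ('389,0,', '390,1,', '391,0,', '392,0,', '393,0,SEED', '394,0,', '395,1,');
my @file2 = ('223,0,', '224,1,', '225,0,', '226,1,', '227,0,SEED', '228,1,');
my @file3 = ('55,0,', '56,0,SEED', '57,1,', '58,0,', '59,1,', '60,0,');

print "$_\n" for align_on_seed(\@file1, \@file2, \@file3);
```

With the sample data this prints the nine rows listed as the desired output in the question. Passing a limit of -1 to split is an alternative to the ",END" sentinel used in the script: both stop split from discarding the trailing empty field.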