Compare and align multiple files on a pattern

I need to process these text files (fields are comma-separated):


$ cat File1.seed
389,0,
390,1,
391,0,
392,0,
393,0,SEED
394,0,
395,1,

$ cat File2.seed
223,0,
224,1,
225,0,
226,1,
227,0,SEED
228,1,

$ cat File3.seed
55,0,
56,0,SEED
57,1,
58,0,
59,1,
60,0,

The expected output is:


389,0,,223,0,,,,,0
390,1,,224,1,,,,,2
391,0,,225,0,,,,,0
392,0,,226,1,,55,0,,1
393,0,SEED,227,0,SEED,56,0,SEED,0
394,0,,228,1,,57,1,,2
395,1,,,,,58,0,,1
,,,,,,59,1,,1
,,,,,,60,0,,0

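As you can see, the files are aligned on the line containing the pattern "SEED". The second column of every file is then summed across each row, and the total is appended as the last column.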

Answer 1

Here is a working solution in Perl:

Create a file, for example mergeseeds.pl:

#!/usr/bin/env perl
use List::Util qw[min max];
use strict;

use constant COLUMNS=>3;
use constant SUM_COL=>2;

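sub readfile($)
{
  my $f=shift;
  # Read the file directly instead of shelling out to `cat`, which
  # breaks on file names containing spaces or shell metacharacters.
  open(my $fh, '<', $f) or die "Cannot open $f: $!\n";
  my @lines = <$fh>;
  close($fh);
  chomp @lines;
  return \@lines;
}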

sub findseed($)
{
  my $arr = shift;
  my $line = 0;
  for(@$arr)
  {
    return $line if(/SEED/);
    $line++;
  }
  return $line;
}

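sub process($$$)
{
  # $colnum is the 1-based column to sum within each file's group of
  # COLUMNS fields; $line is one merged output row.
  my ($colnum,$numfiles,$line)=@_;
  my $sum = 0;
  # Append a sentinel field so that split() keeps trailing empty fields.
  my @nums = (split(/,/,$line.",END"));
  while($colnum < scalar @nums-1)
  {
    # Empty (padding) fields count as 0.
    $sum += $nums[$colnum-1] || 0;
    $colnum+=COLUMNS;
  }
  print $line.",".$sum."\n";
}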

sub popvalue($;@) 
{
  my ($arr,@filler)=@_;
  return @$arr ? (shift @$arr) : (@filler);
}

sub pad_array($$$)
{
  my ($arr,$pad,$filler)=@_;
  while ($pad--)
  {
    unshift @$arr, $filler;
  }
}

sub pad_arrays($$)
{
  my ($files,$pads)=@_;
  for(@{$files})
  {
    pad_array($_,shift @$pads,",,");
  }
}

sub merge_files(@) 
{
  my @files=@_;
  my $numfiles = scalar @files;
  my @seedsfound = map { findseed($_); } @files;
  my $maxseed = max(@seedsfound);
  my @padcounts = map { ($maxseed - $_); } @seedsfound;

  pad_arrays(\@files,\@padcounts);

  my $maxlines = max( map { scalar @$_; } @files);

  my $line= 0;
  while($line < $maxlines)
  {
    my @items = map {popvalue($_,",,"); } (@files);
    process(SUM_COL,$numfiles,join(",",@items));
    $line++;
  }
}

sub read_files(@)
{
    my @filenames=@_;
    my @files = map { readfile($_); } @filenames;
    return @files;
}

sub usage($)
{
   my ($msg)=(@_);
   print STDERR "usage: $0 <Filename>...\n";
   print STDERR $msg."\n";
   exit 1;
}
sub main(@)
{
    my @fnames=();
    for my $f (@_)
    {
       if(! -f $f)
       {
           usage( "ERROR: not a file:$f\n");
       }
       push(@fnames,$f);
    }
    merge_files(read_files(@fnames));
}

main(@ARGV);

Then invoke it from the command line:

perl mergeseeds.pl File*.seed
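
For anyone who wants to see just the alignment idea without the file handling, here is a minimal sketch of it. It is not part of the script above: `align_on_seed` is a hypothetical helper name, and it assumes each file is already in memory as an array of lines with exactly three comma-separated fields. Every file is shifted down by the difference between the largest SEED index and its own, missing rows are filled with `,,`, and the second field of each file is summed per row.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max first);

# Hypothetical helper (illustration only): align in-memory files on the
# first line matching /SEED/ and append the total of column 2 to each row.
sub align_on_seed {
    my @files = @_;    # each element: array ref of "a,b,c" lines

    # Index of the SEED line in each file; falls back to the line count
    # when no line matches, like findseed in the script above.
    my @seed = map {
        my $f = $_;
        my $i = first { $f->[$_] =~ /SEED/ } 0 .. $#$f;
        defined $i ? $i : scalar @$f;
    } @files;

    my $top = max @seed;    # number of rows above the common SEED row

    # Total rows = longest file after shifting it down by its padding.
    my $rows = 0;
    for my $k (0 .. $#files) {
        my $n = $top - $seed[$k] + @{ $files[$k] };
        $rows = $n if $n > $rows;
    }

    my @out;
    for my $r (0 .. $rows - 1) {
        my $total = 0;
        my @cells;
        for my $k (0 .. $#files) {
            my $src = $r - ($top - $seed[$k]);    # row inside file k
            my $line = ($src >= 0 && $src < @{ $files[$k] })
                     ? $files[$k][$src]
                     : ",,";                      # filler for a missing row
            my @f = split /,/, $line, -1;         # -1 keeps trailing empty fields
            $total += $f[1] || 0;                 # empty field counts as 0
            push @cells, $line;
        }
        push @out, join(",", @cells, $total);
    }
    return @out;
}

# The three sample files from the question, as arrays of lines.
my @file1 = split ' ', "389,0, 390,1, 391,0, 392,0, 393,0,SEED 394,0, 395,1,";
my @file2 = split ' ', "223,0, 224,1, 225,0, 226,1, 227,0,SEED 228,1,";
my @file3 = split ' ', "55,0, 56,0,SEED 57,1, 58,0, 59,1, 60,0,";

print "$_\n" for align_on_seed(\@file1, \@file2, \@file3);
```

Building the rows by index instead of padding the arrays in place keeps the input untouched, at the cost of a little index arithmetic.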
