Comparing across rows in Perl

I have a file:

Name v1 v2 
Type1 ABC 32
Type1 DEF 44
Type1 XXX 45
Type2 ABC 78 
Type2 XXX 23 
Type3 DEF 22 
Type3 XXX 12 
Type4 ABC 55 
Type4 DEF 78 
Type5 ABC 99 
Type6 DEF 00

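I am trying to print only part of this file, under the following conditions:

  • For a given name, e.g. Type1, if XXX exists in column v1, I want to skip printing all occurrences of Type1 in the file.
  • For a given name, e.g. Type4, if both ABC and DEF exist in column v1, I want to print only the row with the smaller value in v2.
  • For a given name, e.g. Type5 or Type6, where only ABC or DEF exists, I want to print them.

How should I do this? I can read the file into an array, but I don't know how to search a specific column across multiple rows.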

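Answer 1

The tool you need for this is a hash, which is Perl's way of storing key-value pairs. Specifically, we need to preprocess your data into hashes so that we can look up the lowest value, or where XXX occurs.

Fortunately, your third condition looks like a subset of the second: if you always print the row with the lowest value, then when a name has only one row, that row is the lowest.

So I would probably do it like this: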

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

#read header line, because we don't want to process it; 
#note - diamond operators are 'magic' file handles. 
#they read either piped input on STDIN, or 
#open/read files specified on command line. 
#this is almost exactly like how sed/grep work. 
my $header_line = <>;
#turn the rest of our input into an array of arrays, split on whitespace/linefeeds. 
my @lines = map { [split] } <>;

#print for diag
print Dumper \@lines;

#this hash tracks if we've 'seen' an XXX
my %skip_type;
#this hash tracks the lowest V2 value. 
my %lowest_v2_for;
foreach my $record (@lines) {
    #we could work with $record ->[0], etc.
    #this is because I think it's more readable this way. 
    my ( $type, $v1, $v2 ) = @$record;

    #find all the lines with "XXX" - store in a hash.
    if ( $v1 eq "XXX" ) {
        $skip_type{$type}++;
    }

    #check if this v2 is the lowest for this particular type. 
    #make a note if it is. 
    if ( not defined $lowest_v2_for{$type}
        or $lowest_v2_for{$type} > $v2 )
    {
        $lowest_v2_for{$type} = $v2;
    }
}

#print for diag - things we are skipping. 
print Dumper \%skip_type;


print $header_line;

#run through our list again, testing the various conditions:
foreach my $record (@lines) {
    my ( $type, $v1, $v2 ) = @$record;

    #skip if it's got an XXX. 
    next if $skip_type{$type};
    #skip if it isn't the lowest value
    next if $lowest_v2_for{$type} < $v2;
    #print otherwise.
    print join( " ", @$record ), "\n";
}

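This gives the following (minus the diagnostic output; feel free to drop the Dumper calls if you don't need them). Note that Type2 and Type3 are dropped as well, because the script applies the XXX rule to every name, not only Type1: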

Name v1 v2 
Type4 ABC 55
Type5 ABC 99
Type6 DEF 00
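
The two-pass hash idea above can also be packaged as a function, which makes it easier to try out without a file. This is only a minimal sketch, not the answerer's code: the name `filter_rows` is hypothetical, and the tie-breaking rule (keep only the first row when two rows of the same name share the lowest v2) is an assumption added for illustration, whereas the script above would print both tied rows.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes a list of [name, v1, v2] rows and returns
# the rows that survive the rules, in input order.
sub filter_rows {
    my @rows = @_;
    my ( %skip, %best );

    # First pass: note names that contain XXX and remember, per name,
    # the first row with the lowest v2.
    for my $row (@rows) {
        my ( $name, $v1, $v2 ) = @$row;
        $skip{$name} = 1 if $v1 eq 'XXX';
        $best{$name} = $row
            if !defined $best{$name} or $v2 < $best{$name}[2];
    }

    # Second pass: keep only the remembered row of each non-skipped name.
    # Comparing two references with == tests whether they are the same row.
    return grep { !$skip{ $_->[0] } and $_ == $best{ $_->[0] } } @rows;
}

my @rows = map { [split] } (
    'Type1 ABC 32', 'Type1 XXX 45',
    'Type4 ABC 55', 'Type4 DEF 78',
    'Type5 ABC 99',
);
print join( ' ', @$_ ), "\n" for filter_rows(@rows);
```

With the sample rows above this prints `Type4 ABC 55` and `Type5 ABC 99`: Type1 is dropped because it contains XXX, and Type4 keeps only its lowest row.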

Answer 2

My take:

perl -wE ' 
    # read the data 
    chomp( my $header = <> ); 
    my %data; 
    while (<>) { 
        chomp; 
        my @F = split; 
        $data{$F[0]}{$F[1]} = $F[2]; 
    } 

    # requirement 1 
    delete $data{Type1} if exists $data{Type1}{XXX}; 

    # requirement 2 
    if (exists $data{Type4}{ABC} and exists $data{Type4}{DEF}) { 
        if ($data{Type4}{ABC} <= $data{Type4}{DEF}) { 
            delete $data{Type4}{DEF}; 
        } 
        else { 
            delete $data{Type4}{ABC}; 
        } 
    } 

    # requirement 3 
    for my $name (qw/Type5 Type6/) { 
        delete $data{$name} unless ( 
            scalar keys %{$data{$name}} == 1 
            and (exists $data{$name}{ABC} or exists $data{$name}{DEF}) 
        ); 
    } 

    $, = " "; 
    say $header; 
    for my $name (sort keys %data) { 
        for my $v1 (sort keys %{$data{$name}}) { 
            say $name, $v1, $data{$name}{$v1}; 
        } 
    } 
' file 

Output

Name v1 v2 
Type2 ABC 78
Type2 XXX 23
Type3 DEF 22
Type3 XXX 12
Type4 ABC 55
Type5 ABC 99
Type6 DEF 00

There are no requirements for Type2 and Type3, so they are printed unchanged.
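
If the names are not fixed in advance, the hard-coded Type1 and Type4 checks could be driven by small lookup tables instead. The following is a sketch under assumptions, not part of the answer: `apply_rules`, `%drop_if_xxx` and `%keep_lowest` are hypothetical names, and the "keep lowest" rule here considers every v1 key of a name rather than only ABC and DEF.

```perl
use strict;
use warnings;

# Hypothetical rule tables: which names get which treatment.
my %drop_if_xxx = map { $_ => 1 } qw(Type1);
my %keep_lowest = map { $_ => 1 } qw(Type4);

# Takes a hash ref of the form name => { v1 => v2 } and edits it in place.
sub apply_rules {
    my ($data) = @_;
    for my $name ( keys %$data ) {
        my $rows = $data->{$name};

        # Rule 1: drop the whole name if it has an XXX entry.
        if ( $drop_if_xxx{$name} and exists $rows->{XXX} ) {
            delete $data->{$name};
            next;
        }

        # Rule 2: keep only the entry with the lowest v2
        # (ties are broken by v1 in string order).
        if ( $keep_lowest{$name} and scalar( keys %$rows ) > 1 ) {
            my ($min) = sort { $rows->{$a} <=> $rows->{$b} or $a cmp $b }
                keys %$rows;
            delete @{$rows}{ grep { $_ ne $min } keys %$rows };
        }
    }
    return $data;
}

my %data = (
    Type1 => { ABC => 32, XXX => 45 },
    Type2 => { ABC => 78, XXX => 23 },
    Type4 => { ABC => 55, DEF => 78 },
);
apply_rules( \%data );
for my $name ( sort keys %data ) {
    print "$name $_ $data{$name}{$_}\n" for sort keys %{ $data{$name} };
}
```

For this sample the output is `Type2 ABC 78`, `Type2 XXX 23` and `Type4 ABC 55`. Adding a name to either table extends the rule to it without touching the loop.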

Answer 3

There are three distinct tasks. All of them can be done with awk:

  1. Skip printing once XXX has been seen

    $1 == "Type1" {if($2 == "XXX")f=1;if(! f)print}

  2. Minimum value for Type4

    $1 == "Type4" {if(min > $3 || ! min)min = $3} END{print min}

  3. Print the selected lines

    $1$2 ~ "^(Type5|Type6)(ABC|DEF)$"
