I have a file:
Name v1 v2
Type1 ABC 32
Type1 DEF 44
Type1 XXX 45
Type2 ABC 78
Type2 XXX 23
Type3 DEF 22
Type3 XXX 12
Type4 ABC 55
Type4 DEF 78
Type5 ABC 99
Type6 DEF 00
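I am trying to print only part of this file, under the following conditions:
- For a given name, e.g. Type1, if XXX is present in column v1, I want to skip printing every occurrence of Type1 in the file.
- For a given name, e.g. Type4, if column v1 contains both ABC and DEF, I only want to print the row with the smaller v2 value.
- For a given name, e.g. Type5 or Type6, where there is only ABC or only DEF, I want to print it.
How should I do this? I can read the file into an array, but I don't know how to search a specific column across multiple lines.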
Answer 1
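The tool you need for this is a hash, which is Perl's way of storing key-value pairs. Specifically, we need to preprocess your data into hashes so that we can look up the lowest value for each type, and the types in which XXX occurs.
Fortunately, your third condition looks like a subset of the second: if you are just printing the lowest value, then when there is only one row for a type, that row's value is the lowest.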
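To illustrate why the third condition needs no special handling, here is a minimal sketch using `min` from the core `List::Util` module. The `%v2_for` hash is a hypothetical hand-built structure (not part of the answer's script), with values taken from the Type4 and Type5 rows of the sample file.

```perl
use strict;
use warnings;
use List::Util qw(min);

# v2 values grouped by type; numbers copied from the sample file.
my %v2_for = (
    Type4 => [ 55, 78 ],    # both ABC and DEF present
    Type5 => [99],          # only ABC present
);

# The minimum of a one-element list is that element itself,
# so "keep the lowest" also covers "keep the only one".
my %lowest;
for my $type ( keys %v2_for ) {
    $lowest{$type} = min @{ $v2_for{$type} };
}

print "$_ $lowest{$_}\n" for sort keys %lowest;
```

For Type5 the "lowest" value is simply its single value, which is why one rule can serve both conditions.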
So I would probably do this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

#read header line, because we don't want to process it;
#note - diamond operators are 'magic' file handles.
#they read either piped input on STDIN, or
#open/read files specified on command line.
#this is almost exactly like how sed/grep work.
my $header_line = <>;

#turn the rest of our input into an array of arrays, split on whitespace/linefeeds.
my @lines = map { [split] } <>;

#print for diag
print Dumper \@lines;

#this hash tracks if we've 'seen' an XXX
my %skip_type;

#this hash tracks the lowest V2 value.
my %lowest_v2_for;

foreach my $record (@lines) {

    #we could work with $record->[0], etc., but
    #named variables are more readable.
    my ( $type, $v1, $v2 ) = @$record;

    #find all the lines with "XXX" - store in a hash.
    if ( $v1 eq "XXX" ) {
        $skip_type{$type}++;
    }

    #check if this v2 is the lowest for this particular type.
    #make a note if it is.
    if ( not defined $lowest_v2_for{$type}
        or $lowest_v2_for{$type} > $v2 )
    {
        $lowest_v2_for{$type} = $v2;
    }
}

#print for diag - things we are skipping.
print Dumper \%skip_type;

print $header_line;

#run through our list again, testing the various conditions:
foreach my $record (@lines) {
    my ( $type, $v1, $v2 ) = @$record;

    #skip if it's got an XXX.
    next if $skip_type{$type};

    #skip if it isn't the lowest value
    next if $lowest_v2_for{$type} < $v2;

    #print otherwise.
    print join( " ", @$record ), "\n";
}
This gives (minus some diagnostic output from Dumper, which you can freely discard if you don't need it):
Name v1 v2
Type4 ABC 55
Type5 ABC 99
Type6 DEF 00
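One edge case worth noting: because the second loop only skips rows whose v2 is strictly greater than the lowest, two rows of the same type that tie on the lowest v2 would both be printed. If exactly one row per type is wanted, a possible variation is to remember the winning row itself rather than just its value. The sketch below assumes that the first row should win a tie; `pick_rows` is a hypothetical helper name, and the second Type4 row has been altered to 55 purely to create a tie.

```perl
use strict;
use warnings;

# Hypothetical helper: takes an array ref of [type, v1, v2] rows and
# returns at most one row per type, in order of first appearance.
sub pick_rows {
    my ($rows) = @_;
    my ( %seen, %skip_type, %best_for, @order );
    for my $row (@$rows) {
        my ( $type, $v1, $v2 ) = @$row;
        push @order, $type unless $seen{$type}++;
        $skip_type{$type} = 1 if $v1 eq 'XXX';

        # Strict "<" keeps the earlier row when two rows tie on v2.
        if ( !$best_for{$type} or $v2 < $best_for{$type}[2] ) {
            $best_for{$type} = $row;
        }
    }
    return map { $best_for{$_} } grep { !$skip_type{$_} } @order;
}

# Sample rows; "Type4 DEF 55" is modified from the original 78 (assumption).
my @rows = map { [split] } (
    'Type1 ABC 32', 'Type1 XXX 45',
    'Type4 ABC 55', 'Type4 DEF 55',
    'Type5 ABC 99',
);

print join( ' ', @$_ ), "\n" for pick_rows( \@rows );
```

With these rows, Type1 is dropped because of XXX and only the first of the two tied Type4 rows is kept.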
Answer 2
My take:
perl -wE '
    # read the data
    chomp( my $header = <> );
    my %data;
    while (<>) {
        chomp;
        my @F = split;
        $data{$F[0]}{$F[1]} = $F[2];
    }

    # requirement 1
    delete $data{Type1} if exists $data{Type1}{XXX};

    # requirement 2
    if (exists $data{Type4}{ABC} and exists $data{Type4}{DEF}) {
        if ($data{Type4}{ABC} <= $data{Type4}{DEF}) {
            delete $data{Type4}{DEF};
        }
        else {
            delete $data{Type4}{ABC};
        }
    }

    # requirement 3
    for my $name (qw/Type5 Type6/) {
        delete $data{$name} unless (
            scalar keys %{$data{$name}} == 1
            and (exists $data{$name}{ABC} or exists $data{$name}{DEF})
        );
    }

    $, = " ";
    say $header;
    for my $name (sort keys %data) {
        for my $v1 (sort keys %{$data{$name}}) {
            say $name, $v1, $data{$name}{$v1};
        }
    }
' file
Output:
Name v1 v2
Type2 ABC 78
Type2 XXX 23
Type3 DEF 22
Type3 XXX 12
Type4 ABC 55
Type5 ABC 99
Type6 DEF 00
No requirements were given for Type2 and Type3, so they are printed unchanged.
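The one-liner above stores the file as a hash of hashes, keyed first by name and then by v1. As a rough sketch of how that structure behaves, the snippet below builds it from three of the sample rows and also shows a general Perl behaviour to keep in mind: testing a nested key with `exists` autovivifies the outer key. `Type9` is a made-up name used only for the demonstration.

```perl
use strict;
use warnings;

my %data;
for my $line ( 'Type4 ABC 55', 'Type4 DEF 78', 'Type5 ABC 99' ) {
    my ( $name, $v1, $v2 ) = split ' ', $line;
    $data{$name}{$v1} = $v2;    # the inner hash is created on demand
}

# Direct lookup of one value, and a count of v1 entries for one name.
my $type5_count = keys %{ $data{Type5} };
print "Type4/DEF = $data{Type4}{DEF}\n";
print "Type5 has $type5_count v1 entry\n";

# Probing a nested key creates the outer key as an empty hash.
my $had_type9 = exists $data{Type9};         # false: never stored
my $probe     = exists $data{Type9}{ABC};    # false, but creates $data{Type9}
my $has_type9 = exists $data{Type9};         # true now

print "Type9 before probe: ", ( $had_type9 ? "yes" : "no" ), "\n";
print "Type9 after probe:  ", ( $has_type9 ? "yes" : "no" ), "\n";
```

In the one-liner this side effect is harmless, since an empty inner hash produces no output lines in the final printing loop.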
Answer 3
There are three distinct tasks. All of them can be done with awk:
Skip printing once XXX has been seen
$1 == "Type1" {if($2 == "XXX")f=1;if(! f)print}
Minimum value for Type4
$1 == "Type4" {if(min > $3 || ! min)min = $3} END{print min}
Print the selected lines
$1$2 ~ "^(Type5|Type6)(ABC|DEF)$"