Average rows with the same first column

Answer 1

Using awk in a shell, where FILE is the input file (its first line is a header and is skipped):

$ awk '
    NR>1{
        arr[$1]   += $2
        count[$1] += 1
    }
    END{
        for (a in arr) {
            print "id avg " a " = " arr[a] / count[a]
        }
    }
' FILE

Or using Perl in a shell:

$ perl -lane '
    END {
        foreach my $key (keys(%hash)) {
            print "id avg $key = " . $hash{$key} / $count{$key};
        }
    }
    if ($. > 1) {
        $hash{$F[0]}  += $F[1];
        $count{$F[0]} += 1;
    }
' FILE

The output is:

id avg 601 = 24
id avg 510 = 64.4

And as a final joke, a dark, obfuscated Perl one-liner =)

perl -lane'END{for(keys(%h)){print"$_:".$h{$_}/$c{$_}}}($.>1)&&do{$h{$F[0]}+=$F[1];$c{$F[0]}++}' FILE
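One caveat that applies to all of the variants above: awk's `for (a in arr)` and Perl's `keys` return the ids in no guaranteed order, so the output lines can come out in a different order than shown. A minimal sketch of one way to get a stable order, assuming whitespace-separated input with a single header line and numeric ids; the function name `avg_by_id` and the sample rows are made up for illustration and are not the asker's data:

```shell
# avg_by_id: print "id average" for each id in column 1, sorted by id.
# Hypothetical helper; reads the named file(s), or stdin when no file is given.
avg_by_id() {
    awk '
        NR > 1 {                 # skip the header line
            sum[$1]   += $2      # running total per id
            count[$1] += 1       # number of rows per id
        }
        END {
            for (id in sum)
                print id, sum[id] / count[id]
        }
    ' "$@" | sort -n             # for-in order is unspecified, so sort by id
}

# Made-up sample rows
printf 'Id ht\n510 60\n601 24\n510 70\n' | avg_by_id
# prints:
# 510 65
# 601 24
```

Sorting after the fact keeps the awk program portable; it does not rely on any implementation-specific array ordering.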


Answer 2

#!/usr/bin/perl
use strict;
use warnings;

my %sum_so_far;
my %count_so_far;
while ( <> ) {
    # Skip the header, blank lines, and anything else that doesn't start with a digit
    next unless m/^\d/;

    # Accumulate the sum and the count
    my @line = split();
    $sum_so_far{$line[0]}   += $line[1];
    $count_so_far{$line[0]} += 1;
}

# Dump the output
print "Id Avg.ht\n";
foreach my $id ( keys %count_so_far ) {
    my $avg = $sum_so_far{$id}/$count_so_far{$id};
    print " $id $avg\n";
}

Output:

ire@localhost$ perl make_average.pl input.txt 
Id Avg.ht
 510 64.4
 601 24

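Note that your sample output is wrong. You cannot get an average of 52 when every value for that id is 59 or greater.

Also, one of your columns contains a letter l masquerading as a digit 1......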
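A stray letter like that is easy to miss by eye, and awk will not complain about it: in arithmetic it converts a string such as `6l` to 6 and `l6` to 0, so the average is silently wrong. A rough sketch of a pre-check that flags such rows, assuming whitespace-separated columns with one header line; the function name `check_numeric` and the sample rows are made up for illustration:

```shell
# check_numeric: report rows whose 2nd field is not a plain decimal number.
# Hypothetical helper; reads the named file(s) or stdin.
# Exit status is 1 if at least one bad row was found, 0 otherwise.
check_numeric() {
    awk '
        NR > 1 && $2 !~ /^-?[0-9]+(\.[0-9]+)?$/ {
            printf "line %d: non-numeric value \"%s\"\n", NR, $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$@"
}

# Made-up sample rows; the third line hides a letter l in the value
printf 'Id ht\n510 60\n510 6l\n601 24\n' | check_numeric ||
    echo "fix the rows above before averaging"
```

Running a check like this before the averaging script keeps a typo in the data from turning into a plausible-looking but wrong result.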


Answer 3

With GNU datamash:

datamash -H -s -g 1 mean 2 <file

GroupBy(Id) mean()
510 64.4
601 24

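This calculates the mean of the 2nd field's values, sorting (-s) and grouping (-g) by the 1st field, and preserving the header (-H). It assumes the fields are separated by single tabs. Use -W, --whitespace if they are separated by more than one space, or -t, --field-separator= to define another field separator (space, comma, etc.). Since datamash requires the input to be sorted, the output will be sorted by the grouped column.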
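For input whose columns are separated by runs of spaces rather than single tabs, the same grouping could look like the sketch below. It uses only the datamash options already named in this answer (`-W`, `-H`, `-s`, `-g`); the function name `mean_by_first` is made up, and the sample rows are illustrative, not the asker's data.

```shell
# mean_by_first: mean of column 2 grouped by column 1, for whitespace-separated
# input with a header line. Hypothetical helper; requires GNU datamash and
# reads standard input.
mean_by_first() {
    datamash -W -H -s -g 1 mean 2
}

# Only run the demo where datamash is actually installed
if command -v datamash >/dev/null 2>&1; then
    # Made-up sample rows with uneven spacing between the columns
    printf 'Id  ht\n510   60\n601 24\n510  70\n' | mean_by_first
fi
```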


Answer 4

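Take a look at what is done here: http://www.sugihartono.com/programming/group-by-count-and-sorting-using-perl-script/

The hardest part is doing the "group by" operation. The linked script uses a hash to achieve it.

In that link they compute a sum, but getting the average is not much different.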
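To make that last point concrete: once a hash (an associative array in awk) holds a running sum per key, a second hash holding the row count per key is all that is needed to turn each sum into an average. A small sketch, assuming whitespace-separated input with a header line; `group_stats` and the sample rows are made up for illustration and are not taken from the linked script:

```shell
# group_stats: per-key row count, sum, and average of column 2.
# Hypothetical helper; reads the named file(s) or stdin.
group_stats() {
    awk '
        NR > 1 {
            sum[$1] += $2        # the "group by" step: one hash slot per key
            n[$1]++              # the extra bookkeeping needed for an average
        }
        END {
            for (k in sum)
                printf "%s count=%d sum=%s avg=%s\n", k, n[k], sum[k], sum[k] / n[k]
        }
    ' "$@" | sort
}

# Made-up sample rows
printf 'Id ht\n510 60\n601 24\n510 70\n' | group_stats
# prints:
# 510 count=2 sum=130 avg=65
# 601 count=1 sum=24 avg=24
```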

