当第一列中的 ID 相同时,我想计算数据的平均值。我想计算第一列中具有相同值的其余列的平均值。我知道如何对一列进行计算,但我很难对整个列进行平均。
这是我的示例数据文件名:abraham
year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
1929 7.5 0 7.78 3.695 1.139 0.186 -0.074 -0.205 -0.323 -0.434 -0.539 -0.639 -0.735 -0.916 -1.085 -1.241 -1.387 -1.522 -1.818 -2.058 -2.247 -2.389 -2.488 -2.551 -2.582 -2.521 -2.404 -2.262 -2.118 -1.987 -1.872 -1.775 -1.691 -1.628 -1.578 -1.484 -1.395 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1929 6.11 0 6.112 3.536 1.227 0.252 -0.063 -0.201 -0.32 -0.43 -0.535 -0.634 -0.73 -0.91 -1.078 -1.234 -1.379 -1.512 -1.806 -2.046 -2.236 -2.378 -2.478 -2.54 -2.573 -2.515 -2.402 -2.262 -2.119 -1.988 -1.873 -1.776 -1.692 -1.629 -1.578 -1.484 -1.395 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1929 2.22 0 2.227 2.542 1.022 0.222 -0.058 -0.198 -0.317 -0.427 -0.531 -0.63 -0.725 -0.905 -1.072 -1.227 -1.371 -1.505 -1.798 -2.038 -2.227 -2.37 -2.471 -2.533 -2.567 -2.512 -2.4 -2.262 -2.12 -1.989 -1.874 -1.776 -1.693 -1.63 -1.579 -1.485 -1.395 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1929 2.78 0 2.779 1.993 0.819 0.18 -0.056 -0.195 -0.314 -0.424 -0.527 -0.626 -0.721 -0.9 -1.066 -1.221 -1.364 -1.498 -1.79 -2.03 -2.219 -2.362 -2.464 -2.526 -2.562 -2.509 -2.399 -2.262 -2.12 -1.99 -1.875 -1.777 -1.693 -1.63 -1.579 -1.485 -1.396 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 0.83 0 0.833 0.85 0.354 0.073 -0.052 -0.187 -0.305 -0.414 -0.517 -0.614 -0.708 -0.885 -1.049 -1.202 -1.344 -1.477 -1.768 -2.006 -2.196 -2.34 -2.442 -2.508 -2.549 -2.5 -2.395 -2.261 -2.122 -1.992 -1.877 -1.779 -1.695 -1.631 -1.58 -1.485 -1.396 -1.311 -1.23 -1.151 -1.076 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 -0.28 0 -0.278 0.461 0.242 0.049 -0.052 -0.185 -0.302 -0.411 -0.513 -0.61 -0.704 -0.88 -1.044 -1.196 -1.338 -1.47 -1.761 -1.999 -2.188 -2.332 -2.435 -2.502 -2.544 -2.498 -2.394 -2.261 -2.122 -1.992 -1.878 -1.779 -1.695 -1.631 -1.581 -1.486 -1.396 -1.312 -1.23 -1.152 -1.076 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 0.56 0.002 0.511 0.348 0.165 0.033 -0.052 -0.183 -0.299 -0.407 -0.509 -0.607 -0.7 -0.875 -1.038 -1.19 -1.332 -1.463 -1.753 -1.991 -2.18 -2.325 -2.429 -2.496 -2.54 -2.495 -2.393 -2.261 -2.123 -1.993 -1.878 -1.78 -1.696 -1.632 -1.581 -1.486 -1.396 -1.312 -1.23 -1.152 -1.076 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 3.89 0 3.884 0.96 0.243 0.041 -0.052 -0.181 -0.297 -0.404 -0.506 -0.603 -0.696 -0.87 -1.033 -1.184 -1.325 -1.457 -1.746 -1.983 -2.173 -2.318 -2.422 -2.49 -2.536 -2.492 -2.391 -2.26 -2.123 -1.994 -1.879 -1.78 -1.696 -1.632 -1.581 -1.486 -1.397 -1.312 -1.23 -1.152 -1.076 -1.002 -0.86 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.558
1931 -7.22 0.057 -4.563 -0.299 0.035 0.007 -0.054 -0.175 -0.286 -0.391 -0.49 -0.585 -0.676 -0.847 -1.007 -1.156 -1.295 -1.425 -1.711 -1.947 -2.136 -2.282 -2.389 -2.46 -2.514 -2.478 -2.384 -2.259 -2.125 -1.996 -1.882 -1.783 -1.698 -1.633 -1.582 -1.487 -1.397 -1.312 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
1931 -6.11 0.057 -4.1 -0.663 0.004 -0.001 -0.055 -0.174 -0.284 -0.388 -0.487 -0.581 -0.672 -0.843 -1.002 -1.151 -1.289 -1.419 -1.704 -1.94 -2.129 -2.275 -2.382 -2.454 -2.51 -2.476 -2.383 -2.258 -2.125 -1.997 -1.883 -1.783 -1.698 -1.633 -1.583 -1.487 -1.397 -1.312 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
1931 -10.28 0.057 -6.362 -1.527 0 -0.001 -0.055 -0.173 -0.283 -0.386 -0.484 -0.578 -0.668 -0.839 -0.997 -1.145 -1.283 -1.413 -1.697 -1.933 -2.122 -2.268 -2.376 -2.448 -2.506 -2.473 -2.382 -2.258 -2.125 -1.998 -1.883 -1.784 -1.698 -1.633 -1.583 -1.487 -1.398 -1.312 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
1931 -10 0.057 -7.018 -2.259 -0.001 -0.001 -0.056 -0.172 -0.281 -0.384 -0.481 -0.575 -0.665 -0.834 -0.992 -1.14 -1.278 -1.406 -1.691 -1.926 -2.115 -2.261 -2.369 -2.443 -2.501 -2.47 -2.38 -2.258 -2.125 -1.998 -1.884 -1.784 -1.699 -1.634 -1.583 -1.487 -1.398 -1.313 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.588 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
期望的输出
year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
1929 4.6525 0 4.7245 2.9415 1.05175 0.21 -0.06275 -0.19975 -0.3185 -0.42875 -0.533 -0.63225 -0.72775 -0.90775 -1.07525 -1.23075 -1.37525 -1.50925 -1.803 -2.043 -2.23225 -2.37475 -2.47525 -2.5375 -2.571 -2.51425 -2.40125 -2.262 -2.11925 -1.9885 -1.8735 -1.776 -1.69225 -1.62925 -1.5785 -1.4845 -1.39525 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 1.25 0.0005 1.2375 0.65475 0.251 0.049 -0.052 -0.184 -0.30075 -0.409 -0.51125 -0.6085 -0.702 -0.8775 -1.041 -1.193 -1.33475 -1.46675 -1.757 -1.99475 -2.18425 -2.32875 -2.432 -2.499 -2.54225 -2.49625 -2.39325 -2.26075 -2.1225 -1.99275 -1.878 -1.7795 -1.6955 -1.6315 -1.58075 -1.48575 -1.39625 -1.31175 -1.23 -1.15175 -1.076 -1.002 -0.85925 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.55875
1931 -8.4025 0.057 -5.51075 -1.187 0.0095 0.001 -0.055 -0.1735 -0.2835 -0.38725 -0.4855 -0.57975 -0.67025 -0.84075 -0.9995 -1.148 -1.28625 -1.41575 -1.70075 -1.9365 -2.1255 -2.2715 -2.379 -2.45125 -2.50775 -2.47425 -2.38225 -2.25825 -2.125 -1.99725 -1.883 -1.7835 -1.69825 -1.63325 -1.58275 -1.487 -1.3975 -1.31225 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.58725 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
答案1
这是一个 Perl 解决方案。它并不期望对年份进行排序,但确实需要将整个文件(至少是每列的总和)读入内存。只要你的文件不是太大,它应该做你想要的:
#!/usr/bin/env perl
## Read the input file, line by line
while (<>){
## Print and skip the 1st line
if($.==1){
print;
next
};
## Split the fileds into the array @F
@F=split(/\s+/);
## $k (%k) is a hash of hashes. The first element of
## $F ($F[0]) is the year, so $k{$year}{tot} will be
## the nuber of lines for each year.
$k{$F[0]}{tot}++;
## Go through each field add its value
## to $k{$year}{field number}
for($i=1;$i<=$#F;$i++){
$k{$F[0]}{$i}+=$F[$i];
}
}
## Now print the data for each year
foreach $y (sort keys(%k)){
## $y is the year
print "$y ";
## $k{$y}{$i} is the sum of the values for this column/year
## and $k{$y}{tot} is the number of lines for this year
for($i=1;$i<=$#F;$i++){
print " " . $k{$y}{$i}/$k{$y}{tot};
}
print "\n";
}
这是与(较长的)单行相同的脚本:
$ perl -ane '$.==1 && do {print; next}; $k{$F[0]}{tot}++;
for($i=1;$i<=$#F;$i++){$k{$F[0]}{$i}+=$F[$i];}
END{foreach $y (sort keys(%k)){
print "$y ";
for($i=1;$i<=$#F;$i++){
print " " . $k{$y}{$i}/$k{$y}{tot};
} print "\n";} }' abraham
year 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
1929 4.6525 0 4.7245 2.9415 1.05175 0.21 -0.06275 -0.19975 -0.3185 -0.42875 -0.533 -0.63225 -0.72775 -0.90775 -1.07525 -1.23075 -1.37525 -1.50925 -1.803 -2.043 -2.23225 -2.37475 -2.47525 -2.5375 -2.571 -2.51425 -2.40125 -2.262 -2.11925 -1.9885 -1.8735 -1.776 -1.69225 -1.62925 -1.5785 -1.4845 -1.39525 -1.311 -1.229 -1.151 -1.075 -1.002 -0.859 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.559
1930 1.25 0.0005 1.2375 0.65475 0.251 0.049 -0.052 -0.184 -0.30075 -0.409 -0.51125 -0.6085 -0.702 -0.8775 -1.041 -1.193 -1.33475 -1.46675 -1.757 -1.99475 -2.18425 -2.32875 -2.432 -2.499 -2.54225 -2.49625 -2.39325 -2.26075 -2.1225 -1.99275 -1.878 -1.7795 -1.6955 -1.6315 -1.58075 -1.48575 -1.39625 -1.31175 -1.23 -1.15175 -1.076 -1.002 -0.85925 -0.721 -0.587 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.308 0.445 0.55875
1931 -8.4025 0.057 -5.51075 -1.187 0.0095 0.001 -0.055 -0.1735 -0.2835 -0.38725 -0.4855 -0.57975 -0.67025 -0.84075 -0.9995 -1.148 -1.28625 -1.41575 -1.70075 -1.9365 -2.1255 -2.2715 -2.379 -2.45125 -2.50775 -2.47425 -2.38225 -2.25825 -2.125 -1.99725 -1.883 -1.7835 -1.69825 -1.63325 -1.58275 -1.487 -1.3975 -1.31225 -1.231 -1.152 -1.076 -1.003 -0.86 -0.722 -0.58725 -0.456 -0.328 -0.171 -0.017 0.068 0.151 0.307 0.444 0.558
答案2
您可以使用数组来存储每列值的总和,当 $1 发生变化时,从那里获取平均值。
# print header
NR==1 {
print
next
}
# print average of each column per year
# then, reset columns sums and number of lines
function print_sum() {
printf prev
# needs GNU awk, for length of array
for (i=2; i < length(sum) + 2; i++) {
printf FS sum[i]/nlines
sum[i] = 0
}
printf ORS
nlines = 0
}
# print average when $1 changes, but not the first time
# also, on end of script
NR>2 && prev!=$1 { print_sum() }
END { print_sum() }
# for every line with the same $1, sum column values, increment number of lines
{
prev=$1;
nlines++
for (i=2; i <= NF; i++) {
sum[i]+=$i
}
}
答案3
和gnu
datamash
:
{ head -n 1; datamash -s -g 1 $(printf 'mean %s ' {2..54}); } <infile
其中54
是列数。这假设您的字段是制表符分隔的。如果它们是空间分隔的运行
{ head -n 1; datamash -t ' ' -s -g 1 $(printf 'mean %s ' {2..54}); } <infile
或者如果它们由多个空格(空格/制表符)分隔:
{ head -n 1; datamash -W -s -g 1 $(printf 'mean %s ' {2..54}); } <infile
请注意,输出将按第一列排序。