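Is it possible to use awk to calculate the average of each row when rows have different numbers of columns?
I have a file like the one below. The first column is a name, and I would like to calculate the average of each row and print the result as the last column of the input file.
Input file (data1.csv):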
EMPLOYEE1,0.395314,0.384513,,
EMPLOYEE2,5.4908,5.2921,,
EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931
EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163
EMPLOYEE5,1.4816,1.4367,1.4854,1.4353
EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06
EMPLOYEE7,3.724E-06,3.8745E-06,3.9428E-06,3.7227E-06
EMPLOYEE8,0.699498,0.688892,0.704256,0.683486
EMPLOYEE9,33.5195,31.9736,33.6779,31.742
Desired output:
EMPLOYEE1,0.395314,0.384513,,,0.3899135
EMPLOYEE2,5.4908,5.2921,,,5.39145
EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931,0.00023086
EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163,0.00332855
EMPLOYEE5,1.4816,1.4367,1.4854,1.4353,1.45975
EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06,7.91E-06
EMPLOYEE7,3.72E-06,3.87E-06,3.94E-06,3.72E-06,3.82E-06
EMPLOYEE8,0.699498,0.688892,0.704256,0.683486,0.694033
EMPLOYEE9,33.5195,31.9736,33.6779,31.742,32.7282
I tried awk as follows, but it does not compute the correct average for rows that have fewer filled columns than the maximum NF:
awk -F',' '{ s = 0; for (i = 2; i <= NF; i++) s += $i; print $1, (NF > 1) ? s / (NF - 1) : 0; }' data1.csv
and
awk -F',' '{sum=0; for (i=2;i<=NF;i++)sum+=$i; print $0,sum/(NF-1)}' data1.csv
But my code does not change NF from row to row. Is it possible to adjust NF for each row and get the desired output?
Answer 1
Here is one way:
$ awk -F',' -v OFS=',' '{
    s = 0
    numFields = 0
    for (i = 2; i <= NF; i++) {
        if (length($i)) {    # skip empty fields
            s += $i
            numFields++
        }
    }
    print $0, (numFields ? s / numFields : 0)
}' data1.csv
EMPLOYEE1,0.395314,0.384513,,,0.389914
EMPLOYEE2,5.4908,5.2921,,,5.39145
EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931,0.00023086
EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163,0.00332855
EMPLOYEE5,1.4816,1.4367,1.4854,1.4353,1.45975
EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06,7.91e-06
EMPLOYEE7,3.724E-06,3.8745E-06,3.9428E-06,3.7227E-06,3.816e-06
EMPLOYEE8,0.699498,0.688892,0.704256,0.683486,0.694033
EMPLOYEE9,33.5195,31.9736,33.6779,31.742,32.7282
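As a side note on why the attempts in the question went wrong: an empty field still counts towards NF, so dividing by NF - 1 divides the two-value rows by 4 instead of 2. A minimal check, feeding the first row of data1.csv as inline input (the variable n is only an illustrative name):

```shell
# The two trailing empty fields are still counted in NF
printf 'EMPLOYEE1,0.395314,0.384513,,\n' | awk -F',' '{ print NF }'
# prints 5

# length($i) is 0 for an empty field, so only filled fields get counted
printf 'EMPLOYEE1,0.395314,0.384513,,\n' |
  awk -F',' '{ n = 0; for (i = 2; i <= NF; i++) if (length($i)) n++; print n }'
# prints 2
```

So there is no need to change NF itself; counting the non-empty fields separately is enough.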
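Note that for the first row awk prints 0.389914 as the result of 0.779827/2, rather than the exact 0.3899135. This is because awk's default output format for numbers (controlled by the OFMT variable) is %.6g, so the average is rounded to six significant digits. If you need more precision, you can set OFMT yourself: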
$ awk -F',' -v OFS=',' -v OFMT='%0.7g' '{
    s = 0
    numFields = 0
    for (i = 2; i <= NF; i++) {
        if (length($i)) {    # skip empty fields
            s += $i
            numFields++
        }
    }
    print $0, (numFields ? s / numFields : 0)
}' data1.csv
EMPLOYEE1,0.395314,0.384513,,,0.3899135
EMPLOYEE2,5.4908,5.2921,,,5.39145
EMPLOYEE3,0.0002323,0.00022945,0.00023238,0.00022931,0.00023086
EMPLOYEE4,0.00335516,0.00328432,0.00340309,0.00327163,0.00332855
EMPLOYEE5,1.4816,1.4367,1.4854,1.4353,1.45975
EMPLOYEE6,7.89E-06,7.93E-06,7.95E-06,7.87E-06,7.91e-06
EMPLOYEE7,3.724E-06,3.8745E-06,3.9428E-06,3.7227E-06,3.816e-06
EMPLOYEE8,0.699498,0.688892,0.704256,0.683486,0.694033
EMPLOYEE9,33.5195,31.9736,33.6779,31.742,32.72825
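If you would rather not change OFMT for the whole program, a possible variant is to format only the average with sprintf. This is just a sketch: avg_rows is a hypothetical shell function name chosen for illustration, and %.7g is the same precision as in the command above.

```shell
# avg_rows is a hypothetical helper name, not something awk provides.
# It reads the files given as arguments, or standard input if there are none.
avg_rows() {
  awk -F',' -v OFS=',' '{
    s = 0; n = 0
    for (i = 2; i <= NF; i++) {
      if (length($i)) { s += $i; n++ }   # skip empty fields
    }
    # sprintf formats only the average; OFMT is left at its default
    print $0, sprintf("%.7g", (n ? s / n : 0))
  }' "$@"
}
```

You would call it as avg_rows data1.csv. Keep in mind that any non-empty field is counted, so a non-numeric entry would be added as 0 and still increase the count.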