I have a file with the following structure:
Ti 1.9699858320 2.0810775390 4.162155079 5.20200
O 1.6428341970 2.0810775390 4.162155079 -2.14259
O 1.6428341970 2.0810775390 4.162155079 -2.14259
Pb 4.1621550790 4.1621550790 4.192557641 3.39279
O 3.7662066970 4.1621550790 4.192557641 -4.29652
Ti 6.1302323500 6.2584338990 4.192557641 5.23841
O 5.8163744340 6.2584338990 4.192557641 -2.13267
O 5.8163744340 6.2584338990 4.192557641 -2.13267
Pb 8.3547127200 8.3547127200 4.196295567 3.40984
O 7.9266344100 8.3547127200 4.196295567 -4.36260
Ti 10.318243871 10.452860504 4.196295567 5.26652
O 9.9935741680 10.452860504 4.196295567 -2.13625
O 9.9935741680 10.452860504 4.196295567 -2.13625
Pb 12.551008287 12.551008287 4.193631562 3.43289
O 12.112224767 12.551008287 4.193631562 -4.38552
I need to do the following:
1) Subtract column 3 from column 2.
2) Multiply the result of 1) by column 5. I do it like this:
awk '{print $0," ",($2-$3)*$5 > "file-out.dat"}' file-in.dat
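(This is the tricky part.) From the result of 2), I need the sum of each group of 5 entries. Below is what the file looks like after step 2). I need to add up the entries of the last column in groups of 5 and write the results like this: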
Ti 1.9699858320 2.0810775390 4.162155079 5.20200 -0.577899 1 result_of_sum_of_first_group_of_5
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976 2 result_of_sum_of_second_group_of_5
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976 3 result_of_sum_of_third_group_of_5
Pb 4.1621550790 4.1621550790 4.192557641 3.39279 0
O 3.7662066970 4.1621550790 4.192557641 -4.29652 1.7012
Ti 6.1302323500 6.2584338990 4.192557641 5.23841 -0.671572
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
Pb 8.3547127200 8.3547127200 4.196295567 3.40984 0
O 7.9266344100 8.3547127200 4.196295567 -4.36260 1.86753
Ti 10.318243871 10.452860504 4.196295567 5.26652 -0.708961
O 9.9935741680 10.452860504 4.196295567 -2.13625 0.98115
O 9.9935741680 10.452860504 4.196295567 -2.13625 0.98115
Pb 12.551008287 12.551008287 4.193631562 3.43289 0
O 12.112224767 12.551008287 4.193631562 -4.38552 1.92429
Is there a way to do all of this in a single awk one-liner?
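Answer 1
In two steps, using two temporary files:
Step 1: Create the intermediate file tmpfile1 with exactly six columns, and the file tmpfile2 with the sums for all three Ti groups: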
awk '{ $6 = ($2 - $3)*$5; print }' OFS="\t" file | tee tmpfile1 |
awk '$1 == "Ti" && NR > 1 { print ++i, sum; sum = 0 } { sum += $6 } END { print ++i, sum }' OFS="\t" >tmpfile2
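The first awk command simply adds the sixth column, containing the value computed by the formula. tee writes the result to tmpfile1 and passes the data on to the second awk program.
The second awk sums up the new sixth column. When it reaches a Ti line, unless that is the first line of the file, it outputs the current sum and resets the variable sum. The sum for the last group of lines is output in the END block. The variable i is incremented before each output and is the index you want in that column. This creates the tmpfile2 file.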
tmpfile1:
Ti 1.9699858320 2.0810775390 4.162155079 5.20200 -0.577899
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976
Pb 4.1621550790 4.1621550790 4.192557641 3.39279 0
O 3.7662066970 4.1621550790 4.192557641 -4.29652 1.7012
Ti 6.1302323500 6.2584338990 4.192557641 5.23841 -0.671572
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
Pb 8.3547127200 8.3547127200 4.196295567 3.40984 0
O 7.9266344100 8.3547127200 4.196295567 -4.36260 1.86753
Ti 10.318243871 10.452860504 4.196295567 5.26652 -0.708961
O 9.9935741680 10.452860504 4.196295567 -2.13625 0.98115
O 9.9935741680 10.452860504 4.196295567 -2.13625 0.98115
Pb 12.551008287 12.551008287 4.193631562 3.43289 0
O 12.112224767 12.551008287 4.193631562 -4.38552 1.92429
tmpfile2:
1 3.00125
2 3.08149
3 3.17763
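If the groups are always exactly five lines long but do not necessarily start with a Ti line, the second stage could count lines instead of matching on the first column. The following is only a sketch under that assumption: the file names tmpfile1.sample and tmpfile2.sample and the variable n are made up for illustration, and the sample holds just the first two groups of the tmpfile1 listing.

```shell
# Hypothetical sample: the first two groups of tmpfile1 shown above.
cat > tmpfile1.sample <<'EOF'
Ti 1.9699858320 2.0810775390 4.162155079 5.20200 -0.577899
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976
Pb 4.1621550790 4.1621550790 4.192557641 3.39279 0
O 3.7662066970 4.1621550790 4.192557641 -4.29652 1.7012
Ti 6.1302323500 6.2584338990 4.192557641 5.23841 -0.671572
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
Pb 8.3547127200 8.3547127200 4.196295567 3.40984 0
O 7.9266344100 8.3547127200 4.196295567 -4.36260 1.86753
EOF

# Sum column 6 in fixed groups of n lines (n=5 is taken from the question)
# and print "index<TAB>sum" once per group.
awk -v n=5 '
    { sum += $6 }
    NR % n == 0 { print ++i, sum; sum = 0 }
    END { if (NR % n) print ++i, sum }   # trailing partial group, if any
' OFS="\t" tmpfile1.sample > tmpfile2.sample

cat tmpfile2.sample
```

For this sample the two lines printed match the first two lines of tmpfile2 above.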
Step 2: Paste these together:
paste tmpfile1 tmpfile2
This produces:
Ti 1.9699858320 2.0810775390 4.162155079 5.20200 -0.577899 1 3.00125
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976 2 3.08149
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976 3 3.17763
Pb 4.1621550790 4.1621550790 4.192557641 3.39279 0
O 3.7662066970 4.1621550790 4.192557641 -4.29652 1.7012
Ti 6.1302323500 6.2584338990 4.192557641 5.23841 -0.671572
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
Pb 8.3547127200 8.3547127200 4.196295567 3.40984 0
O 7.9266344100 8.3547127200 4.196295567 -4.36260 1.86753
Ti 10.318243871 10.452860504 4.196295567 5.26652 -0.708961
O 9.9935741680 10.452860504 4.196295567 -2.13625 0.98115
O 9.9935741680 10.452860504 4.196295567 -2.13625 0.98115
Pb 12.551008287 12.551008287 4.193631562 3.43289 0
O 12.112224767 12.551008287 4.193631562 -4.38552 1.92429
The result is tab-delimited.
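The temporary files could probably be avoided by letting awk read the input twice: a first pass collects the group sums, a second pass prints the lines. This is a sketch rather than part of the original answer. It assumes groups of exactly five lines and an input that is a regular file (it cannot be a pipe, since it is read twice). The names sample.dat and result.tsv are hypothetical, and the sample contains only the first two groups of the input.

```shell
# Hypothetical sample input: the first two groups from the question.
cat > sample.dat <<'EOF'
Ti 1.9699858320 2.0810775390 4.162155079 5.20200
O 1.6428341970 2.0810775390 4.162155079 -2.14259
O 1.6428341970 2.0810775390 4.162155079 -2.14259
Pb 4.1621550790 4.1621550790 4.192557641 3.39279
O 3.7662066970 4.1621550790 4.192557641 -4.29652
Ti 6.1302323500 6.2584338990 4.192557641 5.23841
O 5.8163744340 6.2584338990 4.192557641 -2.13267
O 5.8163744340 6.2584338990 4.192557641 -2.13267
Pb 8.3547127200 8.3547127200 4.196295567 3.40984
O 7.9266344100 8.3547127200 4.196295567 -4.36260
EOF

awk -v OFS='\t' '
    { v = ($2 - $3) * $5 }               # value of the sixth column
    NR == FNR {                          # first pass: collect the group sums
        sum[int((FNR - 1) / 5) + 1] += v
        next
    }
    {                                    # second pass: print the lines
        $6 = v
        if (FNR in sum) print $0, FNR, sum[FNR]   # line i also gets index i and sum i
        else print
    }
' sample.dat sample.dat > result.tsv

cat result.tsv
```

Because the sums here are accumulated from the unrounded values rather than from the six-digit text in tmpfile1, the last digit of a sum can differ slightly from the two-file version.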
Answer 2
Here is a pure awk approach:
$ awk '{
    $6 = ($2 - $3)*$5;        # add the computed sixth column
    a[NR] = $0;               # remember every line
    sum += $6;
    if (NR % 5 == 0) {        # end of a group of five lines
        c++;
        a[c] = a[c] " " sum;  # append the group sum to line number c
        sum = 0;
    }
}
END {
    for (i = 1; i <= NR; i++) {   # print the lines in input order
        print a[i]
    }
}' file
Ti 1.9699858320 2.0810775390 4.162155079 5.20200 -0.577899 3.00125
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976 3.0815
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976 3.17763
Pb 4.1621550790 4.1621550790 4.192557641 3.39279 0
O 3.7662066970 4.1621550790 4.192557641 -4.29652 1.7012
Ti 6.1302323500 6.2584338990 4.192557641 5.23841 -0.671572
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
Pb 8.3547127200 8.3547127200 4.196295567 3.40984 0
O 7.9266344100 8.3547127200 4.196295567 -4.36260 1.86753
Ti 10.318243871 10.452860504 4.196295567 5.26652 -0.708961
O 9.9935741680 10.452860504 4.196295567 -2.13625 0.98115
O 9.9935741680 10.452860504 4.196295567 -2.13625 0.98115
Pb 12.551008287 12.551008287 4.193631562 3.43289 0
O 12.112224767 12.551008287 4.193631562 -4.38552 1.92429
Note that if your input file is tab-separated (which seems to be the case), this will remove the tabs. If that is a problem, you can put them back with sed:
$ awk '...' | sed 's/ /\t/g'
Ti 1.9699858320 2.0810775390 4.162155079 5.20200 -0.577899 3.00125
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976 3.0815
O 1.6428341970 2.0810775390 4.162155079 -2.14259 0.938976 3.17763
Pb 4.1621550790 4.1621550790 4.192557641 3.39279 0
O 3.7662066970 4.1621550790 4.192557641 -4.29652 1.7012
Ti 6.1302323500 6.2584338990 4.192557641 5.23841 -0.671572
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
O 5.8163744340 6.2584338990 4.192557641 -2.13267 0.942767
Pb 8.3547127200 8.3547127200 4.196295567 3.40984 0
O 7.9266344100 8.3547127200 4.196295567 -4.36260 1.86753
Ti 10.318243871 10.452860504 4.196295567 5.26652 -0.708961
O 9.9935741680 10.452860504 4.196295567 -2.13625 0.98115
O 9.9935741680 10.452860504 4.196295567 -2.13625 0.98115
Pb 12.551008287 12.551008287 4.193631562 3.43289 0
O 12.112224767 12.551008287 4.193631562 -4.38552 1.92429
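The tabs disappear because assigning to a field makes awk rebuild the whole record, joining the fields with OFS, which is a single space by default. A small illustration on made-up input (the a/b/c line is not from the question) also shows that setting OFS to a tab in awk itself is a possible alternative to the sed step, whose \t in the replacement is not guaranteed by POSIX sed:

```shell
# Assigning to a field rebuilds $0 with OFS (a single space by default).
printf 'a\tb\tc\n' | awk '{ $2 = "X"; print }'              # prints: a X c

# With OFS set to a tab, the rebuilt record is tab-separated again.
printf 'a\tb\tc\n' | awk -v OFS='\t' '{ $2 = "X"; print }'  # prints: a<TAB>X<TAB>c
```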