I have a file containing the following input data:
Sample1
Feature 1
A
B
C
D
Feature 2
E
F
G
Sample2:
Feature 1
H
I
Feature 2
L
O
P
I would like to get the following output:
Sample1
Feature 1: 4
Feature 2: 3
Sample2
Feature 1: 2
Feature 2: 3
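So basically I am trying to count how many elements each feature contains, separately for each sample.
I tried the following command:
awk '{if(/^\Feature/){n=$0;}else{l[n]++}}END{for(n in l){print n" : "l[n]}}' input_file > output_file
But it gives me the following output (each feature is counted across all samples):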
Feature 1: 6
Feature 2: 6
Can someone help me modify this command or suggest another one?
Answer 1
File summarize.awk:
function print_feature() {
    if (feature) print feature ": " n
    n = 0
    feature = ""
}
NF == 0 {                   # empty line.
    print_feature()         # print the feature summary
    in_feature = 0          # we are no longer counting elements
    next                    # do not print the empty line
}
$1 == "Feature" {           # a new feature
    print_feature()         # print the previous feature summary
    feature = $0            # save this as the new feature
    in_feature = 1          # indicate we are counting elements
    next                    # do not print ... yet
}
{
    if (in_feature)
        n++                 # count this element
    else                    # or
        print               # print (e.g. "Sample")
}
END {
    print_feature()         # if there is no trailing blank line, print the current feature
}
Then:
$ awk -f summarize.awk file
Sample1
Feature 1: 4
Feature 2: 3
Sample2:
Feature 1: 2
Feature 2: 3
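Note that summarize.awk stops counting only at a blank line or at the next Feature line. If the file has no blank line between samples, the next Sample line would be counted as an element of the preceding feature. Below is a minimal sketch of a variant that does not depend on blank lines. It assumes every sample header starts with "Sample" and every feature header starts with "Feature"; the names summarize and emit are hypothetical helpers introduced here, not part of the original script.

```shell
# Hypothetical wrapper: count the elements of each feature, per sample.
# Assumption: sample headers start with "Sample", feature headers with "Feature".
summarize() {
    awk '
        function emit() {                  # print the pending feature count, if any
            if (feature != "") print feature ": " n
            feature = ""
            n = 0
        }
        /^Sample/  { emit(); print; next }          # new sample: close the feature, echo the header
        /^Feature/ { emit(); feature = $0; next }   # new feature: close the previous one
        NF && feature != "" { n++ }                 # non-blank line inside a feature is an element
        END { emit() }                              # close the last feature
    ' "$@"
}
```

It can be called as summarize file, or fed from a pipe. Sample headers are echoed unchanged, so a trailing colon in the input is kept in the output.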