Summarize sentences

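I have some data, and I want to summarize its sentences to draw conclusions. The example below is not the real data; it only illustrates the idea so that I can replicate it.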

Employee Suzie signed one time.
Employee Dan signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzie signed one time.
Employee Harold signed one time.
Employee Sebastian signed one time.
Employee Jordan signed one time.
Employee Suzie signed one time.
Employee Suzan signed one time.

I want to summarize these sentences like this:

Jordan signed 2 time(s)
Dan signed 1 time(s)
Suzie signed 4 time(s)
Suzan signed 1 time(s)
Sebastian signed 1 time(s)
Harold signed 1 time(s)

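I've played around with awk, but it seems hard to do. Then I tried sed, without success. It seems sed is only meant for finding and changing things.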

Answer 1

The general approach is:

$ awk '{ count[$2]++ }
       END {
           for (name in count)
               printf("%s signed %d time(s)\n", name, count[name])
       }' <file
Harold signed 1 time(s)
Dan signed 1 time(s)
Sebastian signed 1 time(s)
Suzie signed 4 time(s)
Jordan signed 2 time(s)
Suzan signed 1 time(s)

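That is, use an associative array (hash) to store the number of times each name has been seen. In the END block, iterate over all the names and print a summary line for each one.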
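In awk, the order in which for (name in count) visits the keys is unspecified, which is why the output above does not follow the input order. If a stable listing is wanted, one option is to pipe the result through sort. A minimal sketch, assuming the name is always the second whitespace-separated field and that "most signatures first, then alphabetical" is the desired order; the function name summarize is hypothetical:

```shell
#!/bin/sh
# Count how often each name (field 2) appears, then sort the summary lines.
# awk's "for (name in count)" order is unspecified, so sort fixes the order:
#   -k3,3nr : field 3 (the count), numeric, highest first
#   -k1,1   : ties broken by field 1 (the name)
summarize() {
    awk '{ count[$2]++ }
         END {
             for (name in count)
                 printf("%s signed %d time(s)\n", name, count[name])
         }' "$@" |
    sort -k3,3nr -k1,1
}
```

Usage: summarize file. On the question's sample data this lists Suzie (4) first, then Jordan (2), then Dan, Harold, Sebastian and Suzan (1 each) in alphabetical order.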

For nicer formatting, change the %s placeholder in the printf() call to %-10s, which reserves 10 characters (left-aligned) for the name.

$ awk '{ count[$2]++ }
       END {
           for (name in count)
               printf("%-10s signed %d time(s)\n", name, count[name])
       }' <file
Harold     signed 1 time(s)
Dan        signed 1 time(s)
Sebastian  signed 1 time(s)
Suzie      signed 4 time(s)
Jordan     signed 2 time(s)
Suzan      signed 1 time(s)
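The fixed width of 10 works for these names, but a longer name would push its line out of alignment. A minimal sketch of one way around that, assuming the same input layout: measure the longest name in the END block and build the printf format string from it. The function name summarize_aligned is hypothetical:

```shell
#!/bin/sh
# Count names (field 2), then pad every name to the width of the longest one.
summarize_aligned() {
    awk '{ count[$2]++ }
         END {
             # First pass over the keys: find the longest name.
             width = 0
             for (name in count)
                 if (length(name) > width)
                     width = length(name)
             # Build e.g. "%-9s signed %d time(s)\n" when the longest name has 9 chars.
             fmt = "%-" width "s signed %d time(s)\n"
             # Second pass: print the aligned summary.
             for (name in count)
                 printf(fmt, name, count[name])
         }' "$@"
}
```

Building the format string by concatenation avoids relying on printf's * width argument, which not every awk implementation supports.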

Fiddling with the output some more (because I'm bored):

$ awk '{ count[$2]++ }
       END {
           for (name in count)
               printf("%-10s signed %d time%s\n", name, count[name],
                      count[name] > 1 ? "s" : "" )
       }' <file
Harold     signed 1 time
Dan        signed 1 time
Sebastian  signed 1 time
Suzie      signed 4 times
Jordan     signed 2 times
Suzan      signed 1 time

Answer 2

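awk keeps its counts in an associative array, which is limited by the amount of memory you have. As an alternative, you can do the following: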

sort -k2,2 infile | uniq -c

Or, formatted the way you want:

sort -k2,2 infile | uniq -c | awk '{ print $3, "signed", $1, "time(s)" }'
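uniq -c compares whole lines, so sort -k2,2 | uniq -c only groups correctly when every line for a given name is identical, as it is in the sample. If the real data has lines that differ elsewhere (dates, extra words), a minimal sketch, assuming the name is still field 2, is to cut out just the name before sorting. The function name count_by_name is hypothetical:

```shell
#!/bin/sh
# Keep only the name (field 2), so lines that differ elsewhere still group.
# uniq -c then prints "<count> <name>", which the last awk reformats.
count_by_name() {
    awk '{ print $2 }' "$@" |
    sort |
    uniq -c |
    awk '{ print $2, "signed", $1, "time(s)" }'
}
```

Usage: count_by_name infile. Appending | sort -k3,3nr would additionally order the result by count.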

Answer 3

This is a job for awk. You need an array[index] to do it:

awk 'NF {name[$2]++} END{for (each in name) {print each " signed " name[each] " time(s)"}}' file

Jordan signed 2 time(s)
Dan signed 1 time(s)
Suzie signed 4 time(s)
Suzan signed 1 time(s)
Sebastian signed 1 time(s)
Harold signed 1 time(s)

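The NF condition skips empty lines. The data is stored in the array's indices (the names) and values (the counts); each value is referenced through its corresponding index.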
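To see why the NF guard matters: on a blank (or whitespace-only) line NF is 0, so the line is skipped. Without the guard, $2 would be the empty string and every blank line would increment name[""], producing an extra output line with no name. A minimal sketch of the same one-liner wrapped in a function for testing; the name count_nonblank is hypothetical:

```shell
#!/bin/sh
# Same one-liner as above: NF is 0 on blank lines, so they never reach name[$2]++.
count_nonblank() {
    awk 'NF { name[$2]++ }
         END { for (each in name) print each " signed " name[each] " time(s)" }' "$@"
}
```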

Answer 4

I tried a "for" solution, though I'm sure it could be reworked and made prettier. It does the job, though.

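: > xy.txt    # start from an empty output file
for name in $(awk '{print $2}' x.txt)
do
# -c counts matching lines; -w matches whole words only, so "Dan" does not also match "Jordan"
count=$(grep -ciw -- "$name" x.txt)
echo "$name signed in $count times" >>xy.txt
done

sort -u xy.txt

Dan signed in 1 times
Harold signed in 1 times
Jordan signed in 2 times
Sebastian signed in 1 times
Suzan signed in 1 times
Suzie signed in 4 times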