How to group and count all lines in a file that contain a specific string

2024-10-12

I want to filter all lines of a file that contain mySearchString, then group them and count each group.

Example: find all lines that contain 9791

AB-9791___Foo
AB-9791___Foo
DE-9791___Bar
AB-0001___Foo

Using $ grep "9791" myFile.txt gives this result

AB-9791___Foo
AB-9791___Foo
DE-9791___Bar 
// 0001 was filtered out

This result should then be grouped and counted, like GROUP BY with COUNT in SQL:

AB-9791___Foo     2
DE-9791___Bar     1

This answer uses perl, but perl is not installed on our machines.

Which tool (grep, awk, sed, or something else) is suitable for the second part, the grouping and counting?

Update: test records

My test file Test_2.txt contains these lines

AB-9791___Foo
DE-9791___Bar
AB-0001___Foo
AB-9791___Foo
AB-9791___Foo
AB-9791___Foo
DE-9791___Bar
DE-9791___Bar
DE-9791___Bar

I copied and pasted every AB-9791___Foo line, so they should be identical. Running $ grep '9791' Test_grep_uniq_sort.txt | uniq -c gave this result

  1     AB-9791___Foo
  1     DE-9791___Bar // expected: 4 actual: 1, 2, 1
  3     AB-9791___Foo // expected: 4 actual: 1, 3
  2     DE-9791___Bar
  1     DE-9791___Bar

Running $ sort Test_2.txt > Test_2_sort_0.txt and then using grep | uniq on Test_2_sort_0.txt returned almost the expected output.

  $ grep '9791' Test_2_sort_0.txt | uniq -c
  4     AB-9791___Foo
  1     DE-9791___Bar // this is due to a missing line break / line feed
  3     DE-9791___Bar

After manually adding the missing line break / line feed, everything works as expected.
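
An invisible difference like this can be checked by looking at the raw bytes at the end of the file. The following is only a minimal sketch, assuming a POSIX shell with tail, od, and mktemp available; ends_with_newline is a hypothetical helper name, not something from the original post, and the throwaway file stands in for the real test file.

```shell
# Hypothetical helper (assumption, not from the original post): succeeds when
# the last byte of the file is a newline. Command substitution strips a
# trailing newline, so an empty result means the file ends with one
# (or the file is empty).
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}

# Demonstration on a throwaway file whose last line has no line feed.
tmp=$(mktemp)
printf 'DE-9791___Bar\nDE-9791___Bar' > "$tmp"
ends_with_newline "$tmp" || echo "last line is missing its newline"

# od -c prints the raw bytes, so a missing \n or a stray \r becomes visible.
tail -c 16 "$tmp" | od -c
rm -f "$tmp"
```

If the check fails for a real file, appending a single newline (for example with echo >> file) should make the last line compare equal to the others, provided nothing else differs.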

Answer 1

You have to sort the file first, because uniq only collapses identical lines that are adjacent.

You can use grep together with uniq like this:

 grep '9791' file1 | uniq -c
      2 AB-9791___Foo
      1 DE-9791___Bar
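
For input that is not already grouped, such as the Test_2.txt data in the question, the sort step can go directly into the pipeline. This is a minimal sketch, assuming a POSIX shell with mktemp; count_matches is a hypothetical helper name, and grep -F is used on the assumption that the search string should be matched literally rather than as a regular expression.

```shell
# Hypothetical helper (assumption, not part of the original answer).
# $1 = search string, $2 = file to search
count_matches() {
    # grep -F treats the pattern as a fixed string. sort places identical
    # lines next to each other so that uniq -c counts each group only once.
    grep -F -- "$1" "$2" | sort | uniq -c
}

# Usage with the unsorted test data from the question.
tmp=$(mktemp)
printf '%s\n' AB-9791___Foo DE-9791___Bar AB-0001___Foo AB-9791___Foo \
    AB-9791___Foo AB-9791___Foo DE-9791___Bar DE-9791___Bar DE-9791___Bar > "$tmp"
count_matches 9791 "$tmp"
rm -f "$tmp"
```

With that data, each of the two matching lines should be reported with a count of 4. The exact padding in front of the count depends on the uniq implementation.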

Answer 2

uniq -c for the counting, and awk to swap the columns:

$ uniq -c <<END | awk '{print $2 " " $1;}'
AB-9791___Foo
AB-9791___Foo
DE-9791___Bar
END

AB-9791___Foo 2
DE-9791___Bar 1

Some more ideas here: https://stackoverflow.com/questions/8627014/count-number-of-similar-lines-in-a-file
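
As an alternative to uniq, awk can do the filtering, grouping, and counting in a single pass, which does not require the input to be sorted. This is a minimal sketch under assumptions: count_with_awk is a hypothetical helper name, the search string is matched as a plain substring, and the trailing sort is only there to make the output order predictable.

```shell
# Hypothetical helper (assumption, not part of the original answer).
# $1 = search string, $2 = file to search
count_with_awk() {
    # index($0, pat) is non-zero when the line contains pat as a substring.
    # count[] is an associative array keyed by the whole line.
    awk -v pat="$1" '
        index($0, pat) { count[$0]++ }
        END { for (line in count) print line, count[line] }
    ' "$2" | sort
}

# Usage with the unsorted test data from the question.
tmp=$(mktemp)
printf '%s\n' AB-9791___Foo DE-9791___Bar AB-0001___Foo AB-9791___Foo \
    AB-9791___Foo AB-9791___Foo DE-9791___Bar DE-9791___Bar DE-9791___Bar > "$tmp"
count_with_awk 9791 "$tmp"
rm -f "$tmp"
```

The output has the line first and the count second, matching the layout asked for in the question. Note that awk -v interprets backslash escapes in the assigned value, so a search string containing backslashes would need extra care.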
