grep and print how many times each pattern from file 1 occurs in file 2

Answer 1

The simplest approach is to grep for every pattern and then count the matching lines:

$ grep -Fwf file1 file2 | sort | uniq -c
      3 Fatty_acid_degradation

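The grep options used here are -f, which supplies a file as the list of patterns to search for; -F, which treats each pattern as a fixed string rather than a regular expression; and -w, which makes a pattern match only whole words (so regulation_of_expression does not match upregulation_of_expression, for example).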

You can then use whatever tool you prefer to change the format:

$ grep -Fwf file1 file2 | sort | uniq -c | sed -r 's/^ *([0-9]+) +(.*)/\2\t\1/'
$ grep -Fwf file1 file2 | sort | uniq -c | perl -lane 'print "$F[1]\t$F[0]"'
$ grep -Fwf file1 file2 | sort | uniq -c | awk -vOFS="\t" '{print $2,$1}'

All of the above return:

Fatty_acid_degradation  3
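
To make the effect of -F and -w concrete, here is a minimal Python sketch that approximates the `grep -Fwf file1 file2 | sort | uniq -c` pipeline on in-memory lists. The function name `count_matching_lines` and the sample data are made up for illustration, and the word-boundary test is only an approximation of what grep does (ASCII letters, digits and underscore count as word characters). Skipping empty patterns is also an assumption of this sketch.

```python
import re
from collections import Counter


def count_matching_lines(patterns, lines):
    """Roughly emulate `grep -Fwf patterns lines | sort | uniq -c`."""
    # re.escape makes each pattern literal (like -F); the lookarounds
    # require a non-word character or a line edge on both sides (like -w).
    # Empty patterns are skipped here, which is a choice of this sketch.
    regexes = [
        re.compile(r"(?<!\w)" + re.escape(p) + r"(?!\w)", re.ASCII)
        for p in patterns
        if p
    ]
    counts = Counter()
    for line in lines:
        # Like grep, keep the whole line if any pattern matches it.
        if any(rx.search(line) for rx in regexes):
            counts[line] += 1
    # Sorted (line, count) pairs, mirroring `sort | uniq -c`.
    return sorted(counts.items())


# Hypothetical sample data standing in for file1 and file2.
patterns = ["regulation_of_expression", "Fatty_acid_degradation"]
lines = [
    "Fatty_acid_degradation",
    "upregulation_of_expression",
    "Fatty_acid_degradation",
    "Fatty_acid_degradation",
]

for line, n in count_matching_lines(patterns, lines):
    print(f"{line}\t{n}")
```

Note that, like the shell pipeline, this counts identical matching lines, not patterns: if lines in file 2 carry extra text around the term, each distinct line gets its own count.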

Answer 2

grep -f file1 file2 | sort | uniq -c

This gives output in the following format:

  3 Fatty_acid_degradation

Can you live with that?

Answer 3

So many quick answers already that I feel a bit embarrassed adding another...

awk 'FNR == NR { pat[$1]=0 ; next ; }
{ if ( $0 in pat ) pat[$0]++ ; }
END { for ( p in pat ) if ( pat[p]) printf "%s %d\n",p,pat[p] ;}' f1 f2

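where

FNR == NR { pat[$1]=0 ; next ; } stores each pattern from the first file in the pat array, with a count of zero
{ if ( $0 in pat ) pat[$0]++ ; } increments the count whenever a whole line of the second file equals a stored pattern
END { for ( p in pat ) if ( pat[p]) printf "%s %d\n",p,pat[p] ;} finally prints every pattern with a non-zero count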
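
The same two-pass logic can be sketched in Python, which may make the control flow easier to follow. This is an illustration under assumptions, not a replacement for the awk script: the function name `count_exact_lines` and the sample lists are hypothetical, and the inputs are lists of lines rather than files.

```python
def count_exact_lines(pattern_lines, data_lines):
    """Mirror the awk script: exact whole-line matches only."""
    counts = {}
    # First pass (awk: FNR == NR): register the first whitespace-separated
    # field of every pattern line with a count of zero.
    for line in pattern_lines:
        fields = line.split()
        key = fields[0] if fields else ""
        counts[key] = 0
    # Second pass: a data line counts only if the whole line is a known key.
    for line in data_lines:
        if line in counts:
            counts[line] += 1
    # END block: keep only the patterns that were seen at least once.
    return {pattern: n for pattern, n in counts.items() if n}


# Hypothetical sample data standing in for f1 and f2.
f1_lines = ["Fatty_acid_degradation", "regulation_of_expression"]
f2_lines = [
    "Fatty_acid_degradation",
    "upregulation_of_expression",
    "Fatty_acid_degradation",
    "Fatty_acid_degradation",
]

for pattern, n in count_exact_lines(f1_lines, f2_lines).items():
    print(pattern, n)
```

Unlike the grep approach, a line in the second file is counted only when it equals a pattern exactly, so terms embedded in longer lines are not counted.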

Answer 4

You can also try the following solution in Python:

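#!/usr/bin/env python3
import collections

with open('file_1') as f1, open('file_2') as f2:
    # Count every line of file_2, ignoring the trailing newline so that
    # a final line without one still compares equal.
    counts = collections.Counter(line.rstrip('\n') for line in f2)
    for line in f1:
        pattern = line.rstrip('\n')
        # Print only the patterns that occur as a whole line in file_2.
        if pattern in counts:
            print(pattern + '\t' + str(counts[pattern]))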

Here we use the Counter class from the collections module, which builds a dictionary holding the number of occurrences of each element of an iterable.
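
As a small illustration of how Counter behaves, the sketch below counts a hand-written list instead of a file. The sample terms (including `Glycolysis` and `Missing_term`) are invented for the example.

```python
from collections import Counter

# Hypothetical sample lines standing in for the contents of file_2.
lines = ["Fatty_acid_degradation", "Glycolysis", "Fatty_acid_degradation"]
counts = Counter(lines)

# Each distinct element maps to the number of times it occurred.
print(counts["Fatty_acid_degradation"])

# A missing key reads as 0 instead of raising KeyError, but it is not
# added to the Counter, so a membership test still reports it as absent.
print(counts["Missing_term"])
print("Missing_term" in counts)
```

This is why the membership test on the counts is needed before printing: it filters out patterns that never occur, which would otherwise be reported with a count of 0.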
