I have a list of words like this, one per line:
Range
Balance
Total
CombinList
CombinRow
GridKey
KeDanJia
AddRowNum
TopList
Keysearch
Keysearchtaobao
IsearchData
IsearchDataSep
...
I have some files in a directory:
$ tree
.
|-- a.txt
|-- b.txt
|-- c.txt
|-- d
|   |-- a.txt
|   |-- b.txt
|   |-- c.txt
|   |-- d.txt
|   |-- e.txt
|   |-- f.txt
|   `-- g.txt
How can I count the occurrences of these words in the files? The output should look like this:
Range: 0
Balance: 32
Total: 100
CombinList: 4
CombinRow: 3
GridKey: 1
KeDanJia: 43
AddRowNum: 5
TopList: 34
Keysearch: 0
Keysearchtaobao: 1
IsearchData: 12
IsearchDataSep: 123
...
Answer 1
Assuming your word list is in a file called /path/to/words.txt and, for the purposes of this example, your tree is at /tree, try:
find /tree -name '*.txt' -execdir sed 's/ /\
/g' {} + | grep -Fw -f /path/to/words.txt | sort | uniq -c | \
awk '{print $2 ": " $1}'
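A possible variant, as a sketch only: let grep print each match on its own line with -o, so the sed splitting step is not needed and a word followed by punctuation (for example "Balance,") is still counted under its own name. This assumes a grep that supports -o and -h (GNU and BSD grep do; neither option is in POSIX). The temporary directory and sample files below are made up purely for the demonstration.

```shell
#!/bin/sh
# Hypothetical demo data in a throwaway directory; substitute your real paths.
tmp=$(mktemp -d)
mkdir -p "$tmp/tree/d"
printf 'Range\nBalance\nTotal\n' > "$tmp/words.txt"
printf 'Balance Total\nTotal\n' > "$tmp/tree/a.txt"
printf 'Total Balance, Other\n' > "$tmp/tree/d/b.txt"

# -o: print only the matched text, one match per line
# -h: do not prefix matches with the file name
# -w: match whole words only; -F: treat patterns as fixed strings
result=$(
  find "$tmp/tree" -type f -name '*.txt' \
    -exec grep -ohwF -f "$tmp/words.txt" {} + |
  sort | uniq -c | awk '{print $2 ": " $1}'
)
printf '%s\n' "$result"

rm -rf "$tmp"
```

As with the original pipeline, a word that never occurs (Range in this sample) produces no line at all, because uniq -c can only count lines it has seen.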
Answer 2
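# word.file must come first; it is excluded from the find results so that
# the list itself is not counted as data.
awk '
    # First file (the word list): register every word with a count of 0.
    FILENAME == ARGV[1] {word[$0]=0; next}
    # Other files: count each whitespace-separated field found in the list.
    {
        for (i=1; i<=NF; i++) {
            if ($i in word) word[$i]++
        }
    }
    # Print the totals; the iteration order of "for (w in word)" is unspecified.
    END {for (w in word) print w ": " word[w]}
' word.file $(find . -type f ! -name word.file -print)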
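If file names may contain spaces, the unquoted $(find ...) gets split into the wrong arguments. One way around that, as a sketch, is to have find pass the names to awk directly with -exec ... {} +. The temporary directory and sample files are hypothetical, and the output is piped through sort only to make the order predictable.

```shell
#!/bin/sh
# Hypothetical demo data; note the space in one of the file names.
tmp=$(mktemp -d)
mkdir -p "$tmp/tree/d"
printf 'Range\nBalance\nTotal\n' > "$tmp/words.txt"
printf 'Balance Total\nTotal\n' > "$tmp/tree/a.txt"
printf 'Total Balance Other\n' > "$tmp/tree/d/b c.txt"

# The word list is kept outside the searched tree and is passed as the
# first file argument; find appends the data files in place of {}.
counts=$(
  find "$tmp/tree" -type f -exec awk '
    FILENAME == ARGV[1] { word[$0] = 0; next }
    { for (i = 1; i <= NF; i++) if ($i in word) word[$i]++ }
    END { for (w in word) print w ": " word[w] }
  ' "$tmp/words.txt" {} + | sort
)
printf '%s\n' "$counts"

rm -rf "$tmp"
```

One caveat: on a very large tree, find may split the file list and run awk more than once, in which case each run prints its own set of totals and they would need to be summed afterwards.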