I want to count the words that several files have in common, and then show which files each word appears in.
File 1:
This is so beautiful
File 2:
There are so beautiful
File 3:
so beautiful
Expected output 1:
so:3
beautiful:3
Expected output 2:
so:
file1:1
file2:1
file3:1
beautiful:
file1:1
file2:1
file3:1
Answer 1
Try this:
# Declare the files you want to include
files=( file* )
# Function to find the words common to any number of files
wcomm() {
    # If no files are provided, exit the function.
    [ $# -lt 1 ] && return 1
    local common_words next_words
    # Extract words from the first file; lowercase before sorting so that comm gets sorted input
    common_words=$(grep -o "\w\+" "$1" | tr '[:upper:]' '[:lower:]' | sort -u)
    while [ $# -gt 1 ]; do
        # Shift $1 to the next file
        shift
        # Extract words from the next file
        next_words=$(grep -o "\w\+" "$1" | tr '[:upper:]' '[:lower:]' | sort -u)
        # Keep only the words present in both $common_words and $next_words
        common_words=$(comm -12 <(echo "$common_words") <(echo "$next_words"))
    done
    # Output the words common to all input files
    echo "$common_words"
}
# Output the number of matches for each common word, in total and per file
for w in $(wcomm "${files[@]}"); do
    echo "$w:$(grep -oiw "$w" "${files[@]}" | wc -l)"
    for f in "${files[@]}"; do
        echo "$f:$(grep -oiw "$w" "$f" | wc -l)"
    done
    echo
done
Output:
beautiful:3
file1:1
file2:1
file3:1
so:3
file1:1
file2:1
file3:1
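Explanation:
Included as comments in the script.
Features:
- Handles as many files as your ARG_MAX allows.
- Finds all words separated by anything that grep treats as a word separator.
- Ignores case, so "beautiful" and "Beautiful" are the same word.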
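The word-splitting and case-folding features can be seen in isolation with a small sketch. The sample sentence and the temporary file are made up for illustration, and `[[:alnum:]_]+` is used here as a portable spelling of GNU grep's `\w\+`:

```shell
# Sketch: turn one file into a sorted, lowercased, de-duplicated word list.
# The sample text and temporary directory are hypothetical.
tmpdir=$(mktemp -d)
printf 'This is so Beautiful, so beautiful!\n' > "$tmpdir/sample"

# grep -o prints every match on its own line, so punctuation and spaces
# act as separators; tr folds case before sort -u removes duplicates.
word_list=$(grep -oE '[[:alnum:]_]+' "$tmpdir/sample" \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u)
echo "$word_list"

rm -rf "$tmpdir"
```

Here "Beautiful," and "beautiful!" collapse into a single entry, which is why the two spellings count as the same word.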
Answer 2
Try this code, and adjust it if needed:
bash-4.1$ cat test.sh
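#!/bin/bash
OUTPUT_FILE=/tmp/output.txt
# Count every word across all files; keep the ones seen more than once in total
awk '{
    for (i = 1; i <= NF; i++) {
        Arr[$i]++
    }
}
END {
    for (i in Arr) {
        if (Arr[i] > 1) {
            print i ":" Arr[i]
        }
    }
}' file* > "${OUTPUT_FILE}"
cat "${OUTPUT_FILE}"
echo ""
# For each such word, print its count in every file that contains it
while IFS=":" read -r WORD TOTAL_COUNT
do
    echo "${WORD}:"
    for FILE_NAME in file*
    do
        # One word per line after tr; -x matches whole lines only, -F takes the word literally
        COUNT=$(tr ' ' '\n' < "${FILE_NAME}" | grep -cxF -- "${WORD}")
        if [ "${COUNT}" -gt 0 ]
        then
            echo "${FILE_NAME}:${COUNT}"
        fi
    done
done < "${OUTPUT_FILE}"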
bash-4.1$ bash test.sh
beautiful:3
so:3
beautiful:
file1:1
file2:1
file3:1
so:
file1:1
file2:1
file3:1
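The same per-file counts could also be collected in a single awk pass by keying an array on both the word and awk's built-in `FILENAME`. This is only a sketch: it treats a word as shared when it occurs in more than one file (an assumption about what the question wants, slightly different from the total-count test above), and the temporary directory and sample files are created just for the demonstration.

```shell
# Sketch: one awk pass that records counts per (word, file) pair.
tmpdir=$(mktemp -d)
printf 'This is so beautiful\n'   > "$tmpdir/file1"
printf 'There are so beautiful\n' > "$tmpdir/file2"
printf 'so beautiful\n'           > "$tmpdir/file3"

per_file=$(cd "$tmpdir" && awk '
{
    for (i = 1; i <= NF; i++) {
        # First time this word is seen in this file: one more file contains it
        if (!(($i, FILENAME) in count)) nfiles[$i]++
        count[$i, FILENAME]++
    }
}
END {
    for (key in count) {
        split(key, part, SUBSEP)      # part[1] = word, part[2] = file name
        if (nfiles[part[1]] > 1)
            print part[1], part[2], count[key]
    }
}' file1 file2 file3 | LC_ALL=C sort)  # sort, because awk array order is unspecified
echo "$per_file"

rm -rf "$tmpdir"
```

Each output line has the form `word file count`, so it can be regrouped into either of the two layouts the question asks for.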
Answer 3
Use grep to list each word together with its file name, then use awk to reformat that output into the desired result:
grep -Ho '\w\+' file* |
awk -F':' '{ words[$1 FS $2]++; seen[$2]++ }
END{ for (x in seen) {
print x":" seen[x];
for (y in words) {
if (y ~ "\\<" x "\\>")print substr(y, 1, length(y)-length(x)), words[y]
}
}
}'
This gives you nicely formatted output like the following (the desired result in a single pass):
so:3
file1: 1
file2: 1
file3: 1
This:1
file1: 1
beautiful:3
file3: 1
file1: 1
file2: 1
There:1
file2: 1
are:1
file2: 1
is:1
file1: 1
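To see what awk receives in this answer, it can help to run the grep stage on its own. A minimal sketch, assuming GNU grep (for `\w\+` and `-H`) and using two throwaway sample files created only for illustration:

```shell
# Sketch: inspect the file:word pairs that grep -Ho feeds into awk.
tmpdir=$(mktemp -d)
printf 'This is so beautiful\n' > "$tmpdir/file1"
printf 'so beautiful\n'         > "$tmpdir/file3"

# -H prefixes every match with its file name; -o prints one match per line
pairs=$(cd "$tmpdir" && grep -Ho '\w\+' file1 file3)
echo "$pairs"
# file1:This
# file1:is
# file1:so
# file1:beautiful
# file3:so
# file3:beautiful

rm -rf "$tmpdir"
```

With `-F':'`, awk then sees the file name as `$1` and the word as `$2`, which is what the `words` and `seen` arrays are keyed on.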
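Answer 4
If you don't want to write a script and just want a quick way to see the result, you can use this command:
cat list_of_words | while read -r line; do echo "$line"; grep -riE -c "$line" where_to_look_or_folder; done
-r: search directories recursively
-i: ignore case
-E: treat the pattern as an extended regular expression, in case you want to search for something more complicated
-c: print the number of matching lines for each file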
Output:
word1
path:filename:count
Example:
cat text | while read -r line; do echo "$line"; grep -riE -c "$line" somewhere/nowhere; done
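One thing to keep in mind with this approach is that `grep -c` counts matching lines, not individual occurrences, and without `-w` it also matches substrings. The sketch below contrasts the two counts on a made-up sample file:

```shell
# Sketch: grep -c (matching lines) versus grep -o | wc -l (occurrences).
tmpdir=$(mktemp -d)
printf 'so so so\nnot here\nSo beautiful\n' > "$tmpdir/sample"

# Lines containing "so" in any case: lines 1 and 3
matching_lines=$(grep -ic 'so' "$tmpdir/sample")

# Every whole-word occurrence of "so" in any case: three on line 1, one on line 3
occurrences=$(grep -oiw 'so' "$tmpdir/sample" | wc -l | tr -d ' ')

echo "lines: $matching_lines, occurrences: $occurrences"
# lines: 2, occurrences: 4

rm -rf "$tmpdir"
```

If per-word totals are what you need, the `-o ... | wc -l` form is the closer match to the expected output in the question.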