How to count the words in a text file, excluding one user-given word


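I have a large set of text files. In each one, the articles are separated by 15 copies of the word stopword. I want to find the total number of words in the file, excluding stopword.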

Answer 1

With GNU grep:

grep -Eo '\S+' < file | grep -vcxF stopword

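This counts (-c) the words (the same words wc -w counts, at least on valid text: sequences of non-whitespace characters, \S+) that are not (-v) exactly (-x) the fixed string (-F) stopword.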
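
If grep -o is not available, the same exact-token count can be sketched by letting tr put one token per line first. This is a minimal sketch, not part of the answer above: the helper name count_words_excluding is hypothetical, and it assumes a tr that accepts '[:space:]' and pads the shorter second set (GNU and BSD tr both do).

```shell
# count_words_excluding WORD [FILE...]  (hypothetical helper name)
# Prints how many whitespace-separated tokens are not exactly WORD.
# With no FILE arguments, cat reads standard input.
count_words_excluding() {
  _word=$1
  shift
  # tr -s squeezes each run of whitespace into one newline: one token per line.
  # grep -v -x -F then drops lines that are exactly WORD or empty (the empty
  # line left by leading whitespace), and -c prints how many lines remain.
  cat -- "$@" |
    tr -s '[:space:]' '\n' |
    grep -vcxF -e "$_word" -e ''
}
```

Usage: count_words_excluding stopword input. Because each token must equal the word exactly, "stopwords." and "stopword." in the sample prose are still counted as words.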

Answer 2

Take the number of words in input and subtract the number of occurrences of stopword (this uses GNU grep's -o, since you tagged the question Linux):

echo $(( $(wc -w < input) - $( grep -o stopword input | wc -l ) ))

Example input:

I have the large set of the text file. In that, each article is separated by 15 stopwords. I want to find out the total number of words count in that file excluding the stopword.
stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword
I have the large set of the text file. In that, each article is separated by 15 stopwords. I want to find out the total number of words count in that file excluding the stopword.

Output:

$ echo $(( $(wc -w < input) - $( grep -o stopword input | wc -l ) ))
66
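
Note that grep -o stopword matches substrings, so in the sample it also hits the "stopword" inside "stopwords." and "stopword." on each prose line: 66 is 85 total words minus 19 matches, not minus the 15 separators. A sketch that keeps the same arithmetic but asks GNU grep for whole-word matches only is below; the helper name is hypothetical. Even with -w, "stopword." still matches because the period is not a word character, so on the sample this should give 85 − 17 = 68 rather than the 70 that an exact-token comparison gives.

```shell
# words_minus_whole_word_hits WORD FILE  (hypothetical helper name)
# Same arithmetic as the answer: total words minus occurrences of WORD.
# Assumes GNU grep: -o prints each match on its own line, -w keeps only
# matches not glued to other letters, digits or underscores, and -F takes
# WORD literally instead of as a regular expression.
words_minus_whole_word_hits() {
  echo $(( $(wc -w < "$2") - $(grep -owF -- "$1" "$2" | wc -l) ))
}
```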

Answer 3

awk '{ gsub("stopword",""); words+=NF }; END { print words; }' /text/file

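This deletes every occurrence of stopword and then counts everything awk splits into fields, even things that are not semantically words, such as: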

  • dashes
  • a stray period after a space (Sentence ends wrongly . Next sentence)
  • numbers in headings (1. Introduction)

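gsub("stopword","") treats its first argument as a regular expression and also deletes it from inside longer fields such as "stopwords."; the leftovers ("s.", ".") still count as fields, which is why the sample still totals 70. A variant that skips only fields exactly equal to the word, matching Answer 1's semantics, can be sketched as follows. The helper name is hypothetical, and the caveats listed above still apply because it still counts awk fields.

```shell
# count_fields_excluding WORD [FILE...]  (hypothetical helper name)
# Counts awk fields (runs of non-blank characters) that are not exactly WORD.
# With no FILE arguments, awk reads standard input.
count_fields_excluding() {
  _word=$1
  shift
  awk -v w="$_word" '
    # Appending "" forces a string comparison, so "15.0" is not treated as
    # equal to a WORD of "15"; only exact matches are skipped.
    { for (i = 1; i <= NF; i++) if ($i "" != w "") n++ }
    # n + 0 prints 0 instead of an empty line when nothing was counted.
    END { print n + 0 }
  ' "$@"
}
```

Usage: count_fields_excluding stopword /text/file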