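I have a large text file in which each article is separated by 15 stopwords (the literal word stopword, repeated 15 times). I want to find the total number of words in that file, excluding stopword.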
Answer 1
With GNU grep:
grep -Eo '\S+' < file | grep -vcxF stopword
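This counts (-c) the words that are not (-v) exactly (-xF) stopword. The first grep uses -o to print each word on its own line, where a word is a sequence of non-space characters (\S+), which is also how wc -w defines one, at least on valid text.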
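To see how the exact-match filter behaves, here is a minimal sketch that runs the same pipeline on a small temporary file. The sample text and the variable names (sample, count) are made up for illustration, and GNU grep is assumed for \S.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' \
    'first article text' \
    'stopword stopword stopword' \
    'second article, with stopwords.' > "$sample"

# Split the input into one word per line, then count the lines
# that are not exactly "stopword".
count=$(grep -Eo '\S+' < "$sample" | grep -vcxF stopword)
echo "$count"

rm -f "$sample"
```

With this input the pipeline should print 7: the three separator words are dropped, while stopwords. is kept because -x only rejects lines that match the whole string.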
Answer 2
The number of words in input minus the number of occurrences of stopword (using GNU grep's -o, since you tagged Linux):
echo $(( $(wc -w < input) - $( grep -o stopword input | wc -l ) ))
Sample input:
I have the large set of the text file. In that, each article is separated by 15 stopwords. I want to find out the total number of words count in that file excluding the stopword.
stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword stopword
I have the large set of the text file. In that, each article is separated by 15 stopwords. I want to find out the total number of words count in that file excluding the stopword.
Output:
$ echo $(( $(wc -w < input) - $( grep -o stopword input | wc -l ) ))
66
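Note that grep -o stopword also matches inside longer words, so stopwords. and stopword. in the prose are subtracted as well. That is why the sample gives 66 rather than 70. If only the standalone word should be excluded, one possible variation is to add -w. The sketch below compares both counts on a made-up sample; the file contents and variable names are assumptions for illustration.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'alpha beta stopwords' 'stopword stopword' 'gamma' > "$sample"

total=$(wc -w < "$sample")

# Substring matching: also hits the "stopword" inside "stopwords".
substring_hits=$(grep -o stopword "$sample" | wc -l)

# Whole-word matching: only hits "stopword" when it is not part of a longer word.
word_hits=$(grep -ow stopword "$sample" | wc -l)

plain=$(( total - substring_hits ))
strict=$(( total - word_hits ))
echo "$plain $strict"

rm -f "$sample"
```

Here the two results differ by one (3 versus 4) because stopwords is subtracted only in the substring case. With -w, a trailing punctuation mark as in stopword. still counts as a match, so this is not a full fix for prose that mentions the word.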
Answer 3
awk '{ gsub("stopword",""); words+=NF }; END { print words; }' /text/file
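This counts everything that awk treats as a field, even things that are not semantically words, such as:
- a hyphen
- a dot after a space (a misplaced sentence end . Next sentence)
- a number in a heading (1. Introduction)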
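If those cases matter, one possible refinement is to count a field only when it contains at least one letter and is not exactly stopword. This is a sketch under that assumption, not part of the answer above. It uses ASCII letters only, and it also drops pure numbers such as 15, which may or may not be wanted. The sample file is made up for illustration.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' \
    '1. Introduction' \
    'well - known fact . Next sentence' \
    'stopword stopword' > "$sample"

# Count fields that contain an ASCII letter and are not exactly "stopword".
words=$(awk '{
    for (i = 1; i <= NF; i++)
        if ($i != "stopword" && $i ~ /[A-Za-z]/)
            words++
} END { print words + 0 }' "$sample")
echo "$words"

rm -f "$sample"
```

On this input the result should be 6: the heading number 1., the lone hyphen, the stray dot, and both separator words are skipped.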