How do I stop grep from printing the same string multiple times?


If I grep a file that contains the following:

These are words
These are words
These are words
These are words

...for the word These, it prints the string These are words four times.

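How can I stop grep from printing duplicate strings more than once? Failing that, how can I post-process grep's output to remove the duplicate lines?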

Answer 1

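The Unix philosophy is to have tools that do one thing and do it well. In this case, grep is the tool that selects text from a file. To find out whether there are duplicates, you sort the text. To remove the duplicates, use sort's -u option. Thus: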

grep These filename | sort -u

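sort has many options: see man sort. If you want to count the duplicates, or have a more complicated scheme for deciding what is and is not a duplicate, pipe the sorted output to uniq:

grep These filename | sort | uniq

See man uniq for its options.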
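As a rough illustration of the two pipelines in this answer, the sketch below builds a small sample file and runs both. The temporary file, its contents, and the variable names are assumptions made up for the example; uniq -c is the standard option that prefixes each line with its occurrence count.

```shell
# Sample input (hypothetical contents, for illustration only).
tmpfile=$(mktemp)
printf '%s\n' \
  'These are words' \
  'These are words' \
  'Those are words' \
  'These are other words' \
  'These are words' > "$tmpfile"

# Each distinct matching line once, in sorted order.
unique_lines=$(grep 'These' "$tmpfile" | sort -u)

# Count occurrences of each matching line. uniq only collapses
# adjacent duplicates, which is why the input is sorted first.
counted_lines=$(grep 'These' "$tmpfile" | sort | uniq -c)

printf '%s\n' "$unique_lines"
printf '%s\n' "$counted_lines"

rm -f "$tmpfile"
```

Note that sorting changes the order of the lines, so this approach fits best when the original order of the matches does not matter.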

Answer 2

If you are only looking for a single string, use an additional grep switch:

grep -m1 'These' filename

From man grep:

-m NUM, --max-count=NUM
        Stop reading a file after NUM matching lines.  If the input is
        standard input from a regular file, and NUM matching lines are
        output, grep ensures that the standard input is positioned  to
        just  after  the  last matching  line  before exiting, regardless
        of the presence of trailing context lines.  This enables a calling
        process to resume a search.  When grep stops after NUM matching
        lines, it outputs any trailing context lines.  When the -c or
        --count option is also used, grep does not output a count greater
        than NUM.  When the -v or --invert-match option is also used, grep
        stops after outputting NUM non-matching lines.

Or use awk ;)

awk '/These/ {print; exit}' foo
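Both commands in this answer stop at the first matching line, so any later matching line with different text is dropped as well. The sketch below shows that behavior on a made-up sample file. As an alternative that is not part of the original answer, it also uses the common awk idiom !seen[$0]++ to print each distinct matching line once, in its original order. The file contents and variable names are assumptions for illustration, and -m is available in GNU and BSD grep but is not required by POSIX.

```shell
# Sample input (hypothetical contents, for illustration only).
tmpfile=$(mktemp)
printf '%s\n' \
  'These are words' \
  'These are words' \
  'These are other words' \
  'These are words' > "$tmpfile"

# Stop after the first matching line (GNU/BSD grep).
first_with_grep=$(grep -m1 'These' "$tmpfile")

# Same effect with awk: print the first match, then exit.
first_with_awk=$(awk '/These/ {print; exit}' "$tmpfile")

# Assumed alternative: print every distinct matching line once,
# keeping first-seen order. seen[$0]++ is 0 (false) only the first
# time a given line appears, so !seen[$0]++ is true exactly once.
distinct_in_order=$(awk '/These/ && !seen[$0]++' "$tmpfile")

printf '%s\n' "$first_with_grep"
printf '%s\n' "$first_with_awk"
printf '%s\n' "$distinct_in_order"

rm -f "$tmpfile"
```

The awk variant keeps one array entry per distinct line, so it trades some memory for preserving order without a sort.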
