If I grep a file containing the following:
These are words
These are words
These are words
These are words
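...for the word These, it prints the string These are words four times.
How can I prevent grep from printing the same string more than once? Failing that, how can I post-process grep's output to remove the duplicate lines?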
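Answer 1
The Unix philosophy is that each tool does one thing and does it well. Here, grep is the tool that selects text from a file. To find out whether there are duplicates, sort the text; to remove them, use sort's -u option.
So: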
grep These filename | sort -u
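sort has many options: see man sort. If you want to count the duplicates, or have a more complicated scheme for deciding what is and is not a duplicate, pipe the sorted output to uniq:
grep These filename | sort | uniq
and see man uniq for the options.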
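As a rough illustration of the counting case, here is a minimal sketch using uniq -c, which prefixes each distinct line with the number of times it occurred. The temporary file and its contents are made up for the example; they are not from the question.

```shell
# Hypothetical sample file, created only for this illustration.
sample=$(mktemp)
printf '%s\n' 'These are words' 'These are words' 'These are other words' 'These are words' > "$sample"

# sort puts identical lines next to each other;
# uniq -c collapses each run and prefixes it with its count.
counts=$(grep These "$sample" | sort | uniq -c)
printf '%s\n' "$counts"

rm -f "$sample"
```

The width of the count column can differ between uniq implementations, so treat the exact spacing as unspecified.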
Answer 2
If you are only looking for a single string, give grep an additional switch:
grep -m1 'These' filename
From man grep:
-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is
standard input from a regular file, and NUM matching lines are
output, grep ensures that the standard input is positioned to
just after the last matching line before exiting, regardless
of the presence of trailing context lines. This enables a calling
process to resume a search. When grep stops after NUM matching
lines, it outputs any trailing context lines. When the -c or
--count option is also used, grep does not output a count greater
than NUM. When the -v or --invert-match option is also used, grep
stops after outputting NUM non-matching lines.
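To see the two behaviours described in that excerpt side by side, a small sketch (the sample file is hypothetical and mirrors the four identical lines from the question):

```shell
# Hypothetical sample file with four identical matching lines.
sample=$(mktemp)
printf '%s\n' 'These are words' 'These are words' 'These are words' 'These are words' > "$sample"

first=$(grep -m1 'These' "$sample")      # stops after the first matching line
capped=$(grep -c -m2 'These' "$sample")  # with -c, the count is capped at NUM (2 here)
printf '%s\n%s\n' "$first" "$capped"

rm -f "$sample"
```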
Or use awk ;)
awk '/These/ {print; exit}' foo
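Both -m1 and the awk one-liner above stop at the first match, so they only fit the case where every matching line is identical. If the matching lines may differ and you want each distinct line once, in its original order and without sorting, one common awk idiom is an associative array of lines already seen. This is a sketch rather than part of the original answer; the array name seen and the sample file are chosen only for illustration.

```shell
# Hypothetical sample file: two distinct matching lines, one non-matching line, one repeat.
sample=$(mktemp)
printf '%s\n' 'These are words' 'Those are words' 'These are other words' 'These are words' > "$sample"

# seen[$0]++ evaluates to 0 the first time a given line appears,
# so !seen[$0]++ is true only for first occurrences; awk prints those lines.
unique=$(awk '/These/ && !seen[$0]++' "$sample")
printf '%s\n' "$unique"

rm -f "$sample"
```

Unlike sort -u, this keeps the lines in the order they first appear, at the cost of holding every distinct matching line in memory.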