Find an exact string with grep

2024-5-19

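For example, I have a large text file containing many email addresses, and from bash I need to search for an address to verify whether it exists (or does not exist) in the file. Should I (only) use "anchors"?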

grep '^user1@example.com' text_file

Or is there a better way? I need to write a bash script and I want it to be safe.

Answer 1

See the -F (fixed string, as opposed to regular expression) and -x (exact: match the whole line) options.

grep -Fx user1@example.com text_file

is equivalent to:

grep '^user1@example\.com$' text_file

(Remember that . is the regular expression operator that matches any character, which is why it has to be escaped there.)
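To see why the unescaped dot matters, here is a small sketch. The temporary file and the two addresses in it are made up for illustration; they are not from the question.

```shell
# Hypothetical sample data: the second line differs from the address
# only at the position of the ".".
tmp=$(mktemp)
printf '%s\n' 'user1@example.com' 'user1@exampleXcom' > "$tmp"

# Unescaped ".": as a regex it matches any character, so both lines count.
regex_hits=$(grep -c '^user1@example.com$' "$tmp")

# -F treats the pattern as a literal string, -x requires a whole-line match.
fixed_hits=$(grep -c -Fx 'user1@example.com' "$tmp")

echo "regex: $regex_hits, fixed: $fixed_hits"
rm -f "$tmp"
```

This prints "regex: 2, fixed: 1": the regex form also accepts the look-alike line, while -Fx only accepts the exact address.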

Use the -q option if you only want to check whether such a line exists:

grep -Fxq user1@example.com text_file &&
  echo yes, that address is in that file.

If the line to search for and the file name are held in variables:

grep -Fxqe "$email" < "$file"

or

grep -Fxq -- "$email" < "$file"

You do not want:

grep -Fxq "$email" "$file"

because that causes problems if $email or $file starts with -: grep would then parse the value as an option.
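For a script, the safe form can be wrapped in a small function. This is only a sketch: the name email_in_file and the sample data are hypothetical, chosen to show that a value starting with - is handled.

```shell
# Hypothetical helper: succeeds (exit status 0) if $1 appears as a whole
# line in the file named by $2.
email_in_file() {
  # -F: literal string, -x: whole line, -q: no output, stop at first match.
  # -e keeps a pattern starting with "-" from being parsed as an option;
  # the redirection means grep never sees the file name as an argument.
  grep -Fxqe "$1" < "$2"
}

# Demo with made-up data, including an entry that starts with "-".
list=$(mktemp)
printf '%s\n' 'user1@example.com' '-odd@example.com' > "$list"

if email_in_file '-odd@example.com' "$list"; then
  echo "found"
else
  echo "not found"
fi
rm -f "$list"
```

The function returns grep's exit status, so it can be used directly in if, && or ||. A status of 1 means "not present"; a status greater than 1 means grep itself failed (for example, the file could not be read by the shell or by grep).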

If the file is sorted (in your current locale, preferably C), you can possibly speed things up by using comm instead of grep:

printf '%s\n' user1@example.com | comm -12 - text_file

The advantage becomes more obvious when you have several email addresses to check (for instance, in another sorted file):

comm -12 text_file emails_to_check

will be faster than:

grep -Fxf emails_to_check text_file
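A minimal sketch of the comm approach, assuming the inputs are not sorted yet. The file contents are invented for the demonstration; the file names follow the ones used above, and the .sorted copies are an assumption of this sketch.

```shell
# Made-up input files for illustration.
dir=$(mktemp -d)
printf '%s\n' 'carol@example.com' 'alice@example.com' 'bob@example.com' > "$dir/text_file"
printf '%s\n' 'dave@example.com' 'bob@example.com' > "$dir/emails_to_check"

# comm expects both inputs sorted in the collation order it uses itself,
# so sort and compare under the same locale (C here).
LC_ALL=C sort -u "$dir/text_file" > "$dir/text_file.sorted"
LC_ALL=C sort -u "$dir/emails_to_check" > "$dir/emails_to_check.sorted"

# -12 suppresses columns 1 and 2, leaving only the lines common to both files.
common=$(LC_ALL=C comm -12 "$dir/text_file.sorted" "$dir/emails_to_check.sorted")
echo "$common"

# The same question asked with grep, which does not need sorted input.
grep_common=$(grep -Fxf "$dir/emails_to_check" "$dir/text_file")
rm -rf "$dir"
```

Both commands report bob@example.com here. Note that the sorting step has its own cost, so comm mainly pays off when the files are already kept sorted.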

Answer 2

To be as efficient as possible, you want to stop after the first match is found. If you have GNU grep, you can do this:

grep -m 1 '^user1@example\.com$' your_file

If you do not have GNU grep, you can use Perl:

perl -nlE 'say and last if $_ eq q{user1@example.com}' your_file

Answer 3

There are many email-matching patterns out there. One of them is:

grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" text_file

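To elaborate on my answer:

You are using ^, the anchor that marks the start of the line. It will not match if the email address sits in the middle of a longer line.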
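The difference can be sketched as follows. The sample line is invented, and the pattern is a simplified variant of the one above without \b (which is an extension rather than standard ERE syntax); -o is likewise not in POSIX, though GNU and BSD grep both provide it.

```shell
# Made-up line where the address is embedded in other text.
line='Contact: user1@example.com (work)'

# Anchored whole-line pattern: no match, so the count is 0.
# "|| true" is there because grep exits with status 1 when nothing matches.
anchored=$(printf '%s\n' "$line" | grep -c '^user1@example\.com$' || true)

# Unanchored pattern with -o: print only the part of the line that matched.
extracted=$(printf '%s\n' "$line" |
  grep -E -o '[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]+')

echo "anchored matches: $anchored, extracted: $extracted"
```

Which form is right depends on the file: with one address per line, the whole-line match is the stricter and safer check; the extraction pattern only makes sense when addresses are mixed with other text.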

Answer 4

For a general literal/exact match of a word, consider:

grep -w "search_word" <file> > output.txt

or, with \b marking the word boundaries explicitly:

grep "\bsearch_word\b" <file> > output.txt
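One caveat worth illustrating: with -w or \b the search term is still interpreted as a regular expression. The sketch below uses invented sample lines and combines -w with -F to get a literal whole-word match.

```shell
# Made-up sample lines.
tmp=$(mktemp)
printf '%s\n' 'version 1.5 released' 'version 1x5 released' 'version 11.5 released' > "$tmp"

# -w alone: "1.5" is still a regex, so "." also matches the "x" in "1x5".
regex_words=$(grep -c -w '1.5' "$tmp")

# -F -w: literal "1.5" that must not be adjacent to other word characters.
fixed_words=$(grep -c -Fw '1.5' "$tmp")

echo "regex words: $regex_words, fixed words: $fixed_words"
rm -f "$tmp"
```

This prints "regex words: 2, fixed words: 1". The line containing 11.5 is rejected in both cases because the match would be preceded by a word character.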
