Find files containing whole lines that match the lines of a given file

Question 1

With an awk implementation that supports nextfile:

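# First file (the header): store its lines in a[1..n].
NR == FNR {
  a[++n]=$0; next
}
# Start of each later file: forget any partial match from the previous one.
FNR == 1 {
  c=0
}
# Mismatch at position c+1: give up, unless exactly one line had matched
# and the current line can itself begin a new match.
$0 != a[c+1] && (--c || $0!=a[c+1]) {
  c=0; next
}
# One more header line matched; after all n, report the file and move on.
++c >= n {
  print FILENAME; c=0; nextfile
}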
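
The awk program reports a file when the header's lines appear in it as one contiguous run of complete lines. As a cross-check on small test files, the same criterion can be sketched in plain POSIX shell. The function name has_block and the sample file names are hypothetical, and the approach reads whole files into shell variables, so it is only an illustration, not a replacement for the awk version:

```shell
# has_block HEADER FILE
# Succeeds when FILE contains every line of HEADER as one contiguous
# run of complete lines. Reads both files into memory, so it is only
# meant for small text files without NUL bytes.
has_block() {
  nl='
'
  header=$(cat "$1")    # command substitution drops trailing newlines
  content=$(cat "$2")
  # Padding both sides with a newline forces the match to begin and
  # end on line boundaries; the quoted pattern is taken literally.
  case "$nl$content$nl" in
    *"$nl$header$nl"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical sample data in a throwaway directory.
demo=$(mktemp -d)
printf '%s\n' '<?php' 'eval($x);' > "$demo/header.txt"
printf '%s\n' '<?php' 'eval($x);' 'echo 1;' > "$demo/infected.php"
printf '%s\n' 'eval($x);' '<?php' > "$demo/reordered.php"

for f in "$demo"/*.php; do
  if has_block "$demo/header.txt" "$f"; then
    printf '%s\n' "${f##*/}"
  fi
done
rm -rf "$demo"
```

Run as is, this prints infected.php only: reordered.php contains both header lines, but not in the header's order.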

With find for recursion:

find dir -type f -exec gawk -f above.awk compromised_header.txt {} +

Or this might work:

pcregrep -rxlM "$( perl -lpe '$_=quotemeta' compromised_header.txt )" dir

Perl is used to escape the regex metacharacters because pcregrep does not seem to support --fixed-strings together with --multiline.
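
To see what that escaping step produces, here is a rough stand-in for perl's quotemeta written with sed. The helper name escape_re is hypothetical, and the sketch assumes plain ASCII input; it backslash-escapes every character that is not a letter, digit or underscore:

```shell
# escape_re: backslash-escape every byte that is not an ASCII letter,
# digit or underscore, approximating Perl's quotemeta for ASCII input.
escape_re() {
  LC_ALL=C sed 's/[^A-Za-z0-9_]/\\&/g'
}

printf '%s\n' 'eval($_POST["x"]);' | escape_re
```

This prints eval\(\$_POST\[\"x\"\]\)\; so that a regex engine treats every character of the header literally.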

With Perl in slurp mode (this won't work for files too large to hold in memory):

find dir -type f -exec perl -n0777E 'BEGIN {$f=<>} say $ARGV if /^\Q$f/m
' compromised_header.txt {} +

Question 2

You need to use something more powerful than grep, which can only do single-line matches.
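
A small experiment shows the limitation. In the sketch below (the function name count_line_hits and the sample files are hypothetical), grep is given the header as a pattern file; each header line is then tried on its own, so order and adjacency are ignored:

```shell
# count_line_hits HEADER FILE
# Counts the lines of FILE that equal any single line of HEADER.
#   -F fixed strings, -x whole-line match, -f patterns from file, -c count
count_line_hits() {
  grep -cFxf "$1" "$2"
}

demo=$(mktemp -d)
printf '%s\n' 'line A' 'line B' > "$demo/header.txt"
printf '%s\n' 'line B' 'unrelated' 'line A' > "$demo/scrambled.txt"
count_line_hits "$demo/header.txt" "$demo/scrambled.txt"
rm -rf "$demo"
```

This prints 2 even though scrambled.txt never contains the two header lines as a block, which is why a multi-line matcher is needed.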

perl can do multi-line matches and is well suited to this kind of job, combined with find to generate the list of files to search.

find dir/ -type f -iname '*.txt' -exec perl -e '
    local $/;    # slurp in entire files, instead of one line at a time

    my $firstfile = shift @ARGV;         # get name of the first file
    open(F,"<",$firstfile) or die "Error opening $firstfile: $!";
    my $first = <F>;                     # read it in
    close(F);
    my $search = qr/\Q$first\E/;         # compile to a fixed-string RE

    # now read in remaining files and see if they match
    while(<>) {
      next if ($ARGV eq $firstfile);
      if (m/$search/m) {
        print $ARGV,"\n";
      };
    }' ./compromised_header.txt {} +

This will print the filename of any *.txt file under dir/ that contains the exact text of the first file (compromised_header.txt).

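Notes:

The qr// operator compiles a regular expression. Its main use is to pre-compile an RE before using it in a loop, so that it isn't recompiled on every pass through the loop, wasting time and CPU cycles.
The \Q and \E used in the qr// operation mark the beginning and end of text within an RE pattern that is to be interpreted as a fixed string, i.e. any metacharacters that may be in the string are quoted to disable their special meaning. See man perlre and search for "Quoting metacharacters", and see perldoc -f quotemeta for details.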

If that looks like an ugly, complicated, hard-to-read one-liner, try it as a standalone script like this:

#!/usr/bin/perl

local $/;    # slurp in entire files, instead of one line at a time

my $firstfile = shift @ARGV;         # get name of the first file
open(F,"<",$firstfile) or die "Error opening $firstfile: $!";
my $first = <F>;                     # read it in
close(F);
my $search = qr/\Q$first\E/;         # compile to a fixed-string RE

# now read in remaining files and see if they match
while(<>) {
  next if ($ARGV eq $firstfile);
  if (m/$search/m) {
    print $ARGV,"\n";
  };
}

Save it as, e.g., check.pl and make it executable with chmod +x check.pl. Then run:

find dir/ -type f -iname '*.txt' \
  -exec ./check.pl ./compromised_header.txt {} +

Question 3

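If you have GNU grep with PCRE mode (-P), you can operate in slurp mode (-z) and recursively (-r) list (-l) the files that match the regex $re. The regex is built from the reference header file, with every character that is special in a Perl regex context escaped.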

re=$(< compromised_header.txt perl -lpe '$_=quotemeta')
re=${re//[${IFS#??}]/\\n}
grep -lrzP "(?m)^$re" .
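
The second line is the cryptic one. With the default IFS (space, tab, newline), ${IFS#??} drops the first two characters and leaves a lone newline, and the ${re//…/\\n} expansion then replaces every newline in $re with the two characters \n. That expansion form is not available in every sh, so here is a sketch of the same joining step using only POSIX awk; the function name join_with_backslash_n is hypothetical:

```shell
# join_with_backslash_n: read lines on stdin and print them as one
# line, separated by the two characters backslash and n.
join_with_backslash_n() {
  awk '
    NR > 1 { printf "%s", "\\n" }   # separator: a backslash followed by n
           { printf "%s", $0 }      # the line itself, without its newline
    END    { if (NR) print "" }     # finish with one real newline
  '
}

printf '%s\n' 'first\ line' 'second\.line' | join_with_backslash_n
```

This prints first\ line\nsecond\.line, the single-line form that grep -P then interprets as two consecutive lines.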

Question 4

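Assuming your search string doesn't have multiple trailing newlines or ASCII NUL characters (see "pitfalls of reading file into shell variable" for details) and that you can use ripgrep: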

rg -lUF "$(< compromised_header.txt)" dir/

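The -F option is used so that the file content is searched for literally instead of being treated as a regex.

The -U option enables multiline search.

rg searches recursively by default, but it also applies smart filtering by default (respects .gitignore rules, ignores hidden files/folders, ignores binary files, etc.). Use -uuu to make it behave like grep -r.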
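
The caveat about trailing newlines comes from command substitution itself, which strips every trailing newline from what it captures (this applies to both $(cat file) and $(< file)). A minimal sketch, with a hypothetical helper captured_length, makes the effect visible:

```shell
# captured_length FILE
# Prints how many characters survive when FILE is read through
# command substitution, which strips every trailing newline.
captured_length() {
  text=$(cat "$1")
  printf '%s\n' "${#text}"
}

demo=$(mktemp)
printf 'header\n\n\n' > "$demo"   # 9 bytes on disk
captured_length "$demo"
rm -f "$demo"
```

This prints 6: the blank lines at the end of the file never reach the search string, so a header that ends in blank lines would be matched more loosely than intended.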

See my blog post "Multiline fixed string search and replace with cli tools" for more such multiline operations.

Answer