How do I find files in the directories listed in a file?

Question 1

If the directory names are one per line, you can use readarray (bash v4+) to avoid problems with directory names that contain spaces, tabs, or wildcard characters:

readarray -t dirs < subdirs2search.txt
find "${dirs[@]}" ...

That still won't help if some of the directory names start with -, and GNU find has no option that works around this.
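A common workaround outside find itself, not part of the answer above, is to prefix any name that starts with - with ./ so that it is read as a path rather than as an option. A minimal sketch, assuming the list holds relative paths, one per line; the helper name read_start_points is hypothetical:

```bash
#!/usr/bin/env bash
# Read one directory name per line into the global array "dirs" and make
# each entry safe to pass to find as a starting point.
# read_start_points is a hypothetical helper, not part of bash or find.
read_start_points() {
    local listfile=$1 i
    readarray -t dirs < "$listfile"
    for i in "${!dirs[@]}"; do
        # A leading "-" would be parsed by find as an option or expression,
        # so turn "-name" into "./-name".
        case ${dirs[i]} in
            -*) dirs[i]=./${dirs[i]} ;;
        esac
    done
}

# Usage (assumes subdirs2search.txt exists in the current directory):
#   read_start_points subdirs2search.txt
#   find "${dirs[@]}" -type f
```

This only helps for relative names; absolute paths start with / and never need the prefix.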

Question 2

"I see that it doesn't restrict the search to text files only"

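ack is usually a handy tool for recursive-grep kinds of things. It does restrict the search to text files by default (determined with heuristics based on file name and contents), and by default it skips directories such as .git/ and .svn, which is probably what you want if you are a developer. https://beyondgrep.com/

Most GNU/Linux distributions package it, so it is easy to install. It is written in Perl, so its regular expressions are Perl regular expressions, similar to those of GNU grep -P.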

ack -- "desired text"  $(<subdirs.txt)

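That should do what you want and is easy to type. It also gives nicely colored output for interactive use.

(Other answers cover different ways of word-splitting subdirs.txt onto a command line. You may want just the shell's standard word splitting, or readarray to split on lines only and prevent glob expansion.)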
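To make the difference between the two splitting approaches concrete, here is a small sketch that counts the arguments each one produces for the same list. The file contents and the helper count_args are made up for illustration:

```bash
#!/usr/bin/env bash
# Compare how many arguments each approach produces for the same list file.
count_args() { echo "$#"; }

listfile=$(mktemp)
printf '%s\n' 'My Project' 'src' > "$listfile"

# Plain word splitting: "My Project" becomes two words, and any glob
# characters in the names would be expanded as well.
split_count=$(count_args $(<"$listfile"))

# readarray -t: exactly one array element per line, no globbing.
readarray -t dirs < "$listfile"
array_count=$(count_args "${dirs[@]}")

echo "word splitting: $split_count arguments"
echo "readarray:      $array_count arguments"
rm -f "$listfile"
```

With this input, word splitting yields 3 arguments and readarray yields 2, so only the array form passes "My Project" through as a single directory name.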

Question 3

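Of course, posting the question helped me break my fixation on doing this strictly with find and got me thinking about having Bash expand the file. I am posting an answer in the hope that it helps others (and to document it for my own future use).

The incantation to have Bash expand the contents of a file is $(<subdirs2search.txt). So, if subdirs2search.txt contains: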

SubDir1 SubDir2 SubDir4

a command like the following will perform the desired search:

find $(<subdirs2search.txt) -type f -name "*.txt" -exec grep -H "desired text" {} \;
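The following sketch runs that kind of search end to end on a throwaway tree, so the effect of the directory list and the -name filter is visible. The directory and file names are invented for the demonstration, and it uses "{} +" rather than "{} \;" so that grep is started once per batch of files instead of once per file:

```bash
#!/usr/bin/env bash
# Build a throwaway tree, list two of its three subdirectories in a file,
# and search only those two.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir SubDir1 SubDir2 SubDir3
echo "desired text here" > SubDir1/a.txt
echo "desired text here" > SubDir2/b.log      # wrong extension, not searched
echo "desired text here" > SubDir3/c.txt      # directory not listed
echo "SubDir1 SubDir2" > subdirs2search.txt

# $(<file) is deliberately unquoted so the shell splits it into two names.
matches=$(find $(<subdirs2search.txt) -type f -name "*.txt" \
               -exec grep -H "desired text" {} +)
echo "$matches"

cd / && rm -rf "$workdir"
```

Only SubDir1/a.txt is reported. Because the expansion is unquoted, this form is only suitable when the listed names contain no whitespace or glob characters.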

Question 4

#!/usr/bin/perl -w

use strict;
use File::Find ();

sub wanted;
sub process_file ($@);

my $dirfile = shift;    # First argument is the filename containing the list
                        # of directories.

my $pattern = shift;    # Second arg is a perl RE containing the pattern to search
                        # for. Remember to single-quote it on the command line.

# Read in the @dirs array from $dirfile
#
# A NUL-separated file is best, just in case any of the directory names
# contained line-feeds.  If you're certain that could never happen, a
# plain-text LF-separated file would do.
#
# BTW, you can easily generate a NUL-separated file from the shell with:
#    printf "%s\0" dir1 dir2 dir3 dir4 $'dir\nwith\n3\nLFs' > dirs.txt

my @dirs=();

{
  local $/="\0";    # delete this line if you want to use a LF-separated file.
                    # In that case, the { ... } block around the code from open to
                    # close is no longer needed.  It's only there so it's possible
                    # to make a local change to the $/ aka $INPUT_RECORD_SEPARATOR
                    # variable.

  open(DIRFILE,"<",$dirfile) or die "Cannot open $dirfile: $!\n";
  while(<DIRFILE>) {
    chomp;
    push @dirs, $_;
  };
  close(DIRFILE);
};

File::Find::find({wanted => \&wanted}, @dirs);
exit;

sub wanted {
    my ($dev,$ino,$mode,$nlink,$uid,$gid);

    (($dev,$ino,$mode,$nlink,$uid,$gid) = lstat($_)) && -f _ && process_file($_);
}

sub process_file ($@) {

    # This function currently just greps for pattern in the filename passed to
    # it. As the function name implies, it could be used to process the file
    # in any way, not just grep it.

    my $filename = shift;

    # uncomment the return statement below to skip "binary" files.
    # (note this is a workable but fairly crude test.  Perl's File::MMagic
    # module can be used to more accurately identify file types, using the
    # same "magic" file databases as the /usr/bin/file command)

    # return if -B $filename;

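    # File::Find chdir()s into each directory, so $filename is only the
    # basename; $File::Find::name holds the full path for messages and output.
    open(FILE,"<",$filename)
      or do { warn "Cannot open $File::Find::name: $!\n"; return };
    while(<FILE>) {
      print $File::Find::name, ":", $_ if (m/$pattern/o) ;
    };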

    close(FILE);
}

This uses perl and perl's File::Find module to do the same job as your find ... -exec grep.
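For comparison, the NUL-separated directory list described in the script's comments can also be consumed from the shell. This is a sketch rather than a replacement for the script, and it assumes bash 4.4 or later, where readarray accepts -d. The directory names and the variable matched are invented for the demonstration:

```bash
#!/usr/bin/env bash
# Shell counterpart of reading a NUL-separated directory list
# (readarray -d needs bash 4.4+).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir 'plain' $'dir\nwith LF'
echo "desired text" > plain/one.txt
echo "desired text" > $'dir\nwith LF/two.txt'

# Write the list with a NUL after each name, as the script's comment suggests.
printf '%s\0' 'plain' $'dir\nwith LF' > dirs.txt

# -d '' makes NUL the delimiter; -t strips it from each element.
readarray -d '' -t dirs < dirs.txt

# Emit one "x" per regular file that contains the text, so the count stays
# correct even though one path contains a line feed.
matched=$(find "${dirs[@]}" -type f \
               -exec grep -q "desired text" {} \; -exec printf 'x' \;)
echo "${#dirs[@]} directories, ${#matched} matching files"

cd / && rm -rf "$workdir"
```

Counting markers instead of printed paths avoids miscounting when a file name spans several lines.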

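There is nothing particularly interesting or special about this script, except that the process_file function can easily be modified to do whatever you want with the files: change the owner or permissions, delete them, rename them, insert or delete lines, or anything else you might want.

For example, if you wanted to delete the files that contain text matching the pattern, you could replace the process_file function with something like this: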

sub process_file ($@) {

    my $filename = shift;
    my $found = 0;

    # skip "binary" files (comment this out to process them too):
    return if -B $filename;

    open(FILE,"<",$filename) or return;
    while(<FILE>) {
      if (m/$pattern/o) {
        $found = 1;
        last;
      };
    };

    close(FILE);
    unlink $filename if ($found);
}

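It is also worth mentioning that the wanted function in this script currently looks only for regular files (the -f test). Perl's stat and lstat functions give access to all of the file metadata that find can match on (uid, gid, perms, size, atime, mtime, etc.), so the wanted function can replicate any and all find predicates. See perldoc -f stat and perldoc -f lstat for details.

BTW, the script was originally generated by find2perl and then heavily modified to a) read the list of directories from a file, b) do the grep in perl code rather than by forking grep, and c) add lots of comments. Performance should be almost identical to find ... -exec grep, because grep can't open files or do regular expression matching significantly faster than perl can. It may even be faster.

Also BTW, find2perl used to be included with perl, but it was removed as of perl 5.22 and is now available on CPAN as find2perl.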
