Get an array of files built from an array of include globs and an array of exclude globs

2024-6-20

bash shell-script find wildcards array

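I'd like to do the following:

Define an array of globs that specifies the base set of files to include in a process.
Define an array of globs that specifies the files to exclude from that process. It doesn't matter to me if the files specified by this array aren't even in the set above.
Build an array of files (not globs) that takes every file specified by the include glob array and removes any file that belongs to the exclude glob array.

I've been struggling with this. To show some concrete progress and what I've tried, here is the kind of thing I attempted: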

# List all files to potentially include in the process
files_to_include=(
    'utils/*.txt'
)

# List any files here that should be excluded from the above list
files_to_exclude=(
    '*dont-use.txt'
    'utils/README.md'
)

# Empty array of files
files=()

for file in ${files_to_exclude[@]}; do
    temp=find $files_to_include -type f \( -name '*.txt' -and -not -name $file \)
    files+=$temp
done

# I want this to be the total collection of files that I care about
echo ${files[@]}

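Obviously this for loop logic doesn't work, but it at least got me started, and I'm still struggling to find a proper way to do this. (I also get a strange permission-denied message, only when trying to assign the output of find to temp, and I don't know why it happens.)

I like find because, as I understand it, it performs better than grep. That is a practical concern, since my actual use case involves a lot of files. There may be several different ways to do this, but I'd like as little "magic" in my script as possible. So please help me make the script both efficient and easy to understand.

As I understand it, I need a process that expands all the globs in the include array, expands all the globs in the exclude array, and then subtracts the excludes from the includes. That is the high-level view, though; implementing it is what I find challenging.

Thank you!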

Answer 1

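It looks like you want files_to_include to be globs, while files_to_exclude should be just patterns. Otherwise the *dont-use.txt glob would not generate utils/whatev-dont-use.txt (filename generation and pathname expansion are other names for globbing), so it would not exclude that file. Conversely, if utils/*.txt were just a pattern, it would also match utils/.git/foo/bar/.txt, for instance.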
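The distinction can be seen in a small bash sketch. It is only an illustration under assumptions: the sandbox directory and the file names a.txt and whatev-dont-use.txt are made up, and bash's [[ ]] matching stands in for "pattern" matching in general.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox; the file names are invented for this illustration.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/utils"
touch "$sandbox/utils/a.txt" "$sandbox/utils/whatev-dont-use.txt"
cd "$sandbox" || exit 1

shopt -s nullglob   # unmatched globs expand to nothing instead of themselves

# As a glob, *dont-use.txt is expanded against the current directory only.
# Nothing there matches, so the array ends up empty.
as_glob=( *dont-use.txt )
echo "glob expansion found ${#as_glob[@]} file(s)"

# As a pattern, it is matched against a path string. In [[ ]], * also
# matches "/", so utils/whatev-dont-use.txt matches.
pattern='*dont-use.txt'
matched=()
for f in utils/*.txt; do
    [[ $f == $pattern ]] && matched+=( "$f" )
done
printf 'pattern matched: %s\n' "${matched[@]}"
```

Run from a directory laid out like this, the glob finds nothing while the pattern picks out the file under utils/, which is why the exclude list works better as patterns.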

zsh has a ~ (except-by-pattern) glob operator, so you can do:

set -o extendedglob
globs_to_include=(
  'utils/*.txt'
)

patterns_to_exclude=(
  '*dont-use.txt'
  'utils/README.md'
)

typeset -U files=(
  $~^globs_to_include~(${(j[|])~patterns_to_exclude})(ND.)
)

Or, without needing extendedglob, filter out patterns_to_exclude after the fact with the ${array:#pattern} parameter expansion operator:

typeset -U files=( $~^globs_to_include(N.) )
files=( ${files:#(${(j[|])~patterns_to_exclude})} )

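If both arrays are meant to be patterns, and you want to match them against the path of every regular file in or below the current working directory, that could be: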

() {
  files=( ${${(M)@:#(${(j[|])~patterns_to_include})}:#(${(j[|])~patterns_to_exclude})} )
} **/*(ND.)

Or in separate steps, to make it more legible:

pattern_to_include="(${(j[|])patterns_to_include})"
pattern_to_exclude="(${(j[|])patterns_to_exclude})"
files=( **/*(ND.) )
files=( ${(M)files:#$~pattern_to_include} )
files=( ${files:#$~pattern_to_exclude} )

If they are both meant to be globs, that would be:

typeset -U files_to_include=(
  utils/*.txt(ND.)
)
typeset -U files_to_exclude=(
  *dont-use.txt(ND.)
  utils/README.md(ND.)
)
files=( ${files_to_include:|files_to_exclude} )

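This uses the ${A:|B} array subtraction operator.

Explanation of some of the zsh-specific syntax:

array=( elements ): array declaration, since copied by a few shells, including bash when it eventually added array support in 2.0. Similar to set -A array -- elements in the Korn shell.
**/: any level of directories, for recursive globbing.
extendedglob option: needed for the ~ operator.
typeset -U array: makes the array's elements unique.
$~var: causes the contents of $var to be treated as a pattern.
$^array/more: makes that expand to element1/more element2/more, in the fashion of the csh-style {element1,element2}/more.
${(...)param}: these are parameter expansion flags. j[|] uses j to join the elements of the array with |.
(ND.): these are glob qualifiers. N enables nullglob for that glob, D enables dotglob, and . restricts the matches to files of type regular.
${array:#pattern}: filters out the elements that match the pattern. With the (M) flag, it filters them in instead.
() { body; } args: an anonymous function passed some arguments (available in $@, aka $argv, and in $1, $2..., as with regular named functions).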
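For readers who have to stay on bash, which has none of these operators, the post-filtering variant could be approximated with explicit loops. The following is a rough sketch rather than an equivalent: the function name expand_and_filter and the sandbox files are hypothetical, it assumes a bash with nullglob and dotglob, and it does not de-duplicate the result the way typeset -U does.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox; the file names are invented for this illustration.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/utils"
touch "$sandbox/utils/keep.txt" "$sandbox/utils/old-dont-use.txt" "$sandbox/utils/README.md"
cd "$sandbox" || exit 1

shopt -s nullglob dotglob   # rough counterparts of the N and D glob qualifiers

globs_to_include=( 'utils/*.txt' )
patterns_to_exclude=( '*dont-use.txt' 'utils/README.md' )

# expand_and_filter is a made-up helper name. It fills the global array "files".
expand_and_filter() {
    local IFS=   # no word splitting, so only globbing applies to the unquoted $glob
    local glob f pat excluded
    files=()
    for glob in "${globs_to_include[@]}"; do
        for f in $glob; do                        # unquoted on purpose: expand the glob
            [[ -f $f && ! -L $f ]] || continue    # regular files only, like the . qualifier
            excluded=
            for pat in "${patterns_to_exclude[@]}"; do
                # Unquoted right-hand side is a pattern; * also matches "/".
                [[ $f == $pat ]] && { excluded=1; break; }
            done
            [[ -n $excluded ]] || files+=( "$f" )
        done
    done
}

expand_and_filter
printf '%s\n' "${files[@]}"
```

With the sandbox above, only utils/keep.txt survives: old-dont-use.txt is dropped by the first exclude pattern, and README.md never entered the list because no include glob matched it.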

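Answer 2

Make quoting work for you instead of against you. Leave the globs unquoted so that the shell expands them, and double-quote variable expansions so that their contents are not treated as globs. Remember to keep the array's special @ subscript inside the double quotes: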

includes=( utils/*.txt )
excludes=( *dont-use.txt utils/README.md )

# Convert array to hash so we can easily index it
declare -A excludes_hash
for i in "${excludes[@]}"
do
    excludes_hash["$i"]=1
done

# Build list of files
files=()
for i in "${includes[@]}"
do
    [ -z "${excludes_hash[$i]}" ] && files+=("$i")
done

# Total collection of files that I care about
printf "%s\n" "${files[@]}"
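The effect of the quoting advice is easiest to see with a file name that contains a space. A minimal bash sketch, assuming a made-up sandbox with the hypothetical files "my notes.txt" and plan.txt, and a small helper count_args that exists only for this demonstration:

```shell
#!/usr/bin/env bash
# Hypothetical sandbox; the file names are invented for this illustration.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/utils"
touch "$sandbox/utils/my notes.txt" "$sandbox/utils/plan.txt"
cd "$sandbox" || exit 1

# Unquoted glob: the shell expands it, one array element per matching file.
includes=( utils/*.txt )

# count_args is a made-up helper that reports how many arguments it received.
count_args() { echo "$#"; }

quoted=$(count_args "${includes[@]}")   # each element stays a single word
unquoted=$(count_args ${includes[@]})   # elements are split again on whitespace

echo "elements: ${#includes[@]}, quoted: $quoted, unquoted: $unquoted"
```

The quoted expansion passes the two files through intact, while the unquoted one breaks "my notes.txt" into two words, so any loop built on it would see three bogus names.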
