Simple shell script fails to iterate over thousands of files; starts fine, but after a while throws "unexpected EOF while looking for matching `"'"

The shell script in question

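Let me explain what I'm trying to do so you can understand it better. Say I have 100 .torrent files in a directory. Two of them, if added to a BitTorrent client, would download xxx.epub and yyy.epub respectively, but I don't know which 2 of the 100 they are.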

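So what my script does is: (1) use find to go through all .torrent files in pwd and pass each one to transmission-show, which parses the .torrent file and prints its metadata in human-readable form; (2) use awk to extract the name of the file the torrent would download and match it against list.txt, which contains the file names we are looking for, i.e. xxx.epub and yyy.epub.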

File: findtor-array.sh

#! /bin/bash
#
# Search .torrent file based on 'Name' field.
#
# USAGE:
# cd ~/myspace # location of .torrent files
# Run `findtor ~/list.txt` (if `findtor.sh` is placed in `~/bin` or `~/.local/bin`)

# Turn the list of file names from ~/list.txt (or any file passed as argument) into an array
readarray -t FILE_NAMES_TO_SEARCH < "$1"

# For each file name from the list...
for FILE_NAME in "${FILE_NAMES_TO_SEARCH[@]}"
do
    # In `pwd` and 1 directory-level under, look for .torrent files and search them for the file name
    find . -maxdepth 2 -name '*.torrent' -type f -exec bash -c "transmission-show \"\$1\" | awk '/^Name\: / || /^File\: /' | awk -F ': ' '\$2 ~ \"$FILE_NAME\" {getline; print}'" _ {} \; >> ~/torrents.txt

    # The `transmission-show` command included in `find`, on its own, for clarity:
    # transmission-show xxx.torrent | awk '/^Name: / || /^File: /' | awk -F ': ' '$2 ~ "SEARCH STRING" {getline; print}'
done

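I thought the process was simple and that I was doing it right (apart from the missing checks, I know). But somehow the whole task seems to be too much for the script, because some time after I start it, it begins throwing these errors continuously until I Ctrl+C it: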

_: -c: line 0: unexpected EOF while looking for matching `"'
_: -c: line 1: syntax error: unexpected end of file

Are these "scaling" problems? What am I missing, and what can I do to fix it?

Answer 1

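FILE_NAME is interpolated directly into the command string that find's -exec hands to bash -c. This causes problems if FILE_NAME contains quotes or shell code. In fact, it allows arbitrary code to be executed. Example: in this particular case, the input file could contain a line such as '; echo "run commands";'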

Instead, pass the loop variable to bash -c as a positional parameter. For example:

find . -maxdepth 2 -name '*.torrent' -type f -exec sh -c '
transmission-show "$2" |
awk -v search="$1" '\''/^Name: / {name = substr($0,7)} /^File: / && name ~ search {print; exit}'\' \
_ "$FILE_NAME" {} \;
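To see the failure in isolation, here is a minimal sketch that needs neither find nor transmission-show. The function names unsafe_echo and safe_echo and the sample list entry are made up for illustration. The first function splices the value into the code string (as the original script does with $FILE_NAME), the second passes it as a positional parameter:

```shell
#!/usr/bin/env bash

# Unsafe: the value becomes part of the code that the inner shell parses.
unsafe_echo() {
    bash -c "printf '%s\n' \"$1\"" _
}

# Safe: the code string is constant; the value arrives as data in "$1".
safe_echo() {
    bash -c 'printf "%s\n" "$1"' _ "$1"
}

# Hypothetical list entry containing one stray double quote.
name='my "book.epub'

# The inner shell sees an unbalanced quote and cannot parse its script.
unsafe_echo "$name" || echo "inner shell failed to parse its script" >&2

# The inner shell prints the value unchanged.
safe_echo "$name"
```

With a stray double quote in the value, the unsafe variant should fail with the same kind of "unexpected EOF while looking for matching" message as in the question, while the safe one prints the value as is. That would also fit the errors starting only after some time: presumably the loop had reached a list entry containing a quote character.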

Also, running through every file once per search term seems inefficient. Consider looping over the files once and searching with grep -f file:

find . -maxdepth 2 -name '*.torrent' -type f -exec sh -c '
file=$1
shift
if transmission-show "$file" | head -n 1 | cut -d" " -f2- | grep -q "$@"; then
    printf "%s\n" "$file"
fi' _ {} "$@" \;

Or without find:

for file in *.torrent */*.torrent; do
    if transmission-show "$file" | head -n 1 | cut -d' ' -f2- | grep -q "$@"; then
        printf '%s\n' "$file"
    fi
done
  • The above simply passes all arguments to grep, so the usage would be findtor -f ~/list.txt to take the patterns from the list, -F for fixed strings, -e expression, etc.
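The matching step can be tried without any torrents. This is only a sketch: the helper name name_matches is made up, and the sample text assumes, as the snippets above do, that the first line of the transmission-show output is "Name: <name>":

```shell
#!/usr/bin/env bash

# name_matches GREP_ARGS...
# stdin: metadata text whose first line is assumed to be "Name: <name>".
# "$@":  options handed straight to grep, e.g. -F -f list.txt
name_matches() {
    head -n 1 | cut -d' ' -f2- | grep -q "$@"
}

# Throwaway pattern list standing in for ~/list.txt.
list=$(mktemp)
printf '%s\n' 'xxx.epub' 'yyy.epub' > "$list"

# Sample metadata standing in for real transmission-show output.
printf 'Name: xxx.epub\nFile: a.torrent\n' |
    name_matches -F -f "$list" && echo "a.torrent: match"

printf 'Name: other.pdf\nFile: b.torrent\n' |
    name_matches -F -f "$list" || echo "b.torrent: no match"

rm -f "$list"
```

Adding -F makes grep treat each line of the list as a fixed string, so the dot in xxx.epub no longer matches any character. Note that an empty line in the pattern file matches every input line, so the list should not contain blank lines.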

Answer 2

Based on @Kusalananda's suggestion, the answers (by @guest and @Jetchisel), and Kevin's detailed answer, I came up with this:

#! /bin/bash
#
# Search for 'Name' field match in torrent metadata for all .torrent files in
# current directory and directories 1-level below.
#
# USAGE e.g.:
# cd ~/torrent-files # location of .torrent files
# Run `~/findtor.sh ~/list.txt`

# Get one file name at a time ($FILE_NAME_TO_SEARCH) to search for from list.txt
# provided as argument to this script.
while IFS= read -r FILE_NAME_TO_SEARCH; do

    # `find` .torrent files in current directory and directories 1-level under
    # it. `-print0` to print the full file name on the standard output, followed
    # by a null character (instead of the newline character that `-print` uses).
    #
    # While that's happening, we'll again use read, this time to pass one
    # .torrent file at a time (from output of `find`) to `transmission-show`
    # for the latter to output the metadata of the torrent file, followed by
    # `awk` commands to look for the file name match ($FILE_NAME_TO_SEARCH) from
    # list.txt.
    find . -maxdepth 2 -name '*.torrent' -type f -print0 |
        while IFS= read -r -d '' TORRENT_NAME; do
            transmission-show "$TORRENT_NAME" | awk '/^Name: / || /^File: /' | awk -F ': ' -v search_string="$FILE_NAME_TO_SEARCH" '$2 ~ search_string {getline; print}';
        done >> ~/torrents-found.txt

done < "$1"

I just ran this, and so far it seems to work fine. Many thanks to everyone involved!

I have done my best, but any fixes and further suggestions are welcome.
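For anyone who wants to check the -print0 / read -d '' pairing on its own, here is a small sketch that runs against a throwaway directory; the directory layout and file names are invented for the test. It feeds the loop through process substitution rather than a pipe, so that the counter set inside the loop is still visible afterwards (with a pipe, the loop would run in a subshell):

```shell
#!/usr/bin/env bash

# Throwaway tree with awkward names (invented for this test).
dir=$(mktemp -d)
mkdir "$dir/sub dir"
touch "$dir/plain.torrent" "$dir/sub dir/with space.torrent" "$dir/notes.txt"

count=0
# NUL-delimited names survive spaces and even newlines in paths.
while IFS= read -r -d '' torrent; do
    printf 'found: %s\n' "$torrent"
    count=$((count + 1))
done < <(find "$dir" -maxdepth 2 -name '*.torrent' -type f -print0)

echo "total: $count"
rm -rf "$dir"
```

Only the two .torrent files should be reported, each path intact despite the spaces.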

Answer 3

I would write it like this.

#!/usr/bin/env bash

pattern_file="$1"

while IFS= read -r -d '' file; do
    transmission-show "$file" | awk .... "$pattern_file"   ##: Figure out how to do the awk with a file rather than looping through an array.
done < <(find . -maxdepth 2 -name '*.torrent' -type f -print0)

This should avoid quoting hell :-)

OK, maybe nullglob isn't needed.
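For the awk step left open in the loop above, one possible approach is to let awk load the pattern file first and then test the "Name: " line from standard input. This is just a sketch under assumptions: the helper name name_in_file is invented, the search terms are treated as literal strings rather than regular expressions, and the sample input only imitates the "Name: " line used elsewhere on this page:

```shell
#!/usr/bin/env bash

# name_in_file PATTERN_FILE
# stdin: metadata text containing a "Name: <name>" line (assumed format).
# Succeeds if the name contains any non-empty line of PATTERN_FILE literally.
name_in_file() {
    awk '
        # First input (the pattern file): remember each non-empty line.
        FILENAME == ARGV[1] { if ($0 != "") terms[$0] = 1; next }

        # Second input (stdin): check the first "Name: " line, then stop.
        /^Name: / {
            name = substr($0, 7)              # text after "Name: "
            for (t in terms)
                if (index(name, t) > 0) found = 1
            exit
        }

        END { exit !found }                   # status 0 only on a match
    ' "$1" -
}

# Throwaway pattern file standing in for the real list.
patterns=$(mktemp)
printf '%s\n' 'xxx.epub' 'yyy.epub' > "$patterns"

printf 'Name: yyy.epub\nFile: demo.torrent\n' |
    name_in_file "$patterns" && echo "demo.torrent: match"

rm -f "$patterns"
```

In the loop it could then be used as: if transmission-show "$file" | name_in_file "$pattern_file"; then printf '%s\n' "$file"; fi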

EDIT

Try this find command and use it in the original script.

find . -maxdepth 2 -name '*.torrent' -type f -exec bash -c 'transmission-show "$1" | awk "/^Name\: / || /^File\: /" | awk -F ": " "\$2 ~ \"$FILE_NAME\" {getline; print}"' _ {} + >> ~/torrents.txt
