How to run a multi-line awk script in bash

Question 1

Put the code in a function rather than in a variable, like this (untested, and there is still room for improvement):

set -x
set -e
do_awk() {
    awk '
        ($1 !~ /delete/) &&                   # ensure we are not trying to process deleted files
        ($4 !~ /theme\.puml|config\.puml/) && # do not try to process our theme or custom config
        ($4 ~ /\.puml/) {                     # only process puml files
            printf "%s ", $4                  # print only the file name, joining names with spaces
        }
        END { print "" }                      # ensure we do print a newline at the end
    ' "${@:--}"
}
GIT_OUTPUT=$(git diff-tree -r --no-commit-id --summary "$GITHUB_SHA")
AWK_OUTPUT=$(printf '%s\n' "$GIT_OUTPUT" | do_awk)
echo "::set-output name=files::$AWK_OUTPUT"
set +e
set +x
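
The "${@:--}" at the end of the function is what lets it read either named files or standard input: it expands to the function's arguments if there are any, and to a single - (which awk treats as standard input) otherwise. A minimal sketch of that idiom, using a hypothetical count_lines helper that is not part of the answer:

```shell
# Hypothetical helper: count input lines, reading the named files or,
# when no arguments are given, standard input ("-").
count_lines() {
    awk 'END { print NR }' "${@:--}"
}

# Scratch file holding three lines of sample data.
tmpfile=$(mktemp) || exit 1
printf '%s\n' one two three >"$tmpfile"

# No arguments: awk is called with "-" and reads the pipe.
from_stdin=$(printf '%s\n' one two | count_lines)
# One argument: awk reads the named file instead.
from_file=$(count_lines "$tmpfile")
rm -f "$tmpfile"

printf 'stdin: %s, file: %s\n' "$from_stdin" "$from_file"
```

This is why do_awk works unchanged at the end of a pipeline and when given file names directly.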

Question 2

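Your main issue is that the here-document is unquoted, so the shell performs substitutions on the awk code inside it, such as $4. To protect the code from the shell, make sure to quote the here-document. You get a quoted here-document by enclosing the starting delimiter word in quotes (as in <<'AWK' or <<"AWK") or by escaping it, as in <<\AWK.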
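
To see the difference, here is a small sketch that captures the same one-line awk program from an unquoted and from a quoted here-document. The positional parameters are set to made-up values purely so that the substitution of $4 is visible:

```shell
# Give the shell a fourth positional parameter (hypothetical values)
# so that an unprotected $4 has something to expand to.
set -- one two three four

# Unquoted delimiter: the shell expands $4 before awk would ever see it.
unquoted=$(cat <<AWK
{ print $4 }
AWK
)

# Quoted delimiter: the text is passed through literally.
quoted=$(cat <<'AWK'
{ print $4 }
AWK
)

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

With the unquoted delimiter, awk would receive { print four } rather than { print $4 }. In a script with no positional parameters, $4 expands to nothing and awk receives { print }, which prints the whole line.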

Here is your script, rewritten the way I would have written it:

git diff-tree -r --no-commit-id --summary "$GITHUB_SHA" |
awk '
    $1 !~ /^delete/ && $4 !~ /(theme|config)\.puml$/ && $4 ~ /\.puml$/ {
        name[++n] = $4
    }
    END {
        $0 = ""
        for (i in name) $i = name[i]
        printf "::set-output name=files::%s\n", $0
    }'

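Note that I'm not storing intermediate data in variables. Doing so is inefficient (you may not know how much data you would need to store in the variable) and prone to quoting errors. In this regard, using $GIT_OUTPUT and $AWK unquoted is problematic, because the shell splits the values on whitespace and performs filename globbing on the resulting words. Using echo $GIT_OUTPUT is particularly troublesome, since echo may modify the data if it contains backslashes, depending on how the shell is configured.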

Regarding quoting: see "When is double-quoting necessary?"
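
As a rough illustration of the splitting and globbing problem, the sketch below expands the same value with and without quotes. The scratch directory, file names, and value are all made up for the demonstration:

```shell
# Work in a scratch directory so the glob has known, hypothetical files to match.
start_dir=$PWD
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
touch file1.puml file2.puml

value='*.puml   kept   spacing'

# Unquoted: the shell splits on whitespace, then expands *.puml as a glob.
unquoted=$(printf '<%s>' $value)
# Quoted: the value reaches printf as one unmodified argument.
quoted=$(printf '<%s>' "$value")

cd "$start_dir" || exit 1
rm -rf "$tmpdir"

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

The unquoted expansion yields four separate words, with the pattern replaced by the matching file names and the original spacing lost. The quoted expansion keeps the value intact.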

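In the awk script, I use the standard pattern { action } syntax to build an array, name, that holds the strings to output. In the END block, I create an output record in $0 and print it with the prefix that you used with echo.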
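
To check how the record is assembled without needing a Git repository, the same awk program can be fed a few hand-written lines. The sample input below is hypothetical and only imitates the shape of the git diff-tree --summary output:

```shell
# Hypothetical sample input; the file name is the fourth whitespace-separated field.
sample=' create mode 100644 docs/a.puml
 delete mode 100644 docs/b.puml
 create mode 100644 docs/theme.puml
 create mode 100644 README.md
 create mode 100644 docs/c.puml'

result=$(printf '%s\n' "$sample" |
awk '
    # Collect the names of non-deleted .puml files, skipping theme and config.
    $1 !~ /^delete/ && $4 !~ /(theme|config)\.puml$/ && $4 ~ /\.puml$/ {
        name[++n] = $4
    }
    END {
        # Empty the record, then assign each name to the field with the same
        # index; awk rebuilds $0 by joining the fields with OFS (a space).
        $0 = ""
        for (i in name) $i = name[i]
        printf "::set-output name=files::%s\n", $0
    }')

printf '%s\n' "$result"
```

Because each name is assigned to the field that matches its array index, the unspecified iteration order of for (i in name) does not affect the order of the names in the output.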

You could also do it like this, which leaves more room for comments:

git diff-tree -r --no-commit-id --summary "$GITHUB_SHA" |
awk '
    $1 ~ /^delete/ {
        # skip these
        next
    }
    $4 ~ /(theme|config)\.puml$/ {
        # and these...
        next
    }
    $4 ~ /\.puml$/ {
        # pick out filename (we assume no whitespace in filenames)
        name[++n] = $4
    }
    END {
        $0 = ""
        for (i in name) $i = name[i]
        printf "::set-output name=files::%s\n", $0
    }'

If you want to stick with having the awk source code in a here-document, this is how I would do it:

awk_script=$(mktemp) || exit 1
trap 'rm -f "$awk_script"' EXIT

cat <<'AWK_CODE' >"$awk_script"
$1 !~ /^delete/ && $4 !~ /(theme|config)\.puml$/ && $4 ~ /\.puml$/ {
    name[++n] = $4
}
END {
    $0 = ""
    for (i in name) $i = name[i]
    printf "::set-output name=files::%s\n", $0
}
AWK_CODE

git diff-tree -r --no-commit-id --summary "$GITHUB_SHA" |
awk -f "$awk_script"

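That is, save the awk script to a temporary file that is later run with awk -f and removed at the end of the script (here using trap). But for such a short awk program, I don't see any extra benefit in doing this compared with putting the script in a single-quoted string, as shown first. It's messy, and besides the two central commands that need to run, it contains a lot of extra commands that exist only for housekeeping.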
