Using Bash to iterate over nested directories and extract certain fields from YAML files

Answer 1

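Note: This answer is long because there are at least two major variants of the utility called yq for parsing YAML data, with slightly different features and expression syntax, and I cover both. I also consider both simply using filename globbing to find all the files and using find (for when there are too many input files). Finally, I address additional questions raised in the comments.

Don't iterate over the output of find. Instead, have find call your utility using -exec. I have an example of this further down in this answer. You are also missing quoting on some of your expansions.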
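As a rough illustration of that advice (not the asker's actual script), the sketch below builds a throwaway directory tree in which one directory name contains a space, then lets find call a small inline sh script via -exec. The directory names and the inline script are made up for the demonstration.

```shell
# Hypothetical demo tree; one directory name contains a space on purpose.
demo=$(mktemp -d)
mkdir -p "$demo/dir one" "$demo/dir2"
printf 'name: Joao\nage: 18\n' >"$demo/dir one/example.yaml"
printf 'name: Andre\nage: 13\n' >"$demo/dir2/example.yaml"

# find hands the pathnames to the inline script as separate arguments,
# so nothing is split on whitespace; "$f" stays quoted inside the loop.
names=$(
    find "$demo" -type f -name example.yaml -exec sh -c '
        for f do
            sed -n "s/^name: //p" "$f"
        done' sh {} +
)
printf '%s\n' "$names"
rm -rf "$demo"
```

Both names are printed even though one pathname contains a space, because the pathnames are never parsed back out of a text stream.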

Given one or more YAML files on the command line, the following yq command creates your YAML data summary file:

yq -y -s '{ persons: map({ name: .name, age: .age }) }' files

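The command reads all input into one big array (thanks to -s, or --slurp), which is then passed to map(). The map() call extracts the name and age fields from each element of the array and adds them as objects to the persons array.

This uses Andrey Kislyuk's Python-based yq from https://kislyuk.github.io/yq/, a wrapper around the versatile JSON parser jq. If you remove the -y option from the command, you get JSON output.

Using Mike Farah's Go-based yq instead: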

yq -N '[{ "name": .name, "age": .age }]' files | yq '{ "persons": . }'
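Since the two tools share a name but not a syntax, it can help to check which one is installed before picking a command. The following is only a sketch: the function name yq_flavor is made up, and the patterns are assumptions about how each project formats its --version banner.

```shell
# Classify a `yq --version` string. The patterns are assumptions about
# the version banners printed by the two projects.
yq_flavor() {
    case $1 in
        *mikefarah*|"yq version "*) echo go ;;      # Mike Farah's Go-based yq
        "yq "[0-9]*)                echo python ;;  # Andrey Kislyuk's wrapper
        *)                          echo unknown ;;
    esac
}

# Report on whatever yq is first in PATH ("unknown" if none is installed).
yq_flavor "$(yq --version 2>/dev/null)"
```

A script could branch on the result and run either the -y -s form or the -N form shown above.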

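In the bash shell, you can apply this to all example.yaml files in the current directory or anywhere below it, creating the output file output.yaml in the current directory, like so: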

shopt -s globstar failglob

yq -y -s '{ persons: map({ name: .name, age: .age }) }' ./**/example.yaml >output.yaml

Or, with Mike Farah's yq:

shopt -s globstar failglob

yq -N '[{ "name": .name, "age": .age }]' ./**/example.yaml | yq '{ "persons": . }' >output.yaml

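This assumes that there are fewer than several thousand example.yaml files; otherwise the command line would expand to a command that is too long.

We first enable the globstar shell option, which lets us use the ** filename globbing pattern that matches across / in pathnames. We also enable the failglob shell option, which makes the whole command fail gracefully if no filenames match.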
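To see the two options in isolation, here is a small sketch that needs no yq at all. It assumes a bash new enough to support globstar (4.0 or later) is available as bash in PATH; the directory names are invented for the demonstration.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/dir1/deeper"
touch "$demo/example.yaml" "$demo/dir1/example.yaml" "$demo/dir1/deeper/example.yaml"

# Run the glob in an explicit bash, since shopt is a bash builtin.
matches=$(cd "$demo" && bash -c 'shopt -s globstar failglob
printf "%s\n" ./**/example.yaml')
printf '%s\n' "$matches"

# With failglob set, a pattern that matches nothing is an error and
# the echo is never run.
if (cd "$demo" && bash -c 'shopt -s globstar failglob
echo ./**/no-such-file.yaml') 2>/dev/null; then
    glob_status=matched
else
    glob_status=failed
fi
echo "$glob_status"
rm -rf "$demo"
```

The first pattern matches the file in the top directory as well as the nested ones, because ** also matches zero directories.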

Testing:

$ tree
.
├── dir1
│   └── example.yaml
├── example.yaml
├── script-andrey
└── script-mike

1 directory, 4 files

$ cat script-andrey
shopt -s globstar failglob
yq -y -s '{ persons: map({ name: .name, age: .age }) }' ./**/example.yaml >output.yaml

$ bash script-andrey
$ cat output.yaml
persons:
  - name: Joao
    age: 18
  - name: Andre
    age: 13

Also testing with Mike's yq:

$ cat script-mike
shopt -s globstar failglob
yq -N '[{ "name": .name, "age": .age }]' ./**/example.yaml | yq '{ "persons": . }' >output.yaml

$ bash script-mike
$ cat output.yaml
persons:
  - name: Joao
    age: 18
  - name: Andre
    age: 13

If you have many thousands of these YAML input files, you may want to apply yq in a smarter way, using find.

This uses Andrey's yq:

find . -name example.yaml -type f \
    -exec yq -y -s 'map({ name: .name, age: .age })' {} + |
yq -y '{ persons: . }' >output.yaml

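This finds all regular files named example.yaml. These are passed in batches to yq, which extracts the name and age fields and creates an array from each batch. A final yq command then collects the generated YAML arrays and makes them the value of the persons key in the final output.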
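The batching can be observed without yq. In this sketch, a tiny inline sh script stands in for yq and just reports how many pathnames it received per invocation; the directory layout is invented.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b" "$demo/c"
touch "$demo/a/example.yaml" "$demo/b/example.yaml" "$demo/c/example.yaml"

# "{} +": as many pathnames per invocation as fit; each run reports its count.
batched=$(find "$demo" -type f -name example.yaml -exec sh -c 'echo "$#"' sh {} +)

# "{} \;": one invocation per pathname, so three separate runs here.
single=$(find "$demo" -type f -name example.yaml -exec sh -c 'echo "$#"' sh {} \;)

printf 'with +: %s\n' "$batched"
printf 'with ;:\n%s\n' "$single"
rm -rf "$demo"
```

With only three files everything fits into one batch. With many thousands, find starts a new invocation whenever the argument list would become too long, which is why a final step is needed to merge the per-batch arrays.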

Likewise, with Mike's yq:

find . -name example.yaml -type f \
    -exec yq -N '[{ "name": .name, "age": .age }]' {} + |
yq '{ "persons": . }' >output.yaml

Testing with the same set of files as above:

$ rm output.yaml
$ find . -name example.yaml -type f -exec yq -y -s 'map({ name: .name, age: .age })' {} + | yq -y '{ persons: . }' >output.yaml

$ cat output.yaml
persons:
  - name: Andre
    age: 13
  - name: Joao
    age: 18

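(Running the commands written for Mike's yq generates the same output.)

Note that the order of the output depends on the order in which find finds the files.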
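If a stable input order matters, one option (not part of the commands above) is to sort the pathnames before they reach the tool. The sketch below assumes find -print0, sort -z and xargs -0 are available, as they are in the GNU and BSD tools, and uses cat as a stand-in for yq.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/b" "$demo/a"
printf 'name: Joao\n'  >"$demo/b/example.yaml"
printf 'name: Andre\n' >"$demo/a/example.yaml"

# NUL-delimited pathnames survive odd filenames; sort -z orders them,
# and xargs -0 passes them on in that order.
ordered=$(
    find "$demo" -type f -name example.yaml -print0 |
    LC_ALL=C sort -z |
    xargs -0 cat
)
printf '%s\n' "$ordered"
rm -rf "$demo"
```

This orders the input by pathname, not by any field inside the files; sorting on a field is a separate step.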

If you want to sort the output file on, for example, the name field, the following sorts the file in place (note that I don't know how to do this with Mike Farah's Go-based yq):

yq -i -y '.persons |= sort_by(.name)' output.yaml

To sort in reverse order (in place):

yq -i -y '.persons |= (sort_by(.name) | reverse)' output.yaml

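In comments, the user asked whether it is possible to append the data to an existing file. It is.

The commands below assume that the last thing in output.yaml is the end of the persons array (so that the command can add new array entries to it).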
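That assumption can be checked crudely before appending. The following is a naive sketch, not something from the original commands: it does not parse YAML, it only looks at which top-level key appears last in the file, and it says nothing about whether the indentation of the appended entries will match.

```shell
demo=$(mktemp -d)
printf 'persons:\n  - name: Joao\n    age: 18\n' >"$demo/output.yaml"

# Remember the most recent line that starts in column one and is neither
# a comment nor a list item, i.e. the last top-level key.
last_key=$(awk '/^[^[:space:]#-]/ { key = $1 } END { print key }' "$demo/output.yaml")

if [ "$last_key" = "persons:" ]; then
    append_ok=yes
else
    append_ok=no
fi
echo "safe to append: $append_ok"
rm -rf "$demo"
```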

Using Andrey's yq:

shopt -s globstar failglob
yq -y -s 'map({ name: .name, age: .age })' ./**/example.yaml >>output.yaml

Or, with find:

find . -name example.yaml -type f \
    -exec yq -y -s 'map({ name: .name, age: .age })' {} + >>output.yaml

Using Mike's yq:

shopt -s globstar failglob
yq -N '[{ "name": .name, "age": .age }]' ./**/example.yaml >>output.yaml

Or, using find:

find . -name example.yaml -type f \
    -exec yq -N '[{ "name": .name, "age": .age }]' {} + >>output.yaml

Answer 2

There are many ways to do this, but the simplest is probably the find command.

First, we create the output file with the new array structure:

echo "persons:" > newfile.yaml

Next, we want to identify every file named example.yaml in the target directory (we'll call it /home/user/yaml-files). This is the basic use case for find and is fairly easy to understand:

find /home/user/yaml-files -type f -name example.yaml

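find has a powerful built-in feature: it can execute commands when a match is found, using the -exec and -execdir options. -exec runs the command in the same working directory that find was started from, while -execdir is the safer option because the command runs "inside" the directory where the match was found. For simplicity, we will use -exec.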
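The difference in working directory is easy to observe. This sketch uses an invented directory layout and assumes a find that supports -execdir (GNU and BSD find do; GNU find refuses to run it if PATH contains a relative directory such as ".").

```shell
demo=$(mktemp -d)
mkdir -p "$demo/dir1"
touch "$demo/dir1/example.yaml"

start_dir=$(pwd -P)
file_dir=$(cd "$demo/dir1" && pwd -P)

# -exec: the command runs in the directory find was started from.
exec_dir=$(find "$demo" -type f -name example.yaml -exec sh -c 'pwd -P' \;)

# -execdir: the command runs in the directory holding the match.
execdir_dir=$(find "$demo" -type f -name example.yaml -execdir sh -c 'pwd -P' \;)

printf 'exec:    %s\nexecdir: %s\n' "$exec_dir" "$execdir_dir"
rm -rf "$demo"
```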

We need to search these example.yaml files for the lines we want, reformat them, and append the result to our output file:

find /home/user/yaml-files -type f -name example.yaml -exec awk '$1 ~ /^name:|^age:/ {gsub(/name:/,"  - name:",$1); gsub(/age:/,"    age:",$1); print $0}' {} \; | tee -a newfile.yaml

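The awk command here looks in each example.yaml for lines whose first field starts with name: or age:. Because awk strips leading whitespace when it splits a line into fields, indented keys match as well. gsub is an awk built-in function that is useful for string substitution. Here we have two gsub filters, applied before the matching line is printed to stdout.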
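If only keys that start in column one should be picked up, a variant that matches against the whole line rather than the first field might look like the sketch below. The sample file, including its nested address key, is made up for the demonstration.

```shell
demo=$(mktemp -d)
cat >"$demo/example.yaml" <<'EOF'
name: Joao
age: 18
address:
  name: Main Street
EOF

# Match on the whole line ($0 is the default target), so the indented
# "name:" under "address:" is skipped; sub() rewrites the line start.
entries=$(awk '
    /^name:/ { sub(/^name:/, "  - name:"); print }
    /^age:/  { sub(/^age:/,  "    age:");  print }
' "$demo/example.yaml")
printf 'persons:\n%s\n' "$entries"
rm -rf "$demo"
```

Only the two top-level lines are reformatted; the nested name line is left out.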

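Normally, one would use redirection to write output to a file, but with find -exec that gets a little complicated. In this case the tee command is great: it echoes the output to the console and also to a file. The -a option tells tee to append to the file; otherwise the file would be overwritten and we would be left with only the most recent write.

This solution uses only a few commands which, as far as I know, exist on every Linux system you are likely to encounter; there are no special requirements, and the code is very portable.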

Answer 3

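If you are looking for files with a specific name, example.yaml, you can do this very easily. First create a new file containing persons:, then append to it all lines starting with name: or age: from all the example.yaml files: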

printf 'persons:\n' > personsFile
find /target/directory -name example.yaml -exec grep -hE '^(name|age):' {} + >> personsFile

If you really need the - in front of each name entry and the indentation, you can add them in a second pass:

printf 'persons:\n' > personsFile
find /target/directory -name example.yaml -exec grep -hE '^(name|age):' {} + >> personsFile
sed -i 's/^name/  - name/; s/^age/    age/' personsFile
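One detail worth checking with this approach is how grep behaves when {} + hands it several files at once: it then prefixes every match with the filename unless -h is given, and the sed patterns anchored at the start of the line would no longer match. The sketch below shows the difference on an invented two-file tree (the email field is made up).

```shell
demo=$(mktemp -d)
mkdir -p "$demo/dir1"
printf 'name: Joao\nage: 18\nemail: joao@example.com\n' >"$demo/example.yaml"
printf 'name: Andre\nage: 13\n' >"$demo/dir1/example.yaml"

# With several files on its command line, grep prefixes each match with
# the filename unless -h is given.
with_names=$(find "$demo" -name example.yaml -exec grep -E '^(name|age):' {} +)
without_names=$(find "$demo" -name example.yaml -exec grep -hE '^(name|age):' {} +)

printf '%s\n' "$with_names" | head -n 1
printf '%s\n' "$without_names" | sed 's/^name/  - name/; s/^age/    age/'
rm -rf "$demo"
```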

But if you are really working with a structured format like YAML, you should probably look at a dedicated tool rather than hacking it like this.

Answer 4

Read man find xargs grep bash and do something like this:

printf "%s\n" "persons:" >newfile
find . -type f -name '*.yaml' -print0 | \
    xargs -0 -r \
        grep -E --no-filename 'name:|age:' >>newfile

Note: this code has not been tested.
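One thing that can be tested without any YAML files is the grep pattern itself. Because 'name:|age:' is not anchored, it matches anywhere in a line. The sketch below compares it with a pattern anchored to the start of the line; the sample keys surname and image are invented to show the effect.

```shell
sample=$(printf 'name: Joao\nsurname: Silva\nage: 18\nimage: photo.png\n')

# Unanchored: matches anywhere in the line, so "surname:" and "image:" hit.
loose=$(printf '%s\n' "$sample" | grep -cE 'name:|age:')

# Anchored to the line start: only the two top-level keys match.
strict=$(printf '%s\n' "$sample" | grep -cE '^(name|age):')

echo "unanchored: $loose, anchored: $strict"
```

Whether the looser match is a problem depends on which other keys the real files contain.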
