Search for a pattern and create a file with the same name

Question 1

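Your file consists of a set of JSON objects. Each object contains a .location_country key. From each object we can create a shell command that writes a serialized copy of the object itself to a file named after the value of the .location_country key. These shell commands can then be executed by the shell.

Using jq,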

jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt

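The serialized object is created with the @json operator in jq, which emits a JSON-encoded string containing the input document (in this case, the current object). That string is then passed through @sh so that it is quoted properly for the shell. The @sh operator is also used to build part of the output filename from the value of the .location_country key.

The command essentially generates shell code that calls printf to output the current object and redirects that output to a specific file.

Given the example data in file.txt, this emits the following: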

printf "%s\n" '{"full_name":"name1","location_country":"united kingdom"}' >'united kingdom'.txt
printf "%s\n" '{"full_name":"name2","location_country":"united states"}' >'united states'.txt
printf "%s\n" '{"full_name":"name3","location_country":"china"}' >'china'.txt

You can redirect this to a separate file and run that file with sh to execute the commands, or you can use eval directly in the shell:

eval "$( jq ...as above... )"
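As a sketch of the first option, the generated commands can be saved to a script and then run with sh. This assumes jq is installed; the scratch directory and the script name commands.sh are illustrative choices, not part of the original answer.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample input, one JSON object per line.
cat > file.txt <<'EOF'
{"full_name":"name1","location_country":"united kingdom"}
{"full_name":"name2","location_country":"united states"}
{"full_name":"name3","location_country":"china"}
EOF

# Generate one printf command per object and save them.
# commands.sh is a hypothetical name chosen for this example.
jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt > commands.sh

# Review commands.sh if desired, then execute it.
sh commands.sh
```

Writing the commands to a file first makes it possible to review them before anything is executed.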

Since we are using a proper JSON parser, jq, the above works even if the input JSON document is not formatted with one object per line.

$ cat file.txt
{
  "full_name": "name1",
  "location_country": "united kingdom"
}
{
  "full_name": "name2",
  "location_country": "united states"
}
{
  "full_name": "name3",
  "location_country": "china"
}

$ jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt
printf "%s\n" '{"full_name":"name1","location_country":"united kingdom"}' >'united kingdom'.txt
printf "%s\n" '{"full_name":"name2","location_country":"united states"}' >'united states'.txt
printf "%s\n" '{"full_name":"name3","location_country":"china"}' >'china'.txt

$ eval "$( jq -r '"printf \"%s\\n\" \(. | @json | @sh) >\(.location_country|@sh).txt"' file.txt )"
$ ls
china.txt           file.txt            united kingdom.txt  united states.txt
$ cat 'united kingdom.txt'
{"full_name":"name1","location_country":"united kingdom"}

Question 2

Using awk

Input

$ cat input_file
{"full_name":"name1","location_country":"united kingdom"}
{"full_name":"name2","location_country":"united states"}
{"full_name":"name3","location_country":"china"}
{"full name":"name12","location":"china"}
{"full name":"name11","location":"china"}

awk -F'["|:]' '$10 ~ /[A-Za-z]/ { print > ($10 ".txt") }' input_file

Output

$ cat china.txt
{"full_name":"name3","location_country":"china"}
{"full name":"name12","location":"china"}
{"full name":"name11","location":"china"}

$ cat united\ kingdom.txt
{"full_name":"name1","location_country":"united kingdom"}

$ cat united\ states.txt
{"full_name":"name2","location_country":"united states"}
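To see why the country ends up in $10, it can help to print every field awk produces for one sample line when splitting on the same separator characters. This is only an illustrative sketch; the numbering loop and the variable name fields are not part of the original answer.

```shell
# Split one sample line on the characters ", | and :, then number each field.
fields=$(
    printf '%s\n' '{"full_name":"name1","location_country":"united kingdom"}' |
    awk -F'["|:]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
)
printf '%s\n' "$fields"
```

Field 10 holds united kingdom, with empty fields wherever two separator characters are adjacent. The selection is purely positional, which is why the input lines keyed by location rather than location_country also land in china.txt.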

Question 3

Given your comments below, this should do what you want, using GNU awk for the third argument to match() and for its handling of many simultaneously open files:

awk 'match($0,/"location_country":"([^"]+)"/,a) { print > (a[1] ".txt") }' file
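If GNU awk is not available, a similar result can be sketched with the two-argument match() plus RSTART and RLENGTH, closing each file after every write so that only one output file is open at a time. This combines techniques used elsewhere in this answer; the scratch directory and sample data are illustrative assumptions.

```shell
# Work in a scratch directory with a small sample input.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > file <<'EOF'
{"full_name":"name1","location_country":"united kingdom"}
{"full_name":"name2","location_country":"united states"}
{"full_name":"name3","location_country":"china"}
{"full_name":"name4","location_country":"china"}
EOF

awk 'match($0, /"location_country":"[^"]+"/) {
    # The matched text is "location_country":"VALUE". Skip the 20-character
    # prefix and drop the closing quote to leave just VALUE.
    out = substr($0, RSTART + 20, RLENGTH - 21) ".txt"
    if (!seen[out]++) {
        printf "" > out     # first write for this name: truncate any old file
    }
    print >> out            # append the current line
    close(out)              # keep at most one output file open
}' file
```

Closing after every write is slower than keeping files open, but it avoids the open-file limit that some awk implementations hit.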

For execution speed, a decorate/sort/use/undecorate approach is probably best, e.g.:

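awk -v OFS='"' 'match($0,/"location_country":"[^"]+"/) { print substr($0,RSTART+20,RLENGTH-21), $0 }' file |
sort -t'"' -k1,1 |
awk -F'"' '$1!=prev { close(out); out=$1 ".txt"; prev=$1 } { sub(/^[^"]*"/,""); print > out }'

This works with any sort and any awk. The final awk strips the leading sort key before writing, since its output goes straight to the per-country files.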
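To make the decorate and sort stages concrete, the first two commands of the pipeline can be run on their own to show what the final awk receives: each line prefixed with its country and a " separator. The sample data and the variable name decorated are illustrative.

```shell
# Decorate each line with its country, then sort on that key only.
decorated=$(
    printf '%s\n' \
        '{"full_name":"name1","location_country":"united kingdom"}' \
        '{"full_name":"name3","location_country":"china"}' |
    awk -v OFS='"' 'match($0, /"location_country":"[^"]+"/) {
        print substr($0, RSTART + 20, RLENGTH - 21), $0
    }' |
    LC_ALL=C sort -t'"' -k1,1
)
printf '%s\n' "$decorated"
```

The first output line is china"{"full_name":"name3","location_country":"china"}. Everything up to and including the first " is the sort key, which has to be removed again before the line is stored.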

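Original answer:

If your data is always that simple/regular, with the country name as the fifth "-separated field, then all you need is this, using GNU awk (which handles many simultaneously open output files). Adjust the field number to your data; for lines shaped like {"full_name":"name1","location_country":"united kingdom"} the country is $8.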

awk -F'"' '{ print > ($5 ".txt") }' file

or, with any awk:

awk -F'"' '{
    out = $5 ".txt"
    if ( !seen[out]++ ) {
        printf "" > out
    }
    print >> out
    close(out)
}' file

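The above will work no matter how large your input file is, as long as you have the disk space available to create the output files.

If you like, you can do this more efficiently by sorting on the country names first: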

sort -t'"' -k5,5 file |
awk -F'"' '$5 != prev{ close(out); out=$5 ".txt"; prev=$5 } { print > out }'

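That last script works with any sort and any awk, but it can rearrange the order of the input lines within each country. If you care about that and have GNU sort, add the -s argument. If you care and do not have GNU sort, let me know, as there is a fairly simple workaround.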
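The workaround is not spelled out here. One possible approach, offered as a guess rather than as the author's method, is to add the input line number as a second sort key so that ties within a country are broken by original position. In this sketch the key is extracted with match() instead of by field number, and the scratch directory and sample data are illustrative.

```shell
# Work in a scratch directory with a small sample input.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > file <<'EOF'
{"full_name":"name3","location_country":"china"}
{"full_name":"name1","location_country":"united kingdom"}
{"full_name":"name0","location_country":"china"}
EOF

# Stage 1: decorate each line as  country"line-number"original-line
# Stage 2: sort by country, then numerically by original line number
# Stage 3: strip both decoration fields and write each group to its own file
awk -v OFS='"' 'match($0, /"location_country":"[^"]+"/) {
    print substr($0, RSTART + 20, RLENGTH - 21), NR, $0
}' file |
LC_ALL=C sort -t'"' -k1,1 -k2,2n |
awk -F'"' '
    $1 != prev { if (out != "") close(out); out = $1 ".txt"; prev = $1 }
    { sub(/^[^"]*"[^"]*"/, ""); print > out }
'
```

Because the line number is part of the sort key, the lines inside china.txt keep their input order without relying on a stable sort.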

Answer