Concatenating thousands of huge JSON files with jq

I have thousands of JSON files that look like this:

File 1 (key1: value_list1)

{"2mac:acg":["1-248","3-245","3-246","4-245","4-246","5-245","5-246","6-243","6-245","6-246","6-247","6-296","7-245","7-295","7-296","8-236","8-239","8-240","8-294","8-295","8-296","9-235","9-236","9-239","9-294","10-293","10-294","10-295","11-15","11-16","11-293","11-294","12-16","12-290","12-291","12-292","12-293","12-294","13-25","13-26","13-27","13-28","13-290","13-292","13-293","14-24","14-25","14-26","14-27","14-290","15-24","15-25","16-24","16-25","16-233","16-234","16-235","17-22","17-23","17-24","17-25","17-59","17-233","17-234","17-235","18-22","18-23","18-24","18-25","18-43","18-213","18-214","18-215","18-229","18-230","18-232","18-233","18-234","19-42","19-43"]}

File 2 (key2: value_list2)

{"4qld:aaa":["3-245","3-246","4-245","4-246","5-245","5-246","6-243","6-245","6-246","6-247","6-296","7-245","7-295","7-296","8-236","8-239","8-240","8-294","8-295","8-296","9-235","9-236","9-239","9-294","10-293","10-294","10-295","11-15","11-16","11-293","11-294","12-16","12-290","12-291","12-292","12-293","12-294","13-25","13-26","13-27","13-28","13-290","13-292","13-293","14-24","14-25","14-26","14-27","14-290","15-24","15-25","16-24","16-25","16-233","16-234","16-235","17-22","17-23","17-24","17-25","17-59","17-233","17-234","17-235","18-22","18-23","18-24","18-25","18-43","18-213","18-214","18-215","18-229","18-230","18-232","18-233","18-234","19-42","19-43","19-55"]}

File 3 (key3: value_list3)

{"6k8h:c":["1-248","2-134","3-245","3-246","4-245","4-246","5-245","5-246","6-243","6-245","6-246","6-247","6-296","7-245","7-295","7-296","8-236","8-239","8-240","8-294","8-295","8-296","9-235","9-236","9-239","9-294","10-293","10-294","10-295","11-15","11-16","11-293","11-294","12-16","12-290","12-291","12-292","12-293","12-294","13-25","13-26","13-27","13-28","13-290","13-292","13-293","14-24","14-25","14-26","14-27","14-290","15-24","15-25","16-24","16-25","16-233","16-234","16-235","17-22","17-23","17-24","17-25","17-59","17-233","17-234","17-235","18-22","18-23","18-24","18-25","18-43","18-213","18-214","18-215","18-229","18-230","18-232","18-233","18-234","19-42","19-43"]}

I want to merge these files into a single one, which should look like this:

{"2mac:acg":["1-248","3-245","3-246","4-245","4-246","5-245","5-246","6-243","6-245","6-246","6-247","6-296","7-245","7-295","7-296","8-236","8-239","8-240","8-294","8-295","8-296","9-235","9-236","9-239","9-294","10-293","10-294","10-295","11-15","11-16","11-293","11-294","12-16","12-290","12-291","12-292","12-293","12-294","13-25","13-26","13-27","13-28","13-290","13-292","13-293","14-24","14-25","14-26","14-27","14-290","15-24","15-25","16-24","16-25","16-233","16-234","16-235","17-22","17-23","17-24","17-25","17-59","17-233","17-234","17-235","18-22","18-23","18-24","18-25","18-43","18-213","18-214","18-215","18-229","18-230","18-232","18-233","18-234","19-42","19-43"], "4qld:aaa":["3-245","3-246","4-245","4-246","5-245","5-246","6-243","6-245","6-246","6-247","6-296","7-245","7-295","7-296","8-236","8-239","8-240","8-294","8-295","8-296","9-235","9-236","9-239","9-294","10-293","10-294","10-295","11-15","11-16","11-293","11-294","12-16","12-290","12-291","12-292","12-293","12-294","13-25","13-26","13-27","13-28","13-290","13-292","13-293","14-24","14-25","14-26","14-27","14-290","15-24","15-25","16-24","16-25","16-233","16-234","16-235","17-22","17-23","17-24","17-25","17-59","17-233","17-234","17-235","18-22","18-23","18-24","18-25","18-43","18-213","18-214","18-215","18-229","18-230","18-232","18-233","18-234","19-42","19-43","19-55"], 
"6k8h:c":["1-248","2-134","3-245","3-246","4-245","4-246","5-245","5-246","6-243","6-245","6-246","6-247","6-296","7-245","7-295","7-296","8-236","8-239","8-240","8-294","8-295","8-296","9-235","9-236","9-239","9-294","10-293","10-294","10-295","11-15","11-16","11-293","11-294","12-16","12-290","12-291","12-292","12-293","12-294","13-25","13-26","13-27","13-28","13-290","13-292","13-293","14-24","14-25","14-26","14-27","14-290","15-24","15-25","16-24","16-25","16-233","16-234","16-235","17-22","17-23","17-24","17-25","17-59","17-233","17-234","17-235","18-22","18-23","18-24","18-25","18-43","18-213","18-214","18-215","18-229","18-230","18-232","18-233","18-234","19-42","19-43"]}

The merged result should have the form {key1:value_list_1, key2:value_list2, key3:value_list3,...,key_last:value_list_last}

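Thanks to @thanasisp, I concatenate them with jq using jq -s 'add' file1 file2 file3. This works well when concatenating hundreds of files. With thousands of files, however, it fails with the error message: Argument list too long! So I would like to know how to solve this problem and whether there is another way to handle it. Thanks! PS: The server has enough memory.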

Answer 1

jq -c -s add file*

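This reads all files matching the pattern file* into jq. The -s (--slurp) option causes a single array to be created from all the input files. Each element of this large array is the object from one of the files. The array elements are then combined into a single object with add.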

The -c option makes jq produce "compact" output.
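As a small illustration of those two steps, the sketch below feeds two tiny objects to jq on standard input instead of using real files. The keys k1 and k2 and their values are made up for the demonstration.

```shell
# Two small objects on standard input stand in for two input files
# (k1 and k2 are hypothetical keys, not taken from the real data).
slurped=$(printf '%s\n' '{"k1":["1-2"]}' '{"k2":["3-4"]}' | jq -c -s .)
added=$(printf '%s\n' '{"k1":["1-2"]}' '{"k2":["3-4"]}' | jq -c -s add)

# First line: the array built by -s. Second line: the object produced by add.
printf '%s\n' "$slurped" "$added"
```

The first line printed is the slurped array of objects, the second is the single merged object.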

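With too many files, the shell will fail to execute the command because the expanded list of file names exceeds the maximum length allowed for a command line.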
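To get a feeling for how large that limit is on a particular system, it can be queried with getconf. ARG_MAX is the POSIX name for the upper bound on the combined size of the arguments and the environment handed to a new process; the variable name arg_max is only chosen for this sketch.

```shell
# Upper bound, in bytes, on arguments plus environment passed to a new process.
arg_max=$(getconf ARG_MAX)
printf 'ARG_MAX is %s bytes\n' "$arg_max"
```

The value differs between systems, so no particular number should be relied on.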

If that happens, you can use find to create a stream of JSON objects for the jq command to process.

find . -name '*.json' -type f -exec cat {} + | jq -c -s add >final

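This runs cat from find to create a stream of JSON objects from the input files (any regular file whose name ends in .json, in or below the current directory). The jq command collects these into an array and then, as before, combines them into a single object. The final result is written to the file final.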
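The pipeline can be tried safely on a throwaway directory first. The sketch below assumes jq is installed; the directory, the file names a.json, b.json and c.json, and their contents are all invented for the demonstration.

```shell
# Throwaway directory with three small JSON files (hypothetical names and data).
dir=$(mktemp -d)
printf '%s\n' '{"k1":["1-2"]}' > "$dir/a.json"
printf '%s\n' '{"k2":["3-4"]}' > "$dir/b.json"
printf '%s\n' '{"k3":["5-6"]}' > "$dir/c.json"

# find hands the file names to cat in batches that fit the argument limit;
# jq reads the concatenated objects from standard input and merges them.
find "$dir" -name '*.json' -type f -exec cat {} + | jq -c -s add > "$dir/final"

# List the merged keys. "keys" sorts them, so the result does not depend on
# the order in which find visited the files.
merged_keys=$(jq -c 'keys' "$dir/final")
printf '%s\n' "$merged_keys"
rm -r "$dir"
```

Because the output file is named final rather than something ending in .json, it is not picked up by find as an input.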

Note that if keys clash (the same key occurs in two or more files), the last occurrence found, together with its value, overrides the earlier ones.
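The overriding behaviour can be seen with two made-up objects that share the key a (the keys and values here are hypothetical):

```shell
# Both inputs define "a"; the value from the later input replaces the earlier one.
collided=$(printf '%s\n' '{"a":[1],"b":[2]}' '{"a":[9]}' | jq -c -s add)
printf '%s\n' "$collided"
```

The list [1] from the first object is lost rather than merged with [9], so it may be worth checking for duplicate keys beforehand if that matters for your data.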

Answer 2

It sounds like you don't even need jq: just replace the trailing } with , in all files but the last, and remove the leading { in all files but the first.

With zsh:

autoload zargs
files=( *.json(Nn) )            # here sorted numerically or:
files=( ${(f)"$(<file.list)"} ) # to read the list one per line from
                                # a file.list
case $#files in
  (1) cat -- $files;;
  (<2->)
    sed -- 's/}$/,/' $files[1]
    zargs -r -- $files[2,-2] -- sed -- 's/^{//; s/}$/,/'
    sed -- 's/^{//' $files[-1]
esac > result.json

(If you want the result on a single line, insert |paste -sd '\0' - before the >.)

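Or concatenate them first and do the substitutions on the result, here using syntax compatible with ksh (at least those variants where printf is builtin), zsh, yash or bash, but assuming GNU xargs or compatible: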

printf '%s\0' "${files[@]}" |
  xargs -r0 cat -- |
  sed '1!s/^{//; $!s/}$/,/' > result.json

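This assumes the input JSON files have properly delimited lines, in particular that each file ends with a newline character, so that the } closing one file and the { opening the next end up on separate lines.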
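As a rough check of what the sed expression does, the same pipeline can be run on three tiny single-line files. This is only a sketch: the directory and file names are invented, each file holds one object on one line, and a GNU-compatible xargs is assumed as above. The glob is used in place of the files array so that the example also runs in a plain POSIX shell.

```shell
# Throwaway directory with three single-line JSON files (hypothetical data).
dir=$(mktemp -d)
printf '%s\n' '{"k1":["1-2"]}' > "$dir/a.json"
printf '%s\n' '{"k2":["3-4"]}' > "$dir/b.json"
printf '%s\n' '{"k3":["5-6"]}' > "$dir/c.json"

# Pass the names NUL-delimited to xargs, concatenate the files, then drop the
# leading "{" on every line but the first and turn the trailing "}" into ","
# on every line but the last.
joined=$(printf '%s\0' "$dir"/*.json |
  xargs -r0 cat -- |
  sed '1!s/^{//; $!s/}$/,/')
printf '%s\n' "$joined"
rm -r "$dir"
```

The result spans three lines that together form one JSON object; unlike the jq approach, nothing here validates the JSON or detects duplicate keys.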
