Two consecutive operations after a pipe, or two jq operations in one run?


I have to extract data from slightly malformed JSON strings, so I first pass them through sed and awk. What I have is a command like this:

`sed 's/},/},\n/g' test.json |awk '/"characater"/ { gsub("\"characater\"", "\"char" ++n "\"", $0) } 1'| jq -r '.frames.frame.lps.lp|.characters[]|[.code_ascii,.confidence]|@tsv'` 

It extracts data from a JSON string like the following:

{"response":{"container":{"id":"41d6efcb-24d6-490d-8880-762255519b5f","timestamp":"2018-Jul-11 19:51:06.461665"},"id":"00000002-0000-0000-0000-000000000015"},"frames":{"frame":{"id":"5583","timestamp":"2016-Nov-30 13:05:27","lps":{"lp":{"licenseplate":"15451BBL","text":"15451BBL","wtext":"15451BBL","confidence":"20","bkcolor":"16777215","color":"16777215","type":"0","ntip":"11","cct_country_short":"","cct_state_short":"","tips":{"tip":{"poly":{"p":{"x":"1094","y":"643"},"p":{"x":"1099","y":"643"},"p":{"x":"1099","y":"667"},"p":{"x":"1094","y":"667"}},"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"97"},"tip":{"poly":{"p":{"x":"1103","y":"642"},"p":{"x":"1113","y":"642"},"p":{"x":"1112","y":"667"},"p":{"x":"1102","y":"667"}},"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"89"},"tip":{"poly":{"p":{"x":"1112","y":"640"},"p":{"x":"1122","y":"640"},"p":{"x":"1122","y":"666"},"p":{"x":"1112","y":"666"}},"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"97"},"tip":{"poly":{"p":{"x":"1123","y":"640"},"p":{"x":"1132","y":"640"},"p":{"x":"1131","y":"665"},"p":{"x":"1123","y":"665"}},"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"97"},"tip":{"poly":{"p":{"x":"1134","y":"640"},"p":{"x":"1139","y":"640"},"p":{"x":"1139","y":"664"},"p":{"x":"1133","y":"664"}},"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"77"},"tip":{"poly":{"p":{"x":"1154","y":"639"},"p":{"x":"1163","y":"639"},"p":{"x":"1163","y":"663"},"p":{"x":"1153","y":"663"}},"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"97"},"tip":{"poly":{"p":{"x":"1164","y":"638"},"p":{"x":"1173","y":"638"},"p":{"x":"1173","y":"663"},"p":{"x":"1163","y":"663"}},"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"94"},"tip":{"poly":{"p":{"x":"1191","y":"637"},"p":{"x":"1206","y":"636"},"p":{"x":"1205","y":"660"},"p":{"x":"1190","y":"661"}},"bkcolor
":"16777215","color":"0","code":"76","code_ascii":"L","confidence":"34"},"tip":{"poly":{"p":{"x":"1103","y":"655"},"p":{"x":"1111","y":"655"},"p":{"x":"1111","y":"667"},"p":{"x":"1103","y":"667"}},"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"57"},"tip":{"poly":{"p":{"x":"1103","y":"655"},"p":{"x":"1111","y":"655"},"p":{"x":"1111","y":"667"},"p":{"x":"1103","y":"667"}},"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"57"},"tip":{"poly":{"p":{"x":"1176","y":"638"},"p":{"x":"1185","y":"637"},"p":{"x":"1184","y":"661"},"p":{"x":"1175","y":"662"}},"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"7"}},"ncharacter":"8","characters":{"characater":{"poly":{"p":{"x":"1094","y":"643"},"p":{"x":"1099","y":"643"},"p":{"x":"1099","y":"667"},"p":{"x":"1094","y":"667"}},"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"97"},"characater":{"poly":{"p":{"x":"1103","y":"642"},"p":{"x":"1113","y":"642"},"p":{"x":"1112","y":"667"},"p":{"x":"1102","y":"667"}},"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"89"},"characater":{"poly":{"p":{"x":"1112","y":"640"},"p":{"x":"1122","y":"640"},"p":{"x":"1122","y":"666"},"p":{"x":"1112","y":"666"}},"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"97"},"characater":{"poly":{"p":{"x":"1123","y":"640"},"p":{"x":"1132","y":"640"},"p":{"x":"1131","y":"665"},"p":{"x":"1123","y":"665"}},"bkcolor":"16777215","color":"0","code":"53","code_ascii":"5","confidence":"97"},"characater":{"poly":{"p":{"x":"1134","y":"640"},"p":{"x":"1139","y":"640"},"p":{"x":"1139","y":"664"},"p":{"x":"1133","y":"664"}},"bkcolor":"16777215","color":"0","code":"49","code_ascii":"1","confidence":"77"},"characater":{"poly":{"p":{"x":"1154","y":"639"},"p":{"x":"1163","y":"639"},"p":{"x":"1163","y":"663"},"p":{"x":"1153","y":"663"}},"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"97"},"characat
er":{"poly":{"p":{"x":"1164","y":"638"},"p":{"x":"1173","y":"638"},"p":{"x":"1173","y":"663"},"p":{"x":"1163","y":"663"}},"bkcolor":"16777215","color":"0","code":"66","code_ascii":"B","confidence":"94"},"characater":{"poly":{"p":{"x":"1191","y":"637"},"p":{"x":"1206","y":"636"},"p":{"x":"1205","y":"660"},"p":{"x":"1190","y":"661"}},"bkcolor":"16777215","color":"0","code":"76","code_ascii":"L","confidence":"34"}},"det_time_us":"1072592","poly":{"p":{"x":"1088","y":"642"},"p":{"x":"1210","y":"634"},"p":{"x":"1210","y":"661"},"p":{"x":"1087","y":"669"}}}},"det_time_us":"1720812"}}}

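The same sample is available at this link: https://drive.google.com/file/d/18wCzjMBpw7SIeVFByAGPQiqCBjg_0te3/view?usp=sharing

Now, this works fine, but I also need to extract .frames.frame.lps.lp.ncharacter from the JSON. I know I could simply run something like cat test.json | jq -r '.frames.frame.lps.lp.ncharacter' after the command above, but that won't do: I need these commands to parse a huge file of JSON strings formatted like the one in the link, and I need the .ncharacter value to appear in line with the extracted characters. That is, I want output like this: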

...
X       99
Y       99 previous data formatted in the same way
8
1       97
5       89
4       97
5       97
1       77
B       97
B       94
L       34
6          following data formatted in the same way
Z       99
...

where the 8 at the top is the .ncharacter value. I tried:

sed 's/},/},\n/g' test.json |awk '/"characater"/ { gsub("\"characater\"", "\"char" ++n "\"", $0) } 1'| jq -r '[.frames.frame.lps.lp.ncharacter],.frames.frame.lps.lp|.characters[]|[.code_ascii,.confidence]|@tsv'

But this gives me jq: error (at <stdin>:102): Cannot index array with string "characters", and I don't know why...

Answer 1

Try this:

First variant

perl -pe 's/"characater"/"\"char" . (++$n) . "\""/ge' input.json |
jq -r '.frames.frame.lps.lp|.ncharacter,(.characters[]|[.code_ascii,.confidence]|@tsv)'

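Explanation

  1. perl -pe 's/"characater"/"\"char" . (++$n) . "\""/ge' input.json

    • -p - loops over each input line and prints it, like sed.
    • -e - lets you supply a one-line program. If -e is given, Perl will not look for a program file name in the argument list.
    • s///ge - g: replace globally; e: evaluate the right-hand side of the substitution as an expression.
    • "\"char" . (++$n) . "\"" - the dot is the concatenation operator, so each match becomes "char1", "char2", and so on.
  2. jq -r '.frames.frame.lps.lp|.ncharacter,(.characters[]|[.code_ascii,.confidence]|@tsv)'

    • .frames.frame.lps.lp| - this can also be written as .frames | .frame | .lps | .lp |, so it works like this: take the input, select the frames field and pipe it into the next filter, .frame; then take the frame field and pipe it into the next filter, .lps; and so on. See the Pipe section of the jq manual.
    • |.ncharacter,(.characters[]|...)' - from the Comma section of the jq manual: "If two filters are separated by a comma, then the same input will be fed into both and the two filters' output value streams will be concatenated in order: first, all of the outputs produced by the left expression, and then all of the outputs produced by the right. For instance, filter .foo, .bar, produces both the "foo" fields and "bar" fields as separate outputs."
    • (.characters[]|[.code_ascii,.confidence]|@tsv) - the parentheses make the .characters[] output go through its own filters, separately from the .ncharacter output.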
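
The parentheses also explain the error in the question: in jq the comma binds more tightly than the pipe, so `A, B | C` is read as `(A, B) | C`, and the array built on the left of the comma gets indexed with a string as well. A minimal sketch on a made-up document (the keys `n`, `items` and `v`, and the function names, are illustrative stand-ins, not taken from the original data):

```shell
# Hypothetical miniature input; `n` and `items` stand in for
# .ncharacter and .characters in the real data.
doc='{"n":"2","items":{"c1":{"v":"a"},"c2":{"v":"b"}}}'

# Without parentheses this is parsed as ([.n], .) | .items[] | .v,
# so .items is also applied to the array [.n] and jq fails with
# an error of the form: Cannot index array with string "items".
unparenthesized() {
    printf '%s\n' "$doc" | jq -r '[.n], . | .items[] | .v'
}

# With parentheses, .n is emitted unchanged and only the second
# branch descends into .items.
parenthesized() {
    printf '%s\n' "$doc" | jq -r '.n, (.items[] | .v)'
}

parenthesized   # prints 2, a and b on separate lines
```

Calling `unparenthesized` instead should reproduce the same kind of error message as in the question.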

Second variant - uses gawk instead of perl to fix the JSON file; the jq part is the same as in the first variant:

gawk '{ORS= (RT) ? "\"char" NR "\"" : ""; print}' RS='"characater"' input.json

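Note - neither the perl nor the gawk command resets the char counter for each frame block. That is, numbering starts at char1 and keeps incrementing until the end of the input.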

Input - your sample repeated 3 times.

Output

8
1   97
5   89
4   97
5   97
1   77
B   97
B   94
L   34
8
1   97
5   89
4   97
5   97
1   77
B   97
B   94
L   34
8
1   97
5   89
4   97
5   97
1   77
B   97
B   94
L   34

Answer 2

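This is a continuation of another question on the same topic. The main problem here is that the input contains objects with non-unique keys. This is still valid JSON, but later keys overwrite earlier ones, so data is "lost" when the document is parsed.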
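
The effect is easy to check on a tiny made-up document with a repeated key `a` (the function names are illustrative). Ordinary parsing keeps only the last value, whereas jq's streaming mode, which the commands in this answer rely on, still reports each leaf value as it is read:

```shell
# Ordinary parsing: the second "a" overwrites the first.
last_wins() {
    printf '%s\n' '{"a":1,"a":2}' | jq -c '.'
}

# Streaming: jq emits [path, value] events while reading the text.
# Keeping the two-element events and printing their values shows
# that both occurrences are still visible.
leaf_values() {
    printf '%s\n' '{"a":1,"a":2}' |
        jq -c --stream 'select(length == 2) | .[1]'
}

last_wins     # prints {"a":2}
leaf_values   # prints 1 and then 2
```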

I answered the earlier question here, using the following command, which is explained in that answer:

$ jq -r -n --stream 'fromstream(1|truncate_stream(5|truncate_stream(inputs)|select(.[0][0] == "characater"))) | [.code_ascii, .confidence] | @tsv' test.json
1       97
5       89
4       97
5       97
1       77
B       97
B       94
L       34

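The issue in this question is that the output must be preceded, on a line of its own, by the number of lines that follow. In addition, there isn't just one of these oddly formatted JSON documents but a whole set of them, one per line.

Below is a modification of the command above. It collects the results into an array before outputting them, so that the number of elements can be counted: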

$ jq -r -n --stream '[fromstream(1|truncate_stream(5|truncate_stream(inputs)|select(.[0][0] == "characater"))) | [.code_ascii, .confidence]] | length, (.[]|@tsv)' test.json
8
1       97
5       89
4       97
5       97
1       77
B       97
B       94
L       34
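
The last stage of that filter can be tried in isolation. In this sketch the input array is made up and stands in for the collected [.code_ascii, .confidence] pairs; `count_then_rows` is an illustrative name:

```shell
# Print the number of rows first, then each row as a TSV line.
count_then_rows() {
    jq -r 'length, (.[] | @tsv)'
}

# Hypothetical pre-collected rows.
printf '%s\n' '[["1","97"],["5","89"]]' | count_then_rows
# prints 2, then the two rows with a tab between the fields
```

The parentheses are needed for the same reason as elsewhere in jq: without them, @tsv would also be applied to the number produced by length.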

Then it is just a matter of calling this command once for each line of the original file:

#!/bin/bash

cmd=( jq -r -n --stream '[fromstream(1|truncate_stream(5|truncate_stream(inputs)|select(.[0][0] == "characater"))) | [.code_ascii, .confidence]] | length, (.[]|@tsv)' )

while IFS= read -r json; do
    printf '%s\n' "$json" | "${cmd[@]}"
done <test.json
