我是个初学者,一生中从未做过 JSon 只解析了一些 json 文本文件:
类别数量可能会变化
Category1
1111|2222|3333|444|5555
6666|7777|8888|9999|1111
0000|1111|22222|3333|4444
Category2
2222|3333||5555
2222|3333|4444||5555
Category3
AAAA|||BBBB
CCCC|DDDD|EEEE|FFFF|GGGG
预期的 JSON:类别 ---> 数组 ---> 对于每个 tsv -->Make Key:Value(Year|Title|Name|Format|Base) 如果 tsv 值不适合以上任何人,则添加 null Json 应该是这样的,我可以对每个类别进行数字索引并重新索引类别以获得 tsv 数组
Json 结构应该是这样的: 例如。
1) I could get list of category using jq
cat file | jq
Output:
Category1
Category2
Category3
2) Index that Category
jq '.| Category[0]'
Output:
Category1
1111|2222|3333|444|5555
6666|7777|8888|9999|1111
0000|1111|22222|3333|4444
3) Index each tsv
jq Category1 | jq '.[0]' --->
{
"Year": "1111"
"Title": "2222"
"Name": "3333"
"Format": "4444"
"Base": "5555"
}
jq'.|类别[0]
答案1
我不太知道您期望最终的 JSON 文档是什么样子,但以下内容使用 Miller ( mlr
) 和,根据您的 CSV 输入文件创建了一个 JSON 文档jq
:
mlr --c2j -N --ifs pipe --ragged \
label Year,Title,Name,Format,Base then \
put '
if (NF == 1) {
@Category = string($Year)
} else {
$Category = @Category
}' then \
filter -x 'is_absent($Category)' then \
unsparsify file.csv |
jq -n 'reduce inputs as $a ({}; .[$a.Category] += [$a])'
这首先按照您在问题中提到的顺序添加您提到的标题。Year
如果记录中只有一个字段,它会记住该字段的值(第一个字段)。Category
如果有多个字段,则使用记住的值创建该字段。
然后,我们删除所有没有该Category
字段的记录,最后向任何缺失的字段添加空值。
我们现在有一组 JSON 对象:
{ "Year": 1111, "Title": 2222, "Name": 3333, "Format": 444, "Base": 5555, "Category": "Category1" }
{ "Year": 6666, "Title": 7777, "Name": 8888, "Format": 9999, "Base": 1111, "Category": "Category1" }
{ "Year": 0000, "Title": 1111, "Name": 22222, "Format": 3333, "Base": 4444, "Category": "Category1" }
{ "Year": 2222, "Title": 3333, "Name": "", "Format": 5555, "Base": "", "Category": "Category2" }
{ "Year": 2222, "Title": 3333, "Name": 4444, "Format": "", "Base": 5555, "Category": "Category2" }
{ "Year": "AAAA", "Title": "", "Name": "", "Format": "BBBB", "Base": "", "Category": "Category3" }
{ "Year": "CCCC", "Title": "DDDD", "Name": "EEEE", "Format": "FFFF", "Base": "GGGG", "Category": "Category3" }
然后由 进行处理jq
,它将根据字段中的值将对象收集到数组中Category
。
根据给定的数据,这会产生一些东西相当于
{
"Category1": [
{ "Base": 5555, "Category": "Category1", "Format": 444, "Name": 3333, "Title": 2222, "Year": 1111 },
{ "Base": 1111, "Category": "Category1", "Format": 9999, "Name": 8888, "Title": 7777, "Year": 6666 },
{ "Base": 4444, "Category": "Category1", "Format": 3333, "Name": 22222, "Title": 1111, "Year": 0 }
],
"Category2": [
{ "Base": "", "Category": "Category2", "Format": 5555, "Name": "", "Title": 3333, "Year": 2222 },
{ "Base": 5555, "Category": "Category2", "Format": "", "Name": 4444, "Title": 3333, "Year": 2222 }
],
"Category3": [
{ "Base": "", "Category": "Category3", "Format": "BBBB", "Name": "", "Title": "", "Year": "AAAA" },
{ "Base": "GGGG", "Category": "Category3", "Format": "FFFF", "Name": "EEEE", "Title": "DDDD", "Year": "CCCC" }
]
}
通过这个jq '.Category1[0]'
会给你
{
"Year": 1111,
"Title": 2222,
"Name": 3333,
"Format": 444,
"Base": 5555,
"Category": "Category1"
}
答案2
对于标准工具来说,您的数据中有太多自由形式。我建议使用 Perl,例如:
#!/bin/perl
use JSON::PP;
open(IN, "<file.tsv");
while(<IN>) {
chomp;
if (/Category1/) {
$group = $_;
@heads = ("Year", "Title", "Name", "Format", "Base");
} elsif (/Category2/) {
$group = "new name for Category2";
@heads = ("Year", "Title", "Name");
} elsif (/Category3/) {
$group = $_;
@heads = ("ABC", "EFD");
} else {
my @columns = split /\|/;
my %row = ();
for (my $i=0; $i < scalar(@heads); $i++) {
$row{$heads[$i]} = $columns[$i];
}
push @{$data{$group}}, \%row;
}
}
close(IN);
print encode_json \%data;
答案3
这是使用csplit
和执行此操作的一种方法jq
。
您可以将数据拆分到临时文件中,如下所示:
csplit -z infile.csv '/^Category/' '{*}'
将文件转换xx*
为 JSON 对象:
for f in xx*; do
<$f jq -sRrc '
split("\n")
| .[0] as $category
| .[1:-1]
| map(split("|"))
| map({"Year" : .[0],
"Title" : .[1],
"Name" : .[2],
"Format": .[3],
"Base" : .[4],
}) | { ($category): . }
'
done
输出:
{"Category1":[{"Year":"1111","Title":"2222","Name":"3333","Format":"444","Base":"5555"},{"Year":"6666","Title":"7777","Name":"8888","Format":"9999","Base":"1111"},{"Year":"0000","Title":"1111","Name":"22222","Format":"3333","Base":"4444"}]}
{"Category2":[{"Year":"2222","Title":"3333","Name":"","Format":"5555","Base":null},{"Year":"2222","Title":"3333","Name":"4444","Format":"","Base":"5555"}]}
{"Category3":[{"Year":"AAAA","Title":"","Name":"","Format":"BBBB","Base":null},{"Year":"CCCC","Title":"DDDD","Name":"EEEE","Format":"FFFF","Base":"GGGG"}]}
您可以获得这样的类别:
jq -r 'keys[]'
输出:
Category1
Category2
Category3
如果您想要特定类别,例如:
n=2; cat xx$(printf "%02d" $((n-1)))
输出:
Category2
2222|3333||5555
2222|3333|4444||5555
按索引获取特定条目,例如第二个对象Category2
:
jq '.Category2 // empty | .[1]'
输出:
{
"Year": "2222",
"Title": "3333",
"Name": "4444",
"Format": "",
"Base": "5555"
}
答案4
#!/usr/bin/perl
use JSON;
use strict;
my @fields = qw(Year Title Name Format Base);
my (%data, $category, $row);
while(<>) {
chomp;
my @line = split /\|/;
if ($#line == 0) {
# there are no field separators on this line, so
# this must be a "category" line.
$category = $_;
$row = 0;
} else {
foreach my $i (0..$#fields) {
# treat empty strings as null values
undef $line[$i] if ($line[$i] eq "");
$data{$category}[$row]{$fields[$i]} = $line[$i]
};
$row++;
};
};
my $json = JSON->new->pretty->canonical;
print $json->encode(\%data) ;
顺便说一句,“pretty”在 json 字符串中启用换行和缩进。 'canonical' 导致键按排序顺序输出(否则每次运行时它们都会以半随机顺序输出,因为 perl 哈希本质上是无序的)。默认情况下,两者均处于禁用状态JSON模块,因为它们会产生额外的处理开销(即它们会减慢速度),并且仅在人类要读取输出时才需要 - 处理 json 数据的程序不关心键的顺序。
使用示例数据作为输入进行输出:
{
"Category1" : [
{
"Base" : "5555",
"Format" : "444",
"Name" : "3333",
"Title" : "2222",
"Year" : "1111"
},
{
"Base" : "1111",
"Format" : "9999",
"Name" : "8888",
"Title" : "7777",
"Year" : "6666"
},
{
"Base" : "4444",
"Format" : "3333",
"Name" : "22222",
"Title" : "1111",
"Year" : "0000"
}
],
"Category2" : [
{
"Base" : null,
"Format" : "5555",
"Name" : null,
"Title" : "3333",
"Year" : "2222"
},
{
"Base" : "5555",
"Format" : null,
"Name" : "4444",
"Title" : "3333",
"Year" : "2222"
}
],
"Category3" : [
{
"Base" : null,
"Format" : "BBBB",
"Name" : null,
"Title" : null,
"Year" : "AAAA"
},
{
"Base" : "GGGG",
"Format" : "FFFF",
"Name" : "EEEE",
"Title" : "DDDD",
"Year" : "CCCC"
}
]
}