How do I convert valid JSON to CSV?

Answer 1

You can convert this JSON to CSV in a single line with jq:

jq '.data.headers | [.sender, .to, .subject, ."x-received-time", 
.received, .from, .date, .id, .to, .subject, .fromfull] 
+ [(.time | tostring)] | join(", ")'

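Breakdown:

.data.headers - emits the headers as an object
  - if data contained an array of headers, this would be .data[].headers
[…string keys list…] - emits the string values as an array
+ [(.time | tostring)] - emits the time as a string and appends it to the array
join(", ") - joins the array values with a comma and a space
  - substitute your preferred separator here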
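
To make the steps of the filter concrete, here is a rough JavaScript sketch of the same transformation. The sample object and its values are invented for illustration, only a few of the header keys are used, and placing time inside headers simply follows the jq filter above rather than any known input.

```javascript
// Hypothetical input shaped like the JSON the filter expects; all values are invented.
const sample = {
  data: {
    headers: {
      sender: 'a@example.com',
      to: 'b@example.com',
      subject: 'Hello',
      time: 1414427328,
    },
  },
};

// Mirrors: .data.headers | [.sender, .to, .subject] + [(.time | tostring)] | join(", ")
function headersToLine(doc, separator = ', ') {
  const h = doc.data.headers;                    // .data.headers
  const strings = [h.sender, h.to, h.subject];   // [ ...string keys list... ]
  const row = strings.concat([String(h.time)]);  // + [(.time | tostring)]
  return row.join(separator);                    // join(", ")
}

console.log(headersToLine(sample));
```

Passing a second argument such as ';' corresponds to substituting a different separator in join.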

2022 update:

jq supports the @csv (comma-separated values) and @tsv (tab-separated values) formatters. The code above can also be written as:

jq -r '.data.headers | [.sender, .to, .subject, ."x-received-time", 
.received, .from, .date, .id, .to, .subject, .fromfull] 
+ [(.time | tostring)] | @csv'
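
The practical difference from join(", ") is escaping. As a rough JavaScript sketch of what the two formatters do to one row (going by the jq manual, @csv double-quotes strings and doubles embedded quotes, while @tsv escapes tabs, line feeds, carriage returns and backslashes): the function names csvRow and tsvRow are hypothetical, and only strings, numbers, booleans and null are handled.

```javascript
// Roughly what @csv does to one array: strings are quoted, embedded quotes doubled.
function csvRow(values) {
  return values.map((v) => {
    if (v === null || v === undefined) return '';   // null becomes an empty field
    if (typeof v === 'string') return '"' + v.replace(/"/g, '""') + '"';
    return String(v);                               // numbers and booleans stay bare
  }).join(',');
}

// Roughly what @tsv does: no quoting, but special characters become escape sequences.
function tsvRow(values) {
  const escapes = { '\\': '\\\\', '\t': '\\t', '\n': '\\n', '\r': '\\r' };
  return values.map((v) => {
    if (v === null || v === undefined) return '';
    return String(v).replace(/[\\\t\n\r]/g, (ch) => escapes[ch]);
  }).join('\t');
}

console.log(csvRow(['Mon, 27 Oct 2014 09:03:14 -0500', 'say "hi"', 42]));
console.log(tsvRow(['a\tb', 'line1\nline2']));
```

Because the date is quoted, the comma inside it no longer splits the field.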

Answer 2

You can create CSV output with the following perl command. Open a terminal and type:

perl -n0e '@a= $_ =~ /"date":(".*?").*?"id":(".*?").*?"to":"(.*?)".*?".*?"subject":(".*?").*?"fromfull":"(.*?)"/gs;  while (my @next_n = splice @a, 0, 5) { print join(q{,}, @next_n)."\n"}' inputfile.txt

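It works even if your input file contains multiple headers.

Note that only the last "to": field is taken into account (your headers seem to provide that information twice).

Command output: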

"Mon, 27 Oct 2014 09:03:14 -0500","1414427328-2345855-frank",[email protected],"Help with this project",[email protected]

Answer 3

Since you are working with JSON files, why not parse them as such? Install nodejs-legacy and create a NodeJS script such as:

#!/usr/bin/env node
// parseline.js process lines one by one
'use strict';
var readline = require('readline');
var rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
  terminal: false
});

rl.on('line', function(line){
    var obj = JSON.parse(line);
    // add the fields which you want to extract here:
    var fields = [
        obj.data.headers.to,
        obj.data.headers.subject,
        // etc.
    ];
    // print the fields, joined by a comma (CSV, duh.)
    // No escaping is done, so if the subject contains ',',
    // then you need additional post-processing.
    console.log(fields.join(','));
});

Assuming every line of your file holds a valid JSON string:

node parseline.js < some.txt

Or, if you really want to read a single file and parse fields from it:

#!/usr/bin/env node
// parsefile.js - fully read file and parse some data out of it
'use strict';
var filename = process.argv[2]; // first argument (argv[0] is node, argv[1] is the script path)
var fs = require('fs');
var text = fs.readFileSync(filename).toString();
var obj = JSON.parse(text);
// add the fields which you want to extract here:
var fields = [
    obj.data.headers.to,
    obj.data.headers.subject,
    // etc.
];
// print the fields, joined by a comma (CSV, duh.)
// No escaping is done, so if the subject contains ',',
// then you need additional post-processing.
console.log(fields.join(','));

Then run it with:

node parsefile.js yourfile.json > yourfile.csv
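
Both scripts join the fields without any escaping. One possible way to handle commas and quotes, assuming RFC 4180-style quoting is acceptable to whatever reads the file, is a small helper like the one below. csvField and toCsvLine are hypothetical names, not part of Node, and the sample values are invented.

```javascript
// Quote a field only when it contains a comma, a double quote, or a line break;
// embedded double quotes are doubled.
function csvField(value) {
  const s = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Build one CSV line from an array of field values.
function toCsvLine(fields) {
  return fields.map(csvField).join(',');
}

console.log(toCsvLine(['b@example.com', 'Help, please', 'say "hi"']));
```

In either script, replacing fields.join(',') with toCsvLine(fields) would apply it.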

Answer 4

Here is a gawk script I just wrote for you!

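#!/usr/bin/gawk -f
BEGIN {
  FS="\""     # split on double quotes: $2 is the key, $4 the string value
  output=""
  nodata=1    # stays 1 until the first "data" line has been seen
}

# A line starting with "data" begins a new record.
/^"data"/{
  if( ! nodata )
  {
    # flush the previous record without its trailing separator
    sub(/\|$/,"",output)
    print output
  }
  nodata=0
  output=""
}

# Any other line that starts with a quoted key.
/^"/ && $2 != "data" {
  if ( $2 == "to" || $2 == "fromfull" || $2 == "id" || $2 == "subject" || $2 == "date" )
    output=output$4"|"
}

END{
  # flush the last record
  sub(/\|$/,"",output)
  print output
}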

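It should work for a file containing a bunch of similar entries. If you want to add other items to the list, just add them to the if statement. I did find one problem with your data set, though: the dates. They contain commas, so the result would not be true CSV. Instead, I separated the fields with a different character (|).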
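
For readers who do not use awk, the same line-oriented idea can be sketched in JavaScript. This assumes, as the script does, that every key sits at the start of its own line and that a line starting with "data" begins a new record. The function name extractRecords and the sample input are invented for illustration.

```javascript
const WANTED = new Set(['to', 'fromfull', 'id', 'subject', 'date']);

// Returns one "|"-separated string per record.
// Unlike the awk script, fields seen before the first "data" line are ignored.
function extractRecords(text) {
  const records = [];
  let current = null;                      // fields of the record being read
  for (const line of text.split('\n')) {
    if (!line.startsWith('"')) continue;   // only lines that begin with a quoted key
    const parts = line.split('"');         // same idea as FS="\"" in awk
    const key = parts[1];
    if (key === 'data') {
      if (current !== null) records.push(current.join('|'));
      current = [];
    } else if (WANTED.has(key) && current !== null) {
      current.push(parts[3]);              // parts[3] is the quoted value
    }
  }
  if (current !== null) records.push(current.join('|'));
  return records;
}

// Invented sample in the layout assumed above.
const sampleText = [
  '"data": {',
  '"date": "Mon, 27 Oct 2014 09:03:14 -0500",',
  '"id": "abc-1",',
  '"data": {',
  '"id": "abc-2",',
].join('\n');

console.log(extractRecords(sampleText));
```

Joining the fields of each record at the end avoids having to strip a trailing separator.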
