在 shell 上将文本文件转换为 CSV

在 shell 上将文本文件转换为 CSV

我想将文本文件转换为 CSV 格式。

以下是该文件的摘录:

{"Outdated":false,"Watt":233,"Timestamp":1669647142,"A_Plus":6523.896,"A_Plus_HT":4494.82,"A_Plus_NT":2029.076,"A_Minus":8841.63}
{"Outdated":false,"Watt":235,"Timestamp":1669647152,"A_Plus":6523.896,"A_Plus_HT":4494.82,"A_Plus_NT":2029.076,"A_Minus":8841.63}
{"Outdated":false,"Watt":235,"Timestamp":1669647596,"A_Plus":6523.896,"A_Plus_HT":4494.82,"A_Plus_NT":2029.076,"A_Minus":8841.63}
{"Outdated":false,"Watt":233,"Timestamp":1669651191,"A_Plus":6524.496,"A_Plus_HT":4495.42,"A_Plus_NT":2029.076,"A_Minus":8841.63}
{"Outdated":false,"Watt":276,"Timestamp":1669654797,"A_Plus":6524.816,"A_Plus_HT":4495.74,"A_Plus_NT":2029.076,"A_Minus":8841.63}
{"Outdated":false,"Watt":437,"Timestamp":1669658393,"A_Plus":6525.901,"A_Plus_HT":4496.825,"A_Plus_NT":2029.076,"A_Minus":8841.63}
{"Outdated":false,"Watt":362,"Timestamp":1669661992,"A_Plus":6526.732,"A_Plus_HT":4497.656,"A_Plus_NT":2029.076,"A_Minus":8841.63}
{"Outdated":false,"Watt":471,"Timestamp":1669665603,"A_Plus":6527.062,"A_Plus_HT":4497.986,"A_Plus_NT":2029.076,"A_Minus":8841.63}

结构类似于 JSON 文件,但不幸的是结构并不完全匹配。我已经使用过此工具(https://github.com/alingse/jsoncsv),但不幸的是,由于结构错误,我收到了错误消息。

我怎样才能将其转换为 CSV 文件?

感谢您的帮助!

答案1

输入的每一行都是一个有效的 JSON 对象 - 因此如果您不关心 CSV 标题,您可以简单地将它们解构为数组并将它们传递给过滤@csv器:

$ jq -r '[.[]] | @csv' file
false,233,1669647142,6523.896,4494.82,2029.076,8841.63
false,235,1669647152,6523.896,4494.82,2029.076,8841.63
false,235,1669647596,6523.896,4494.82,2029.076,8841.63
false,233,1669651191,6524.496,4495.42,2029.076,8841.63
false,276,1669654797,6524.816,4495.74,2029.076,8841.63
false,437,1669658393,6525.901,4496.825,2029.076,8841.63
false,362,1669661992,6526.732,4497.656,2029.076,8841.63
false,471,1669665603,6527.062,4497.986,2029.076,8841.63

如果你关心标题,这更复杂。我能想到的最好的办法是:

  • 啜饮整个文件变成一个索引行数组
  • 对行数组进行缩减,并使用从第 0元素中提取的键对其进行初始化

现在您有一个数组的数组,其中顶部有一个标题数组,它可以映射回 CSV 数组(最后映射回单独的 CSV 行):

$ jq -r --slurp 'to_entries |
    reduce . as $row ([.[0] | .value | keys_unsorted]; . + [$row[] | .value | to_entries | map(.value)]) |
    map(@csv) | .[]
  ' file
"Outdated","Watt","Timestamp","A_Plus","A_Plus_HT","A_Plus_NT","A_Minus"
false,233,1669647142,6523.896,4494.82,2029.076,8841.63
false,235,1669647152,6523.896,4494.82,2029.076,8841.63
false,235,1669647596,6523.896,4494.82,2029.076,8841.63
false,233,1669651191,6524.496,4495.42,2029.076,8841.63
false,276,1669654797,6524.816,4495.74,2029.076,8841.63
false,437,1669658393,6525.901,4496.825,2029.076,8841.63
false,362,1669661992,6526.732,4497.656,2029.076,8841.63
false,471,1669665603,6527.062,4497.986,2029.076,8841.63

答案2

这是一种非常常用的数据格式,Json 行

您可以使用米勒 6然后运行

mlrgo --ijsonl --ocsv cat input.jsonl

要得到

Outdated,Watt,Timestamp,A_Plus,A_Plus_HT,A_Plus_NT,A_Minus
false,233,1669647142,6523.896,4494.82,2029.076,8841.63
false,235,1669647152,6523.896,4494.82,2029.076,8841.63
false,235,1669647596,6523.896,4494.82,2029.076,8841.63
false,233,1669651191,6524.496,4495.42,2029.076,8841.63
false,276,1669654797,6524.816,4495.74,2029.076,8841.63
false,437,1669658393,6525.901,4496.825,2029.076,8841.63
false,362,1669661992,6526.732,4497.656,2029.076,8841.63
false,471,1669665603,6527.062,4497.986,2029.076,8841.63

在下面有用的评论中,有一条关于在 Miller 5 中执行此操作的方法的注释:

mlr --ijson --ocsv cat input >output

答案3

json对我来说,首先转换为正确的,然后使用解析器(例如jqjsoncsv您已经尝试过的)而不是编写自己的解析器似乎更容易和更安全。

[通过简单地添加第一行并在]最后一行后附加以及在除最后一行之外的每一行后面添加逗号来转换为 json 。

sed '1s/^/[/;$!s/$/,/;$a]' file

然后转换成csv使用jq

jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]])[] | @csv'

并把它们放在一起:

$ sed '1s/^/[/;$!s/$/,/;$a]' file | jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]])[] | @csv'
"Outdated","Watt","Timestamp","A_Plus","A_Plus_HT","A_Plus_NT","A_Minus"
false,233,1669647142,6523.896,4494.82,2029.076,8841.63
false,235,1669647152,6523.896,4494.82,2029.076,8841.63
false,235,1669647596,6523.896,4494.82,2029.076,8841.63
false,233,1669651191,6524.496,4495.42,2029.076,8841.63
false,276,1669654797,6524.816,4495.74,2029.076,8841.63
false,437,1669658393,6525.901,4496.825,2029.076,8841.63
false,362,1669661992,6526.732,4497.656,2029.076,8841.63
false,471,1669665603,6527.062,4497.986,2029.076,8841.63

答案4

您在示例中提供的数据类型称为换行符分隔的 JSON又名尼德桑)...有一些工具本身支持它,例如in2csv工具包...所以首先像这样安装它:

sudo apt install csvkit

然后,使用选项指定格式-f ndjson,并在文件中简单地使用它,如下所示:

in2csv -f ndjson file

这应该导致:

Outdated,Watt,Timestamp,A_Plus,A_Plus_HT,A_Plus_NT,A_Minus
False,233,1669647142,6523.896,4494.82,2029.076,8841.63
False,235,1669647152,6523.896,4494.82,2029.076,8841.63
False,235,1669647596,6523.896,4494.82,2029.076,8841.63
False,233,1669651191,6524.496,4495.42,2029.076,8841.63
False,276,1669654797,6524.816,4495.74,2029.076,8841.63
False,437,1669658393,6525.901,4496.825,2029.076,8841.63
False,362,1669661992,6526.732,4497.656,2029.076,8841.63
False,471,1669665603,6527.062,4497.986,2029.076,8841.63

相关内容