我需要将一些包含预测数据的 json 转换为 csv(用于 gnuplot)。尝试过几个 json2csv 实用程序
json样本:
{"cod":"200","message":0.006,"cnt":40,"list":[{"dt":1519333200,"main":{"temp":271.62,"temp_min":271.62,"temp_max":272.921,"pressure":1028.3,"sea_level":1037.2,"grnd_level":1028.3,"humidity":88,"temp_kf":-1.3},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":4.86,"deg":78.004},"rain":{},"snow":{},"sys":{"pod":"n"},"dt_txt":"2018-02-22 21:00:00"},{"dt":1519344000,"main":{"temp":271.22,"temp_min":271.22,"temp_max":272.193,"pressure":1028.11,"sea_level":1037.04,"grnd_level":1028.11,"humidity":100,"temp_kf":-0.98},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":4.52,"deg":80.0016},"rain":{},"snow":{},"sys":{"pod":"n"},"dt_txt":"2018-02-23 00:00:00"},
同样漂亮的印刷:
{
"cod": "200",
"message": 0.006,
"cnt": 40,
"list": [
{
"dt": 1519333200,
"main": {
"temp": 271.62,
"temp_min": 271.62,
"temp_max": 272.921,
"pressure": 1028.3,
"sea_level": 1037.2,
"grnd_level": 1028.3,
"humidity": 88,
"temp_kf": -1.3
},
"weather": [
{
"id": 800,
"main": "Clear",
"description": "clear sky",
"icon": "01n"
}
],
"clouds": {
"all": 0
},
"wind": {
"speed": 4.86,
"deg": 78.004
},
"rain": {},
"snow": {},
"sys": {
"pod": "n"
},
"dt_txt": "2018-02-22 21:00:00"
},
{
"dt": 1519344000,
"main": {
"temp": 271.22,
"temp_min": 271.22,
"temp_max": 272.193,
"pressure": 1028.11,
"sea_level": 1037.04,
"grnd_level": 1028.11,
"humidity": 100,
"temp_kf": -0.98
},
"weather": [
{
"id": 800,
"main": "Clear",
"description": "clear sky",
"icon": "01n"
}
],
"clouds": {
"all": 0
},
"wind": {
"speed": 4.52,
"deg": 80.0016
},
"rain": {},
"snow": {},
"sys": {
"pod": "n"
},
"dt_txt": "2018-02-23 00:00:00"
},
我设法使用 json2csv 来展平数据(并删除前几个不需要的字段):
1519333200 271.62 271.62 272.921 1028.3 1037.2 1028.3 88 -1.3 800 "Clear" "clear sky" "01n" 0 4.86 78.004 "{}" "{}" "n"
"2018-02-22 21:00:00" 1519344000 271.22 271.22 272.193 1028.11 1037.04 1028.11 100 -0.98 800 "Clear" "clear sky" "01n" 0 4.52 80.0016 "{}" "{}"
"n" "2018-02-23 00:00:00"
我需要在纯文本日期(sed??)之后创建新行,或者理想情况下将 json 直接展平为 csv
答案1
正则表达式sed
不太适合此类任务,但使用它相当简单jq
,一个JSON处理工具:
$ jq -r '.list[]|[.dt, .main[]] | @csv' < data.json
1519333200,271.62,271.62,272.921,1028.3,1037.2,1028.3,88,-1.3
1519344000,271.22,271.22,272.193,1028.11,1037.04,1028.11,100,-0.98
jq
接收 JSON 数据流并对其进行过滤以提取所需的数据,类似于sed
结构化数据而不是文本。该命令需要“列表”数组的每个元素(.list[]
)和取出“dt”字段(.dt
)的里面的物体以及“主”对象中的所有内容(.main[]
),将整个对象放入一个数组中,然后将其转换为 CSV 行(@csv
)。
如果您对特定字段感兴趣,或者想要对它们重新排序,您也可以将它们列出来:
$ jq -r '.list[]|[.dt, .main.temp_min, .main.humidity] | @csv' < data.json
1519333200,271.62,88
1519344000,271.22,100
只需列出您关心的每一项即可,忽略其余的。
如果您想了解数据的其他部分,您也可以将其放在那里:.clouds
将查看该clouds
字段,或者您可以使用以下命令读取风速.wind.speed
字段,或者您可以使用- 始终在名称之前这jq
还是比较全面的对于您需要的任何自定义行为。
答案2
只是对迈克尔·霍默(Michael Homer)出色回答的快速补充:
如果 jq 表达式变大,我们可以用它编写一个脚本:
例子:
$ cat mkweatherreport
#!/usr/bin/jq -rf
.list[] |
[
.dt,
.main.temp_min, # in Kelvin degrees
.main.humidity
] | @csv
进而:
$ mkweatherreport f.json
1519333200,271.62,88
1519344000,271.22,100
答案3
如果我们首先提取数组的元素list
,然后将它们转换为 CSV,这可能是最简单的。我们可以通过jq
(提取单个list
元素)和mlr
(以方便的方式将这些元素转换为 CSV)的组合来做到这一点:
$ jq '.list[]' file | mlr --j2c cat
dt,main.temp,main.temp_min,main.temp_max,main.pressure,main.sea_level,main.grnd_level,main.humidity,main.temp_kf,weather.1.id,weather.1.main,weather.1.description,weather.1.icon,clouds.all,wind.speed,wind.deg,rain,snow,sys.pod,dt_txt
1519333200,271.62,271.62,272.921,1028.3,1037.2,1028.3,88,-1.3,800,Clear,clear sky,01n,0,4.86,78.004,{},{},n,2018-02-22 21:00:00
1519344000,271.22,271.22,272.193,1028.11,1037.04,1028.11,100,-0.98,800,Clear,clear sky,01n,0,4.52,80.0016,{},{},n,2018-02-23 00:00:00
考虑到问题中的数据(更正以关闭数组),该调用jq
首先会产生相当于
{"dt":1519333200,"main":{"temp":271.62,"temp_min":271.62,"temp_max":272.921,"pressure":1028.3,"sea_level":1037.2,"grnd_level":1028.3,"humidity":88,"temp_kf":-1.3},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":4.86,"deg":78.004},"rain":{},"snow":{},"sys":{"pod":"n"},"dt_txt":"2018-02-22 21:00:00"}
{"dt":1519344000,"main":{"temp":271.22,"temp_min":271.22,"temp_max":272.193,"pressure":1028.11,"sea_level":1037.04,"grnd_level":1028.11,"humidity":100,"temp_kf":-0.98},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":4.52,"deg":80.0016},"rain":{},"snow":{},"sys":{"pod":"n"},"dt_txt":"2018-02-23 00:00:00"}
然后,对 Miller 的调用读取这些单独的对象,将它们展平,并将它们转换为 CSV (--j2c
意味着“读取 JSON,写入 CSV”,缩写(并且最近添加)--ijson --ocsv
)。
您可以修改表达式以删除或修改每个不需要的元素jq
中的字段。list
例如,要仅返回weather
子数组中的第一个元素,您可以使用.list[] | .weather |= first
.