openweathermap json 转 csv

openweathermap json 转 csv

我需要将一些包含预测数据的 json 转换为 csv(用于 gnuplot)。尝试过几个 json2csv 实用程序

json样本:

{"cod":"200","message":0.006,"cnt":40,"list":[{"dt":1519333200,"main":{"temp":271.62,"temp_min":271.62,"temp_max":272.921,"pressure":1028.3,"sea_level":1037.2,"grnd_level":1028.3,"humidity":88,"temp_kf":-1.3},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":4.86,"deg":78.004},"rain":{},"snow":{},"sys":{"pod":"n"},"dt_txt":"2018-02-22 21:00:00"},{"dt":1519344000,"main":{"temp":271.22,"temp_min":271.22,"temp_max":272.193,"pressure":1028.11,"sea_level":1037.04,"grnd_level":1028.11,"humidity":100,"temp_kf":-0.98},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":4.52,"deg":80.0016},"rain":{},"snow":{},"sys":{"pod":"n"},"dt_txt":"2018-02-23 00:00:00"},

同样漂亮的印刷:

{
  "cod": "200",
  "message": 0.006,
  "cnt": 40,
  "list": [
    {
      "dt": 1519333200,
      "main": {
        "temp": 271.62,
        "temp_min": 271.62,
        "temp_max": 272.921,
        "pressure": 1028.3,
        "sea_level": 1037.2,
        "grnd_level": 1028.3,
        "humidity": 88,
        "temp_kf": -1.3
      },
      "weather": [
        {
          "id": 800,
          "main": "Clear",
          "description": "clear sky",
          "icon": "01n"
        }
      ],
      "clouds": {
        "all": 0
      },
      "wind": {
        "speed": 4.86,
        "deg": 78.004
      },
      "rain": {},
      "snow": {},
      "sys": {
        "pod": "n"
      },
      "dt_txt": "2018-02-22 21:00:00"
    },
    {
      "dt": 1519344000,
      "main": {
        "temp": 271.22,
        "temp_min": 271.22,
        "temp_max": 272.193,
        "pressure": 1028.11,
        "sea_level": 1037.04,
        "grnd_level": 1028.11,
        "humidity": 100,
        "temp_kf": -0.98
      },
      "weather": [
        {
          "id": 800,
          "main": "Clear",
          "description": "clear sky",
          "icon": "01n"
        }
      ],
      "clouds": {
        "all": 0
      },
      "wind": {
        "speed": 4.52,
        "deg": 80.0016
      },
      "rain": {},
      "snow": {},
      "sys": {
        "pod": "n"
      },
      "dt_txt": "2018-02-23 00:00:00"
    },

我设法使用 json2csv 来展平数据(并删除前几个不需要的字段):

1519333200 271.62 271.62 272.921 1028.3 1037.2 1028.3 88 -1.3 800 "Clear" "clear sky" "01n" 0 4.86 78.004 "{}" "{}" "n"
"2018-02-22 21:00:00" 1519344000 271.22 271.22 272.193 1028.11 1037.04 1028.11 100 -0.98 800 "Clear" "clear sky" "01n" 0 4.52 80.0016 "{}" "{}"
"n" "2018-02-23 00:00:00" 

我需要在纯文本日期(sed??)之后创建新行,或者理想情况下将 json 直接展平为 csv

答案1

正则表达式sed不太适合此类任务,但使用它相当简单jq,一个JSON处理工具:

$ jq -r '.list[]|[.dt, .main[]] | @csv' < data.json
1519333200,271.62,271.62,272.921,1028.3,1037.2,1028.3,88,-1.3
1519344000,271.22,271.22,272.193,1028.11,1037.04,1028.11,100,-0.98

jq接收 JSON 数据流并对其进行过滤以提取所需的数据,类似于sed结构化数据而不是文本。该命令需要“列表”数组的每个元素(.list[] )取出“dt”字段(.dt )里面的物体以及“主”对象中的所有内容(.main[] ),将整个对象放入一个数组中,然后将其转换为 CSV 行(@csv )


如果您对特定字段感兴趣,或者想要对它们重新排序,您也可以将它们列出来:

$ jq -r '.list[]|[.dt, .main.temp_min, .main.humidity] | @csv' < data.json
1519333200,271.62,88
1519344000,271.22,100

只需列出您关心的每一项即可,忽略其余的。


如果您想了解数据的其他部分,您也可以将其放在那里:.clouds将查看该clouds字段,或者您可以使用以下命令读取风速.wind.speed字段,或者您可以使用- 始终在名称之前jq还是比较全面的对于您需要的任何自定义行为。

答案2

只是对迈克尔·霍默(Michael Homer)出色回答的快速补充:

如果 jq 表达式变大,我们可以用它编写一个脚本:

例子:

$ cat mkweatherreport
#!/usr/bin/jq -rf
.list[] |
  [
     .dt,
     .main.temp_min,        # in Kelvin degrees
     .main.humidity
  ] | @csv

进而:

$ mkweatherreport f.json
1519333200,271.62,88
1519344000,271.22,100

答案3

如果我们首先提取数组的元素list,然后将它们转换为 CSV,这可能是最简单的。我们可以通过jq(提取单个list元素)和mlr(以方便的方式将这些元素转换为 CSV)的组合来做到这一点:

$ jq '.list[]' file | mlr --j2c cat
dt,main.temp,main.temp_min,main.temp_max,main.pressure,main.sea_level,main.grnd_level,main.humidity,main.temp_kf,weather.1.id,weather.1.main,weather.1.description,weather.1.icon,clouds.all,wind.speed,wind.deg,rain,snow,sys.pod,dt_txt
1519333200,271.62,271.62,272.921,1028.3,1037.2,1028.3,88,-1.3,800,Clear,clear sky,01n,0,4.86,78.004,{},{},n,2018-02-22 21:00:00
1519344000,271.22,271.22,272.193,1028.11,1037.04,1028.11,100,-0.98,800,Clear,clear sky,01n,0,4.52,80.0016,{},{},n,2018-02-23 00:00:00

考虑到问题中的数据(更正以关闭数组),该调用jq首先会产生相当于

{"dt":1519333200,"main":{"temp":271.62,"temp_min":271.62,"temp_max":272.921,"pressure":1028.3,"sea_level":1037.2,"grnd_level":1028.3,"humidity":88,"temp_kf":-1.3},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":4.86,"deg":78.004},"rain":{},"snow":{},"sys":{"pod":"n"},"dt_txt":"2018-02-22 21:00:00"}
{"dt":1519344000,"main":{"temp":271.22,"temp_min":271.22,"temp_max":272.193,"pressure":1028.11,"sea_level":1037.04,"grnd_level":1028.11,"humidity":100,"temp_kf":-0.98},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":4.52,"deg":80.0016},"rain":{},"snow":{},"sys":{"pod":"n"},"dt_txt":"2018-02-23 00:00:00"}

然后,对 Miller 的调用读取这些单独的对象,将它们展平,并将它们转换为 CSV (--j2c意味着“读取 JSON,写入 CSV”,缩写(并且最近添加)--ijson --ocsv)。

您可以修改表达式以删除或修改每个不需要的元素jq中的字段。list例如,要仅返回weather子数组中的第一个元素,您可以使用.list[] | .weather |= first.

相关内容