How can I extract two data fields (1 scalar and 1 array) from each node of a very large (> 100,000 lines) JSON file?

I have a 139,000-line JSON file whose structure basically looks like this (it was extracted from OpenStreetMap):

{
  "type": "FeatureCollection",
  "generator": "overpass-ide",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "@id": "relation/7859",
        "TMC:cid_58:tabcd_1:Class": "Area",
        "TMC:cid_58:tabcd_1:LCLversion": "9.00",
        "TMC:cid_58:tabcd_1:LocationCode": "4934",
        "leisure": "park",
        "name": "Platnersberg",
        "type": "multipolygon",
        "@geometry": "center"
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          11.128184,
          49.4706035
        ]
      },
      "id": "relation/7859"
    },
    {
      "type": "Feature",
      "properties": {
        "@id": "relation/62370",
        "TMC:cid_58:tabcd_1:Class": "Area",
        "TMC:cid_58:tabcd_1:LCLversion": "8.00",
        "TMC:cid_58:tabcd_1:LocationCode": "1157",
        "admin_level": "6",
        "boundary": "administrative",
        "de:place": "city",
        "name": "Eisenach",
        "type": "boundary",
        "@geometry": "center"
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          10.2836229,
          50.9916015
        ]
      },
      "id": "relation/62370"
    }
  ]
}

I would like to get the name, TMC location code, and coordinates of every feature in this file, ideally as a CSV file:

location_code,name,latitude,longitude

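I know I could craft a regular expression that strips out all the superfluous nodes, but that would be a fairly complicated process. I also have the jq tool installed on my OpenSuSE Leap 15.1 machine, but I am still a newbie with it.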

Any ideas on how to get this extraction done?

Answer 1

I'm a newbie myself, but I think

$ jq -r '.features[] | select(.type == "Feature") | [.properties."TMC:cid_58:tabcd_1:LocationCode",.properties.name,.geometry.coordinates[]] | @csv' file.json
"4934","Platnersberg",11.128184,49.4706035
"1157","Eisenach",10.2836229,50.9916015

should do it. The select(.type == "Feature") filter may not be necessary; I'm not sure whether any other types can occur.
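
One caveat on the command above: GeoJSON stores coordinates as [longitude, latitude], so .geometry.coordinates[] emits the longitude first, whereas the requested header is location_code,name,latitude,longitude. A minimal sketch that swaps the two values and prepends a header row might look like this. The function name extract_csv is hypothetical, and it assumes every feature has a Point geometry as in the sample.

```shell
# Hypothetical helper: print a CSV with a header row for the GeoJSON file in $1.
extract_csv() {
  jq -r '
    # Header row first, then one row per feature.
    ["location_code", "name", "latitude", "longitude"],
    (.features[]
     | [ .properties."TMC:cid_58:tabcd_1:LocationCode",
         .properties.name,
         .geometry.coordinates[1],   # latitude (second element in GeoJSON)
         .geometry.coordinates[0] ]) # longitude (first element in GeoJSON)
    | @csv
  ' "$1"
}
```

It could be called as extract_csv file.json > locations.csv. Because @csv quotes every string, the header fields come out quoted as well; for the first sample feature the data row would be "4934","Platnersberg",49.4706035,11.128184.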
