Script to convert XML to JSON

Question 1

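In fact, you can get away without any Python programming here, using just 2 Unix utilities:

jtm - allows lossless XML <-> JSON conversion
jtc - allows manipulating JSON

So, assuming your XML is in file.xml, jtm would convert it to the following JSON: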

bash $ jtm file.xml 
[
   {
      "quiz": [
         {
            "que": "The question her"
         },
         {
            "ca": "text"
         },
         {
            "ia": "text"
         },
         {
            "ia": "text"
         },
         {
            "ia": "text"
         }
      ]
   }
]
bash $

Then, by applying a series of JSON transformations, you can arrive at the desired result:

bash $ jtm file.xml | jtc -w'<quiz>l:[1:][-2]' -ei echo { '"answer[-]"': {} }\; -i'<quiz>l:[1:]' | jtc -w'<quiz>l:[-1][:][0]' -w'<quiz>l:[-1][:]' -s | jtc -w'<quiz>l:' -w'<quiz>l:[0]' -s | jtc -w'<quiz>l: <>v' -u'"text"'
[
   {
      "answer1": "text",
      "answer2": "text",
      "answer3": "text",
      "answer4": "text",
      "text": "The question her"
   }
]
bash $
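
For readers who do not know the jtc walk syntax, the end result of the pipeline above can be sketched in Python. This is only an illustration of the data transformation, not of how jtc works internally; the function name flatten_quiz is made up for this example, and the input is the jtm output shown earlier.

```python
import json

# JSON produced by `jtm file.xml`, copied from the output above.
jtm_output = [
    {"quiz": [
        {"que": "The question her"},
        {"ca": "text"},
        {"ia": "text"},
        {"ia": "text"},
        {"ia": "text"},
    ]}
]


def flatten_quiz(children):
    """Merge a list of single-key objects into one object (hypothetical helper).

    <que> becomes "text"; every other child becomes "answerN" in document order.
    """
    result = {}
    answer_no = 0
    for child in children:
        for tag, value in child.items():
            if tag == "que":
                result["text"] = value
            else:
                answer_no += 1
                result[f"answer{answer_no}"] = value
    return result


flattened = [flatten_quiz(node["quiz"]) for node in jtm_output]
print(json.dumps(flattened, indent=3))
```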

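However, because shell scripting is involved (the echo command), it will be slower than Python: for 5000 questions, I'd expect it to run for about a minute. (In a future version of jtc I plan to allow interpolation even in statically specified JSON, so that templating will not require any external shell scripting; the operation will then be very fast.)

If you're curious about the jtc syntax, you can find the user guide here: https://github.com/ldn-softdev/jtc/blob/master/User%20Guide.md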


Question 2

The xq tool from https://kislyuk.github.io/yq/ turns your XML into

{
  "quiz": {
    "que": "The question her",
    "ca": "text",
    "ia": [
      "text",
      "text",
      "text"
    ]
  }
}

simply by using the identity filter (xq . file.xml).

We can massage this into something closer to the form you want using

xq '.quiz | { text: .que, answers: .ia }' file.xml

which outputs

{
  "text": "The question her",
  "answers": [
    "text",
    "text",
    "text"
  ]
}

To fix up the answers bit so that you get enumerated keys:

xq '.quiz |
    { text: .que } +
    (
        [
            range(.ia|length) as $i | { key: "answer\($i+1)", value: .ia[$i] }
        ] | from_entries
    )' file.xml

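This iterates over the ia nodes and manually produces a set of keys and values: enumerated answer keys, each paired with the value of the corresponding ia node. These are then turned into real key-value pairs using from_entries and added to the proto-object we created ({ text: .que }).

Output: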

{
  "text": "The question her",
  "answer1": "text",
  "answer2": "text",
  "answer3": "text"
}
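
The same enumerate-then-merge idea can be sketched in Python, which may help if the jq expression is hard to read. The input dictionary is the xq output shown earlier; the variable names are chosen for this example only.

```python
import json

# Shape of `xq . file.xml` for the sample document, copied from above.
doc = {
    "quiz": {
        "que": "The question her",
        "ca": "text",
        "ia": ["text", "text", "text"],
    }
}

quiz = doc["quiz"]

# Counterpart of:
#   [ range(.ia|length) as $i | { key: "answer\($i+1)", value: .ia[$i] } ] | from_entries
answers = {f"answer{i}": value for i, value in enumerate(quiz["ia"], start=1)}

# Counterpart of: { text: .que } + ( ... )
result = {"text": quiz["que"], **answers}

print(json.dumps(result, indent=2))
```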

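If your XML document contains multiple quiz nodes under some root node, then change .quiz in the jq expression above to .[].quiz[] to transform each of them, and you may want to put the resulting objects into an array: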

xq '.[].quiz[] |
    [ { text: .que } +
    (
        [
            range(.ia|length) as $i | { key: "answer\($i+1)", value: .ia[$i] }
        ] | from_entries
    ) ]' file.xml
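
As a rough Python sketch of the multi-quiz case: the sample data below is hypothetical, and it assumes xq follows the usual xmltodict convention in which an element that occurs once becomes a plain string while a repeated element becomes a list. Under that assumption a quiz with a single ia needs normalizing before it is enumerated, which the made-up helper as_list does here.

```python
import json

# Hypothetical xq-style output for a root element holding several <quiz> nodes.
# Assumption: a lone <ia> arrives as a string, repeated <ia> as a list.
doc = {
    "questions": {
        "quiz": [
            {"que": "The question her", "ca": "text", "ia": ["text", "text", "text"]},
            {"que": "Question number 1", "ca": "blabla", "ia": "stuff"},
        ]
    }
}


def as_list(value):
    """Wrap a lone value in a list so single and repeated elements look alike."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def convert(quiz):
    """Build {"text": ..., "answer1": ..., ...} for one quiz object."""
    answers = {
        f"answer{i}": value
        for i, value in enumerate(as_list(quiz.get("ia")), start=1)
    }
    return {"text": quiz["que"], **answers}


# Counterpart of `.[].quiz[]`: visit every quiz under whatever the root is called,
# collecting all converted objects into one list.
results = [convert(q) for root in doc.values() for q in as_list(root["quiz"])]
print(json.dumps(results, indent=2))
```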


Question 3

I assume your Ubuntu already has Python installed.

#!/usr/bin/python3
import io
import json
import xml.etree.ElementTree

d = """<quiz>
        <que>The question her</que>
        <ca>text</ca>
        <ia>text</ia>
        <ia>text</ia>
        <ia>text</ia>
    </quiz>
"""

s = io.StringIO(d)
# root = xml.etree.ElementTree.parse("filename_here").getroot()
root = xml.etree.ElementTree.parse(s).getroot()
out = {}
i = 1
for child in root:
    name, value = child.tag, child.text
    if name == 'que':
        name = 'question'
    else:
        name = 'answer%s' % i
        i += 1
    out[name] = value

print(json.dumps(out))

Save it and chmod it to make it executable. You can easily modify it to take a file as input instead of just text.

EDIT: OK, here's a more complete script:

#!/usr/bin/python3
import json
import sys
import xml.etree.ElementTree


def read_file(filename):
    root = xml.etree.ElementTree.parse(filename).getroot()
    return root


# assume we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
    i = 1
    tmp = {}
    for child in quiz_element:

        name, value = child.tag, child.text
        if name == 'que':
            name = 'question'
        else:
            name = 'answer%s' % i
            i += 1
        tmp[name] = value
    out.append(tmp)


def parse_root(root_element, out):
    for child in root_element:
        if child.tag == 'quiz':
            parse_quiz(child, out)


def convert_xml_to_json(filename):
    root = read_file(filename)
    out = []
    parse_root(root, out)
    print(json.dumps(out))


if __name__ == '__main__':
    if len(sys.argv) > 1:
        convert_xml_to_json(sys.argv[1])
    else:
        print("Usage: script <filename_with_xml>")

I created a file with the following content, which I named xmltest:

<questions>
    <quiz>
        <que>The question her</que>
        <ca>text</ca>
        <ia>text</ia>
        <ia>text</ia>
        <ia>text</ia>
    </quiz>
     <quiz>
            <que>Question number 1</que>
            <ca>blabla</ca>
            <ia>stuff</ia>
    </quiz>
</questions>

So you have a list of quiz elements inside some other container.

Now, I launch it like this: $ chmod u+x scratch.py, then ./scratch.py filenamewithxml

This gives me the answer:

$ ./scratch4.py xmltest
[{"answer3": "text", "answer2": "text", "question": "The question her", "answer4": "text", "answer1": "text"}, {"answer2": "stuff", "question": "Question number 1", "answer1": "blabla"}]
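
The scrambled key order above presumably comes from an older Python in which dicts were unordered; from Python 3.7 on, dicts keep insertion order, so the keys would appear in document order. If you want stable, readable output regardless of version, a minimal sketch is to pass sort_keys and indent to json.dumps. The data below is copied from the output above.

```python
import json

# Result copied from the run above, in the key order it was printed.
out = [
    {"answer3": "text", "answer2": "text", "question": "The question her",
     "answer4": "text", "answer1": "text"},
    {"answer2": "stuff", "question": "Question number 1", "answer1": "blabla"},
]

# sort_keys orders keys as plain strings, so "answer10" would sort before
# "answer2"; that is fine for fewer than ten answers per question.
pretty = json.dumps(out, sort_keys=True, indent=2)
print(pretty)
```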
