I have 5000 questions in a txt file, like this:
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
I want to write a script on Ubuntu that converts all of the questions into this form:
{
"text":"The question her",
"answer1":"text",
"answer2":"text",
"answer3":"text",
"answer4":"text"
},
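Answer 1
In fact, you can get by here even without any Python programming, using just 2 Unix utilities: jtm, which converts XML to JSON, and jtc, which manipulates JSON.
So, assuming your XML is in file.xml, jtm will convert it to the following JSON: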
bash $ jtm file.xml
[
{
"quiz": [
{
"que": "The question her"
},
{
"ca": "text"
},
{
"ia": "text"
},
{
"ia": "text"
},
{
"ia": "text"
}
]
}
]
bash $
Then, by applying a series of JSON transformations, you can get the desired result:
bash $ jtm file.xml | jtc -w'<quiz>l:[1:][-2]' -ei echo { '"answer[-]"': {} }\; -i'<quiz>l:[1:]' | jtc -w'<quiz>l:[-1][:][0]' -w'<quiz>l:[-1][:]' -s | jtc -w'<quiz>l:' -w'<quiz>l:[0]' -s | jtc -w'<quiz>l: <>v' -u'"text"'
[
{
"answer1": "text",
"answer2": "text",
"answer3": "text",
"answer4": "text",
"text": "The question her"
}
]
bash $
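However, because shell scripting is involved (the echo command), this will be slower than Python: for 5000 questions, I would expect it to run for about a minute. (In a future version of jtc I plan to allow interpolation even in statically specified JSON, so that templating will not need any external shell scripting; the operations will then be very fast.)
If you are curious about the jtc syntax, the user guide is here: https://github.com/ldn-softdev/jtc/blob/master/User%20Guide.md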
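For comparison, here is a minimal Python sketch of the reshaping that the jtc pipeline above performs. It assumes the JSON has exactly the shape jtm printed for the single-quiz sample (a list of one-key objects under "quiz"); the function name flatten_quiz is made up for this illustration, and the shape jtm produces for a file with many quizzes may differ.

```python
import json

# JSON in the shape jtm printed above: each child element is a one-key object.
jtm_output = json.loads("""
[{"quiz": [{"que": "The question her"}, {"ca": "text"},
           {"ia": "text"}, {"ia": "text"}, {"ia": "text"}]}]
""")


def flatten_quiz(children):
    """Turn [{'que': q}, {'ca': a}, {'ia': b}, ...] into one flat dict."""
    flat = {}
    answer_no = 0
    for child in children:
        # Each child object holds exactly one tag/value pair.
        (tag, value), = child.items()
        if tag == "que":
            flat["text"] = value
        else:
            # <ca> and <ia> are numbered in document order.
            answer_no += 1
            flat["answer%d" % answer_no] = value
    return flat


result = [flatten_quiz(entry["quiz"]) for entry in jtm_output]
print(json.dumps(result, indent=1))
```

Unlike the jtc output above, this keeps "text" as the first key, because Python dictionaries preserve insertion order.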
Answer 2
The xq tool from https://kislyuk.github.io/yq/ turns your XML into
{
"quiz": {
"que": "The question her",
"ca": "text",
"ia": [
"text",
"text",
"text"
]
}
}
just by using the identity filter (xq . file.xml).
We can massage this into something closer to the form you want with
xq '.quiz | { text: .que, answers: .ia }' file.xml
which outputs
{
"text": "The question her",
"answers": [
"text",
"text",
"text"
]
}
To fix up the answers bit so that you get enumerated keys:
xq '.quiz |
{ text: .que } +
(
[
range(.ia|length) as $i | { key: "answer\($i+1)", value: .ia[$i] }
] | from_entries
)' file.xml
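This adds the enumerated answer keys, with the values from the ia nodes, by iterating over the ia nodes and manually producing a set of keys and values. These are then turned into real key-value pairs using from_entries and added to the original object that we created ({ text: .que }).
Output: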
{
"text": "The question her",
"answer1": "text",
"answer2": "text",
"answer3": "text"
}
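To make the range / from_entries step easier to follow, here is a rough Python sketch of the same three stages. It is only an illustration of the logic, not a replacement for the xq command; the dictionary is copied from the xq output shown earlier, and the variable names are made up.

```python
import json

# The object xq produced for a single quiz (copied from the output above).
quiz = {"que": "The question her", "ca": "text", "ia": ["text", "text", "text"]}

# Equivalent of: [ range(.ia|length) as $i | { key: ..., value: .ia[$i] } ]
entries = [
    {"key": "answer%d" % (i + 1), "value": quiz["ia"][i]}
    for i in range(len(quiz["ia"]))
]

# Equivalent of from_entries: a list of key/value records becomes one object.
numbered = {entry["key"]: entry["value"] for entry in entries}

# Equivalent of { text: .que } + ( ... ): merge the two objects.
converted = {"text": quiz["que"], **numbered}
print(json.dumps(converted))
```

As in the jq expression, only the ia values are numbered here; the ca value is not part of the result.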
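If your XML document contains multiple quiz nodes under some root node, then change .quiz in the jq expression above to .[].quiz[] to transform each of them; you may also want to put the resulting objects into an array: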
xq '.[].quiz[] |
[ { text: .que } +
(
[
range(.ia|length) as $i | { key: "answer\($i+1)", value: .ia[$i] }
] | from_entries
) ]' file.xml
Answer 3
I assume your Ubuntu already has Python installed.
#!/usr/bin/python3
import io
import json
import xml.etree.ElementTree

# Sample input, embedded as a string for demonstration
d = """<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
"""
s = io.StringIO(d)
# To read from a file instead:
# root = xml.etree.ElementTree.parse("filename_here").getroot()
root = xml.etree.ElementTree.parse(s).getroot()
out = {}
i = 1  # running answer number
for child in root:
    name, value = child.tag, child.text
    if name == 'que':
        name = 'question'
    else:
        # <ca> and <ia> both become answerN, in document order
        name = 'answer%s' % i
        i += 1
    out[name] = value
print(json.dumps(out))
Save it and chmod it to make it executable. You can easily modify it to take a file as input rather than just text.
EDIT: OK, here is a more complete script:
#!/usr/bin/python3
import json
import sys
import xml.etree.ElementTree


def read_file(filename):
    root = xml.etree.ElementTree.parse(filename).getroot()
    return root


# assume we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
    i = 1  # running answer number within this quiz
    tmp = {}
    for child in quiz_element:
        name, value = child.tag, child.text
        if name == 'que':
            name = 'question'
        else:
            name = 'answer%s' % i
            i += 1
        tmp[name] = value
    out.append(tmp)


def parse_root(root_element, out):
    for child in root_element:
        if child.tag == 'quiz':
            parse_quiz(child, out)


def convert_xml_to_json(filename):
    root = read_file(filename)
    out = []
    parse_root(root, out)
    print(json.dumps(out))


if __name__ == '__main__':
    if len(sys.argv) > 1:
        convert_xml_to_json(sys.argv[1])
    else:
        print("Usage: script <filename_with_xml>")
I created a file with the following content, which I named xmltest:
<questions>
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
<quiz>
<que>Question number 1</que>
<ca>blabla</ca>
<ia>stuff</ia>
</quiz>
</questions>
So you have a list of quiz elements inside some other container.
Now, I launch it like this:
$ chmod u+x scratch.py
and then scratch.py filenamewithxml
This gives me the answer:
$ ./scratch4.py xmltest
[{"answer3": "text", "answer2": "text", "question": "The question her", "answer4": "text", "answer1": "text"}, {"answer2": "stuff", "question": "Question number 1", "answer1": "blabla"}]
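The script above expects the quiz elements to sit inside a container element. If the file is just a sequence of <quiz> blocks with no common root, as the question's sample suggests, the XML parser will reject it. One possible workaround, sketched below, is to wrap the text in a made-up root before parsing. The wrapper name questions and the function name quizzes_to_dicts are arbitrary choices for this illustration, and the sketch assumes the file has no XML declaration and that special characters are properly escaped. It also uses the key "text", as the question asked, instead of "question".

```python
import json
import xml.etree.ElementTree as ET


def quizzes_to_dicts(xml_fragments):
    """Parse text holding several <quiz> blocks that have no common root."""
    # Wrap the fragments in a made-up root so the parser sees one document.
    root = ET.fromstring("<questions>" + xml_fragments + "</questions>")
    out = []
    for quiz in root.iter("quiz"):
        item = {}
        answer_no = 0
        for child in quiz:
            if child.tag == "que":
                item["text"] = child.text
            else:
                # <ca> and <ia> are numbered in document order.
                answer_no += 1
                item["answer%d" % answer_no] = child.text
        out.append(item)
    return out


sample = """
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
</quiz>
<quiz>
<que>Question number 1</que>
<ca>blabla</ca>
<ia>stuff</ia>
</quiz>
"""
converted = quizzes_to_dicts(sample)
print(json.dumps(converted, indent=1))
```

For a real file, the same function could be given the result of reading the file as text, for example open(filename, encoding="utf-8").read().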
Answer 4
Thanks dgan, but your code prints the output to the screen instead of writing it to a JSON file, and it does not support encoding = utf-8, so I changed it:
#!/usr/bin/python3
import json
import sys
import xml.etree.ElementTree


def read_file(filename):
    root = xml.etree.ElementTree.parse(filename).getroot()
    return root


# assume we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
    i = 1  # running answer number within this quiz
    tmp = {}
    for child in quiz_element:
        name, value = child.tag, child.text
        if name == 'que':
            name = 'question'
        else:
            name = 'answer%s' % i
            i += 1
        tmp[name] = value
    out.append(tmp)


def parse_root(root_element, out):
    for child in root_element:
        if child.tag == 'quiz':
            parse_quiz(child, out)


def convert_xml_to_json(filename):
    root = read_file(filename)
    out = []
    parse_root(root, out)
    # Open in text mode with an explicit UTF-8 encoding. Under Python 3,
    # wrapping a text-mode file in codecs.getwriter('utf-8') raises TypeError.
    with open('data.json', 'w', encoding='utf-8') as outfile:
        json.dump(out, outfile, sort_keys=True, ensure_ascii=False)


if __name__ == '__main__':
    if len(sys.argv) > 1:
        convert_xml_to_json(sys.argv[1])
    else:
        print("Usage: script <filename_with_xml>")
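The ensure_ascii=False argument in the json.dump call above is what keeps non-ASCII characters readable in the output file. A small sketch of the difference, using a made-up sample record:

```python
import json

# Made-up record containing non-ASCII characters, for illustration only.
sample = [{"question": "Où est le café ?", "answer1": "日本語"}]

escaped = json.dumps(sample)                       # default: ensure_ascii=True
readable = json.dumps(sample, ensure_ascii=False)  # characters kept as-is

print(escaped)
print(readable)

# Both forms decode back to the same data; only the text on disk differs.
assert json.loads(escaped) == json.loads(readable) == sample
```

With the default setting, the non-ASCII characters are written as \uXXXX escapes, which is still valid JSON but harder to read in an editor.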