Converting a specially formatted .txt file to XML

Answer 1

This can serve as the basis for such a converter in Python:

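#!/usr/bin/env python3

import fileinput
import re

# An entry header line looks like:  %%% <entry name>
entryre = re.compile(r"^%%% <([^>]+)>")
# A cross-reference looks like:     see also > <other entry>
seealsore = re.compile(r"see also > <([^>]+)>")

def pnode(nodename, nodeblock):
    """Print one entry as an element named after the entry."""
    print("<" + nodename + ">")
    print(nodeblock)
    print("</" + nodename + ">")


block = ""
entry = ""
for line in fileinput.input():
    match = entryre.match(line)
    if match:
        # A new header closes the previous entry, if there was one.
        if entry:
            pnode(entry, block)
            block = ""
        entry = match.group(1)
    else:
        # Turn cross-references into <seealso> elements; keep other text as-is.
        block += seealsore.sub(r"<seealso>\1</seealso>", line)

# Flush the last entry (skipped when the input contained no entries).
if entry:
    pnode(entry, block)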

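It reads any files named on the command line (or stdin, which I'd recommend) and writes to stdout. Just wrap the output between an XML header and footer, i.e. the XML declaration plus an opening and closing root element. Take care if entries contain spaces: the entry text is used as the element name, and XML names cannot contain spaces. Also, if a block contains tag-like substrings, additional conversion (escaping) is needed.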
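One way those caveats could be handled is sketched below. It is not part of the script above: the names `to_xml`, `convert_line`, `format_entry`, the root element `entries`, and the choice to store the entry name in a `name` attribute (instead of using it as the element name) are all assumptions made for illustration. Text is escaped with `xml.sax.saxutils.escape`, and the attribute value is quoted with `quoteattr`.

```python
import re
from xml.sax.saxutils import escape, quoteattr

ENTRY_RE = re.compile(r"^%%% <([^>]+)>")
SEEALSO_RE = re.compile(r"see also > <([^>]+)>")

def convert_line(line):
    """Escape plain text, but turn cross-references into <seealso> elements."""
    parts = []
    pos = 0
    for m in SEEALSO_RE.finditer(line):
        parts.append(escape(line[pos:m.start()]))
        parts.append("<seealso>" + escape(m.group(1)) + "</seealso>")
        pos = m.end()
    parts.append(escape(line[pos:]))
    return "".join(parts)

def format_entry(name, block):
    # Assumption: the entry name goes into an attribute, so spaces are harmless.
    return "<entry name=%s>\n%s</entry>" % (quoteattr(name), "".join(block))

def to_xml(lines, root="entries"):
    """Build a complete document: declaration, root element, one <entry> per header."""
    out = ['<?xml version="1.0" encoding="UTF-8"?>', "<" + root + ">"]
    name, block = None, []
    for line in lines:
        m = ENTRY_RE.match(line)
        if m:
            if name is not None:
                out.append(format_entry(name, block))
            name, block = m.group(1), []
        elif name is not None:
            # Text before the first header is dropped in this sketch.
            block.append(convert_line(line))
    if name is not None:
        out.append(format_entry(name, block))
    out.append("</" + root + ">")
    return "\n".join(out) + "\n"
```

It could be driven the same way as the original script, for example with `sys.stdout.write(to_xml(fileinput.input()))`.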

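However, if you only want to browse the entries, I'd suggest plain HTML instead. A definition list or a table would serve you well, and the conversion code needs only minor changes.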
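As a rough illustration of the definition-list variant, here is a minimal sketch. The function name `to_html` is hypothetical, and it simply escapes every body line with `html.escape`, so cross-references are shown as literal text instead of being linked.

```python
import re
from html import escape

ENTRY_RE = re.compile(r"^%%% <([^>]+)>")

def to_html(lines):
    """Render entries as a definition list: one <dt> per header, one <dd> per block."""
    out = ["<dl>"]
    in_entry = False
    for line in lines:
        m = ENTRY_RE.match(line)
        if m:
            if in_entry:
                out.append("</dd>")
            out.append("<dt>" + escape(m.group(1)) + "</dt>")
            out.append("<dd>")
            in_entry = True
        elif in_entry:
            # Text before the first header is dropped in this sketch.
            out.append(escape(line.rstrip("\n")))
    if in_entry:
        out.append("</dd>")
    out.append("</dl>")
    return "\n".join(out) + "\n"
```

The result is only the `<dl>` fragment; it would still need to be placed inside an HTML page body.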
