Converting/importing a flat file with multiple records, one variable per line

Answer 1

I'm not sure exactly what you want as the end result, but here is a Python script that converts your data to XML:

#!/usr/bin/env python2
# -*- coding: ascii -*-
"""transform.py

Parses a data file containing textual records in the following format:

    [RecordUUID.n]
    "Variable1Key"="Variable1Value"
    "Variable2Key"="Variable2Value"
    "Variable3Key"="Variable3Value"

and converts it to an XML document with record-elements of the following form:

    <RecordUUID.1>
        <Variable1Key>Variable1Value</Variable1Key>
        <Variable2Key>Variable2Value</Variable2Key>
        <Variable3Key>Variable3Value</Variable3Key>
    </RecordUUID.1>
"""

import sys
import re
from xml.etree.ElementTree import Element, SubElement, tostring
from xml.dom.minidom import parseString

# Create a root element for the XML document
root = Element('root')

# Set a variable to keep track of the current record
current_record = None

# Parse the data and construct an XML representation
with open(sys.argv[1]) as datafile:

    # Extract the non-empty lines from the data file
    lines = [line.strip() for line in datafile if line.strip()]

    # Iterate over the lines
    for line in lines:

        # Check to see if we've reached a new record
        if "RecordUUID" in line:

            # Extract the record ID
            eid = line.strip()[1:-1]

            # Add a new child element to the document
            # and update the current record
            current_record = SubElement(root, eid)

        # Check to see if we've reached a new key-value pair
        else:
            match = re.match(r'^"(\w+)"="(\w+)"$', line.strip())

            # If we have a key-value pair then update the current record
            if match and current_record is not None:
                key, value = match.groups()
                SubElement(current_record, key).text = value

# Display the generated XML document
print(parseString(tostring(root)).toprettyxml(indent="\t"))
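
One thing to be aware of in the script above: the pattern only accepts word characters (`\w+`) for both key and value, so a line whose value contains spaces or punctuation fails to match and is silently skipped. The sketch below isolates that matching step. `STRICT` is the pattern from the script; `LENIENT` and the helper `parse_pair` are hypothetical names I made up for illustration, assuming your values never contain a double quote.

```python
import re

# Pattern copied from the script: key and value must be word characters only.
STRICT = re.compile(r'^"(\w+)"="(\w+)"$')

# Hypothetical, more permissive variant (an assumption, not part of the
# original script): the value may be any run of characters except '"'.
LENIENT = re.compile(r'^"(\w+)"="([^"]*)"$')


def parse_pair(line, pattern=STRICT):
    """Return (key, value) if the line is a key-value pair, else None."""
    match = pattern.match(line.strip())
    return match.groups() if match else None
```

If your real data has values such as `"two words"`, passing `LENIENT` keeps them, whereas `STRICT` returns `None` for that line.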

If we put the following data (the sample data from your question) into a file named data.txt:

[RecordUUID.1]
"Variable1Key"="Variable1Value"
"Variable2Key"="Variable2Value"
"Variable3Key"="Variable3Value"

[RecordUUID.4]
"Variable1Key"="Variable1Value"
"Variable5Key1"="Variable51Value1"
"Variable5Key1"="Variable51Value2"
"Variable5Key2"="Variable52Value1"
"Variable5Key2"="Variable52Value2"

and then run the script:

user@host:~$ python transform.py data.txt

we get the following output:

<?xml version="1.0" ?>
<root>
    <RecordUUID.1>
        <Variable1Key>Variable1Value</Variable1Key>
        <Variable2Key>Variable2Value</Variable2Key>
        <Variable3Key>Variable3Value</Variable3Key>
    </RecordUUID.1>
    <RecordUUID.4>
        <Variable1Key>Variable1Value</Variable1Key>
        <Variable5Key1>Variable51Value1</Variable5Key1>
        <Variable5Key1>Variable51Value2</Variable5Key1>
        <Variable5Key2>Variable52Value1</Variable5Key2>
        <Variable5Key2>Variable52Value2</Variable5Key2>
    </RecordUUID.4>
</root>
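
If you would rather call this from other code than run it from the command line, the same logic can be wrapped in a function. This is only a sketch under a couple of assumptions: the name `build_tree` is mine, and a record header is detected as any line wrapped in square brackets rather than by searching for the text "RecordUUID". It also skips key-value lines that appear before the first record header.

```python
import re
from xml.etree.ElementTree import Element, SubElement, tostring

# Same key-value pattern as the script above.
PAIR = re.compile(r'^"(\w+)"="(\w+)"$')


def build_tree(lines):
    """Build an ElementTree root from an iterable of text lines."""
    root = Element('root')
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        # Assumption: any "[...]" line starts a new record.
        if line.startswith('[') and line.endswith(']'):
            current = SubElement(root, line[1:-1])
            continue
        match = PAIR.match(line)
        if match and current is not None:
            key, value = match.groups()
            SubElement(current, key).text = value
    return root


sample = [
    '[RecordUUID.1]',
    '"Variable1Key"="Variable1Value"',
    '',
    '[RecordUUID.4]',
    '"Variable5Key1"="Variable51Value1"',
    '"Variable5Key1"="Variable51Value2"',
]
tree = build_tree(sample)
print(tostring(tree, encoding='unicode'))
```

Repeated keys such as `Variable5Key1` simply become repeated child elements, which matches the output shown above.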
