

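I'm looking for an efficient way to use (UNIX) command-line tools to read a flat file into a database, possibly after first converting it to an intermediate structured format such as XML or CSV. The flat file holds multiple records, with one key and value per line. The number of variables can differ from record to record, and the number and names of the variables are only known once the input file has been read. (To complicate matters, some repeated variables may be nested, but that can be ignored for now.)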



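I've checked the answers on this site about converting rows to columns, along with others, but none of them seem to fit. The problem looks similar to reading INI or vCard files, but I couldn't find a general solution; an XSL transformation might work, but I haven't found one yet. Any pointers are very welcome, thanks.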


I'm not sure exactly what you want as the end result, but here's a Python script that converts your data to XML:

#!/usr/bin/env python2
# -*- coding: ascii -*-

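"""
Parses a data file containing textual records, where each record starts with
a line containing "RecordUUID" and is followed by lines of the form
"key"="value", and converts it to an XML document with one element per
record and one child element per key.
"""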

import sys
import re
from xml.etree.ElementTree import Element, SubElement, tostring
from xml.dom.minidom import parseString

# Create a root element for the XML document
root = Element('root')

# Set a variable to keep track of the current record
current_record = None

# Parse the data and construct an XML representation
with open(sys.argv[1]) as datafile:

    # Extract the non-empty lines from the data file
    lines = [line.strip() for line in datafile if line.strip()]

    # Iterate over the lines
    for line in lines:

        # Check to see if we've reached a new record
        if "RecordUUID" in line:

            # Extract the record ID
            eid = line.strip()[1:-1]

            # Add a new child element to the document
            # and update the current record
            current_record = SubElement(root, eid)

        # Otherwise, check to see if we've reached a new key-value pair
        else:
            match = re.match(r'^"(\w+)"="(\w+)"$', line.strip())

            # If we have a key-value pair (and a record to attach it to)
            # then update the current record
            if match and current_record is not None:
                key, value = match.groups()
                SubElement(current_record, key).text = value

# Display the generated XML document
print(parseString(tostring(root)).toprettyxml())





user@host:~$ python transform.py data.txt


<?xml version="1.0" ?>
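
Since the question also mentions CSV as a possible intermediate format, here is a minimal sketch of the same parsing loop writing CSV instead. It assumes the same input conventions as the script above (a line containing "RecordUUID" starts a record, and pairs look like "key"="value"); the function name `records_to_csv`, the `[RecordUUID1]` header style, and the sample data are all made up for illustration. Unlike the script above, it is written for Python 3. Because the key names are only known after reading the file, it collects every key it sees and leaves missing values empty.

```python
import csv
import io
import re

# Assumed line format, taken from the regular expression in the script above.
PAIR = re.compile(r'^"(\w+)"="(\w+)"$')


def records_to_csv(lines):
    """Return CSV text with one row per record and one column per key seen."""
    records = []      # one dict per record
    fieldnames = []   # every key seen so far, in first-seen order

    for raw in lines:
        line = raw.strip()
        if not line:
            continue

        # Assumption: any line containing "RecordUUID" starts a new record.
        if "RecordUUID" in line:
            records.append({})
            continue

        match = PAIR.match(line)
        if match and records:
            key, value = match.groups()
            records[-1][key] = value
            if key not in fieldnames:
                fieldnames.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Hypothetical sample data, not the asker's real file.
sample = [
    '[RecordUUID1]',
    '"name"="alice"',
    '"age"="30"',
    '',
    '[RecordUUID2]',
    '"name"="bob"',
    '"city"="Oslo"',
]

print(records_to_csv(sample))
```

The resulting CSV can then be loaded with whatever bulk-import tool the target database provides. Note that this sketch holds all records in memory before writing, which is what allows the header row to list every key.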
