我有几千个不同格式的 xml 文件,我想将它们转换为 CSV。我注意到,在 Excel 中,您可以将 XML 文件“作为 XML 表”打开,它会根据 XML 源数据创建一个架构。有没有办法在 Powershell 中使用此 Excel 功能,以便我可以自动转换这些文件?
到目前为止,我所有的研究都结束于需要了解解析文件的模式的地方,这对于我正在处理的文件量来说没有帮助。
以下是一个示例:
<?xml version='1.0' encoding='UTF-8'?>
<results exportTime="2016-02-11 21:44:04 CST">
<report label="S Report 5x" slug="L211">
<record reporting_date_end="03/28/2014" narrative="null">
<report label="Current Volume">
<record central_volume_loads="166.79" zone1_volume="233.26" zone3_volume="17.52" zone2_volume="48.94" zone1_volume_pounds="6,671,777" zone4_volume_pounds="9,330,260" zone2_volume_pounds="700,694" zone3_volume_pounds="1,957,789"/>
</report>
<report label="Zone1">
<record imps_percentage="92-94%" total_units="794,007" total_loads="19.85" weighted_average="274.76"/>
<record imps_percentage="90%" total_units="1,091,657" total_loads="27.29" weighted_average="264.35"/>
<record imps_percentage="85%" total_units="3,467,452" total_loads="86.69" weighted_average="228.20"/>
<record imps_percentage="81%" total_units="12,501" total_loads=".31" weighted_average="228.24"/>
<record imps_percentage="75%" total_units="195,690" total_loads="4.89" weighted_average="187.30"/>
<record imps_percentage="73%" total_units="0" total_loads=".00" weighted_average=".00"/>
<record imps_percentage="65%" total_units="1,110,470" total_loads="27.76" weighted_average="166.04"/>
</report>
<report label="Zone2">
<record imps_percentage="92-94%" total_units="891,007" total_loads="22.28" weighted_average="275.26"/>
<record imps_percentage="90%" total_units="1,813,306" total_loads="45.33" weighted_average="265.04"/>
<record imps_percentage="85%" total_units="4,378,347" total_loads="109.46" weighted_average="231.04"/>
<record imps_percentage="81%" total_units="256,289" total_loads="6.41" weighted_average="224.46"/>
<record imps_percentage="75%" total_units="234,200" total_loads="5.86" weighted_average="183.03"/>
<record imps_percentage="73%" total_units="0" total_loads=".00" weighted_average=".00"/>
<record imps_percentage="65%" total_units="1,757,111" total_loads="43.93" weighted_average="165.35"/>
</report>
</record>
<record reporting_date_end="03/21/2014" narrative="null">
<report label="Current Volume">
<record central_volume_loads="144.13" zone1_volume="215.06" zone3_volume="27.46" zone2_volume="43.47" zone1_volume_pounds="5,765,336" zone4_volume_pounds="8,602,512" zone2_volume_pounds="1,098,540" zone3_volume_pounds="1,738,636"/>
</report>
<report label="Zone1">
<record imps_percentage="92-94%" total_units="857,233" total_loads="21.43" weighted_average="268.96"/>
<record imps_percentage="90%" total_units="826,451" total_loads="20.66" weighted_average="260.82"/>
<record imps_percentage="85%" total_units="3,104,954" total_loads="77.62" weighted_average="223.99"/>
<record imps_percentage="81%" total_units="8,145" total_loads=".20" weighted_average="248.76"/>
<record imps_percentage="75%" total_units="136,304" total_loads="3.41" weighted_average="183.96"/>
<record imps_percentage="73%" total_units="0" total_loads=".00" weighted_average=".00"/>
<record imps_percentage="65%" total_units="832,249" total_loads="20.81" weighted_average="164.72"/>
</report>
<report label="Zone2">
<record imps_percentage="92-94%" total_units="995,233" total_loads="24.88" weighted_average="270.50"/>
<record imps_percentage="90%" total_units="1,696,678" total_loads="42.42" weighted_average="262.66"/>
<record imps_percentage="85%" total_units="3,929,416" total_loads="98.24" weighted_average="227.55"/>
<record imps_percentage="81%" total_units="269,549" total_loads="6.74" weighted_average="220.32"/>
<record imps_percentage="75%" total_units="181,125" total_loads="4.53" weighted_average="186.37"/>
<record imps_percentage="73%" total_units="0" total_loads=".00" weighted_average=".00"/>
<record imps_percentage="65%" total_units="1,530,511" total_loads="38.26" weighted_average="165.33"/>
</report>
</record>
<record reporting_date_end="03/14/2014" narrative="null">
<report label="Current Volume">
<record central_volume_loads="126.80" zone1_volume="186.95" zone3_volume="25.49" zone2_volume="34.67" zone1_volume_pounds="5,071,823" zone4_volume_pounds="7,478,136" zone2_volume_pounds="1,019,473" zone3_volume_pounds="1,386,840"/>
</report>
<report label="Zone1">
<record imps_percentage="92-94%" total_units="738,249" total_loads="18.46" weighted_average="263.39"/>
<record imps_percentage="90%" total_units="783,791" total_loads="19.59" weighted_average="254.45"/>
<record imps_percentage="85%" total_units="2,586,286" total_loads="64.66" weighted_average="220.72"/>
<record imps_percentage="81%" total_units="49,348" total_loads="1.23" weighted_average="191.71"/>
<record imps_percentage="75%" total_units="131,844" total_loads="3.30" weighted_average="182.33"/>
<record imps_percentage="73%" total_units="0" total_loads=".00" weighted_average=".00"/>
<record imps_percentage="65%" total_units="782,305" total_loads="19.56" weighted_average="150.54"/>
</report>
</record>
</report>
</results>
Excel 的结果如下:
exportTime label slug reporting_date_end narrative label2 central_volume_loads zone1_volume zone3_volume zone2_volume zone1_volume_pounds zone4_volume_pounds zone2_volume_pounds zone3_volume_pounds imps_percentage total_units total_loads weighted_average
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Current Volume 166.79 233.26 17.52 48.94 6,671,777 9,330,260 700,694 1,957,789
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 92-94% 794,007 19.85 274.76
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 90% 1,091,657 27.29 264.35
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 85% 3,467,452 86.69 228.20
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 81% 12,501 .31 228.24
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 75% 195,690 4.89 187.30
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 73% 0 .00 .00
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone1 65% 1,110,470 27.76 166.04
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 92-94% 891,007 22.28 275.26
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 90% 1,813,306 45.33 265.04
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 85% 4,378,347 109.46 231.04
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 81% 256,289 6.41 224.46
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 75% 234,200 5.86 183.03
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 73% 0 .00 .00
2016-02-11 21:44:04 CST S Report 5x L211 03/28/2014 null Zone2 65% 1,757,111 43.93 165.35
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Current Volume 144.13 215.06 27.46 43.47 5,765,336 8,602,512 1,098,540 1,738,636
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 92-94% 857,233 21.43 268.96
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 90% 826,451 20.66 260.82
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 85% 3,104,954 77.62 223.99
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 81% 8,145 .20 248.76
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 75% 136,304 3.41 183.96
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 73% 0 .00 .00
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone1 65% 832,249 20.81 164.72
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 92-94% 995,233 24.88 270.50
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 90% 1,696,678 42.42 262.66
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 85% 3,929,416 98.24 227.55
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 81% 269,549 6.74 220.32
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 75% 181,125 4.53 186.37
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 73% 0 .00 .00
2016-02-11 21:44:04 CST S Report 5x L211 03/21/2014 null Zone2 65% 1,530,511 38.26 165.33
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Current Volume 126.8 186.95 25.49 34.67 5,071,823 7,478,136 1,019,473 1,386,840
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 92-94% 738,249 18.46 263.39
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 90% 783,791 19.59 254.45
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 85% 2,586,286 64.66 220.72
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 81% 49,348 1.23 191.71
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 75% 131,844 3.30 182.33
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 73% 0 .00 .00
2016-02-11 21:44:04 CST S Report 5x L211 03/14/2014 null Zone1 65% 782,305 19.56 150.54
答案1
首先,您的 XML 是错误的,因为它包含嵌套在自身中的元素。这涉及report
和record
项目。我已在中发布了该文件的修复版本粘贴箱,其中我已将外部report
和record
项目重命名为xreport
和xrecord
。
其次,由于它包含嵌套元素,因此不能简单地将它们转换为 CSV,因为一个 CSV 文件只能包含一种类型的元素。
由于嵌套问题,PowerShell 无法对此类文件使用,因为它只能进行简单的转换。请参阅文章酷炫的 powershell:两行代码就能将 Xml 转换为 Csv 举个例子。
可以手动编码 PowerShell 来转换已知结构的 XML 文件。例如,请参阅以下文章:
可以处理这种复杂性的工具是免费的 Xml 到 Csv 转换工具。
为了使其正常工作,您需要下载并解压以下内容:
- 交互式工具版本 16
- Xml 到 Csv 命令行版本 1.5
一旦解压到一个文件夹,您就可以使用交互式程序在一个或两个文件上测试其功能。
如果它能满足您的需要,您可以使用控制台变体批量转换文件。批处理命令的格式为:
XmlToCsv.Console.exe -xml <source.xml> -dir <destination-folder>
批处理文件需要将具有相似结构的文件分别转换为各自的文件夹,因为它会将上述示例文件转换为五个文件,分别名为 record.csv、report.csv、results.csv、xrecord.csv、xreport.csv。由于这些名称对于结构相似的 XML 文件重复,因此如果将生成的文件全部定向到同一文件夹,它们将相互覆盖。
如果该程序不适合您的需求,您将需要编写一个 Powershell 脚本,该脚本应包含两个阶段:
- 第 1 阶段:通过深入到最低级别来创建 CSV 标头
- 第 2 阶段:深度搜索整个 XML 以生成 CSV 数据行。
您的脚本可以循环遍历文件,对每个文件执行以下操作:
[xml] $xml = [xml] (Get-Content -Path \path\to\file.xml)
[System.Xml.XmlElement] $root = $xml.get_DocumentElement()
$root 包含<results>
具有以下属性的节点:
请注意,您正在寻找具有 Property 的 MemberType 的项目。具有字符串定义的项目是您需要收集的项目,而 System.Object 需要再次钻取:
然后再次 :
这次我们需要深入研究具有 System.Xml.XmlElement 定义的项目:
现在您拥有所有属性并可以构建标题行。
脚本本身需要使用福拉奇命令循环遍历对象的属性并检查MemberType和Definition的属性。
我没有时间编写完整的脚本,但希望这可以帮助您入门。