Extracting data from tags in an XML result set

I need to get the data for 2 tags, "estimated" and "fullSign", for every occurrence in this result set.

RESULT SET:

<?xml version="1.0" encoding="UTF-8"?>
<resultSet xmlns="urn:trimet:arrivals" queryTime="1469138325745"><location desc="Morrison/SW 3rd Ave MAX Station" dir="Westbound" lat="45.5181811277907" lng="-122.675385866199" locid="8381" /><arrival block="9007" departed="true" dir="1" status="estimated" estimated="1469138452000" fullSign="MAX  Blue Line to Hillsboro" piece="1" route="100" scheduled="1469138250000" shortSign="Blue to Hillsboro" locid="8381" detour="false"><blockPosition feet="1901" at="1469138300978" heading="201" lat="45.5214364" lng="-122.6716177"><trip desc="Hatfield Government Center" dir="1" route="100" tripNum="6557314" destDist="77046" pattern="54" progress="75145" /></blockPosition></arrival><arrival block="9050" departed="true" dir="1" status="estimated" estimated="1469138664000" fullSign="MAX  Red Line to City Center &amp; Beaverton" piece="1" route="90" scheduled="1469138670000" shortSign="Red Line to Beaverton" locid="8381" detour="false"><blockPosition feet="4552" at="1469138313683" heading="237" lat="45.5277621" lng="-122.6687878"><trip desc="Beaverton TC Pocket" dir="1" route="90" tripNum="6556307" destDist="66321" pattern="15" progress="61769" /></blockPosition></arrival><arrival block="9018" departed="true" dir="1" status="estimated" estimated="1469139140000" fullSign="MAX  Blue Line to Hillsboro" piece="1" route="100" scheduled="1469139150000" shortSign="Blue to Hillsboro" locid="8381" detour="false"><blockPosition feet="13687" at="1469138320005" heading="239" lat="45.5309688" lng="-122.6350333"><trip desc="Hatfield Government Center" dir="1" route="100" tripNum="6557315" destDist="77046" pattern="54" progress="63359" /></blockPosition></arrival><arrival block="9043" departed="true" dir="1" status="estimated" estimated="1469139577000" fullSign="MAX  Red Line to City Center &amp; Beaverton" piece="1" route="90" scheduled="1469139570000" shortSign="Red Line to Beaverton" locid="8381" detour="false"><blockPosition feet="31909" at="1469138310486" heading="285" lat="45.5320383" 
lng="-122.5738342"><trip desc="Beaverton TC Pocket" dir="1" route="90" tripNum="6556308" destDist="66321" pattern="15" progress="34412" /></blockPosition></arrival></resultSet>

Expected result:

1469138452000 MAX  Blue Line to Hillsboro
1469138664000 MAX  Red Line to City Center &amp; Beaverton
1469139140000 MAX  Blue Line to Hillsboro
1469139577000 MAX  Red Line to City Center &amp; Beaverton

What is a good way to extract this data?

Answer 1

Here is one way using XMLstarlet and paste. It can probably be done with a single call to XMLstarlet, but I'm no wizard:

$ paste <(xml sel -T -t -v '//@estimated' data.xml) \
        <(xml sel -T -t -v '//@fullSign' data.xml)
1469138452000   MAX Blue Line to Hillsboro
1469138664000   MAX Red Line to City Center & Beaverton
1469139140000   MAX Blue Line to Hillsboro
1469139577000   MAX Red Line to City Center & Beaverton
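
The single-pass idea mentioned above, reading both attributes from each arrival element instead of pasting two separate lists together, can also be sketched with Python's standard library. This is only an illustration under assumptions: SAMPLE is a shortened stand-in for the result set in the question, and the arrivals() helper name is made up for this example. The elements live in the default namespace urn:trimet:arrivals, so the element lookup has to name it.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the result set in the question (assumption:
# the real file has the same shape, with more attributes per arrival).
SAMPLE = """<resultSet xmlns="urn:trimet:arrivals" queryTime="1469138325745">
  <arrival estimated="1469138452000" fullSign="MAX  Blue Line to Hillsboro" route="100" />
  <arrival estimated="1469138664000" fullSign="MAX  Red Line to City Center &amp; Beaverton" route="90" />
</resultSet>"""

# The default namespace declared on <resultSet>, bound to a prefix for lookups.
NS = {"t": "urn:trimet:arrivals"}


def arrivals(xml_text):
    """Return (estimated, fullSign) pairs, one per <arrival> element."""
    root = ET.fromstring(xml_text)
    return [
        (a.get("estimated"), a.get("fullSign"))
        for a in root.iterfind(".//t:arrival", NS)
    ]


pairs = arrivals(SAMPLE)
for est, sign in pairs:
    print(est, sign)
```

Because both values come from the same element, a missing attribute on one arrival cannot shift the two columns out of step, which is a risk when two independent lists are joined line by line.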

Answer 2

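$ xml2 < sunnx.xml | awk -F= '
   # xml2 prints one path=value line per node; attribute paths end in /@name
   $1 ~ /@fullSign/  { fs=$2 ; sub(/&/,"&amp;",fs) };
   $1 ~ /@estimated/ { est=$2 };
   # once both values of an arrival have been seen, print them and reset
   fs && est         { printf "%s %s\n", est, fs; fs=est="" }'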
1469138452000 MAX  Blue Line to Hillsboro
1469138664000 MAX  Red Line to City Center &amp; Beaverton
1469139140000 MAX  Blue Line to Hillsboro
1469139577000 MAX  Red Line to City Center &amp; Beaverton

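If you want a literal & instead of &amp;, just remove the sub() function call. xml2 decodes encoded entities for you, so I added the sub() to change it back to match the output you requested.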

Without the sub(), the output looks like this:

1469138452000 MAX  Blue Line to Hillsboro
1469138664000 MAX  Red Line to City Center & Beaverton
1469139140000 MAX  Blue Line to Hillsboro
1469139577000 MAX  Red Line to City Center & Beaverton
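
The difference between the two outputs is just entity decoding: a parser hands back the decoded text, and re-escaping restores the form stored in the file. A minimal sketch of that round trip, using Python's xml.sax.saxutils rather than awk (the variable names are only for illustration):

```python
from xml.sax.saxutils import escape, unescape

# What an XML parser reports for the attribute value (entity already decoded).
decoded = "MAX  Red Line to City Center & Beaverton"

# Re-escape to get back the form that appears in the XML source.
encoded = escape(decoded)

print(encoded)
print(unescape(encoded) == decoded)
```

Note that escape() also rewrites < and >, not only &, so it does slightly more than the sub() call above.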
