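I have a very large tree-structured XML file, about 1 GB.
I need to delete every <Sample> ... </Sample> block (including its child lines) that does not contain the value
<segmentation><![CDATA[0.11]]></segmentation>
For example, the file contains lines with tags such as the following: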
<segmentation><![CDATA[0.11]]></segmentation>
<segmentation><![CDATA[0.25]]></segmentation>
<segmentation><![CDATA[0.61]]></segmentation>
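In the example below, is it possible to delete all the other <Sample> blocks and their child lines, keeping only the blocks whose child line carries this tag?
<segmentation><![CDATA[0.11]]></segmentation>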
Original:
<Sample>
<title><![CDATA[South Park]]></title>
<date><![CDATA[Tue, 29 Nov 2016 00:00:00 EST]]></date>
<referencenumber><![CDATA[20983990]]></referencenumber>
<segmentation><![CDATA[0.11]]></segmentation>
<description><![CDATA[Some text goes here]]></description>
</Sample>
<Sample>
<title><![CDATA[South Park]]></title>
<date><![CDATA[Tue, 29 Nov 2016 00:00:00 EST]]></date>
<referencenumber><![CDATA[20983990]]></referencenumber>
<segmentation><![CDATA[0.25]]></segmentation>
<description><![CDATA[Some text goes here]]></description>
</Sample>
<Sample>
<title><![CDATA[South Park]]></title>
<date><![CDATA[Tue, 29 Nov 2016 00:00:00 EST]]></date>
<referencenumber><![CDATA[20983990]]></referencenumber>
<segmentation><![CDATA[0.61]]></segmentation>
<description><![CDATA[Some text goes here]]></description>
</Sample>
Result:
<Sample>
<title><![CDATA[South Park]]></title>
<date><![CDATA[Tue, 29 Nov 2016 00:00:00 EST]]></date>
<referencenumber><![CDATA[20983990]]></referencenumber>
<segmentation><![CDATA[0.11]]></segmentation>
<description><![CDATA[Some text goes here]]></description>
</Sample>
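
For a file of this size, one possible approach is to stream it rather than load it whole. The following is only a minimal sketch in Python using the standard library's xml.etree.ElementTree.iterparse. It assumes things the question does not state: that all <Sample> blocks are direct children of a single root element, that the file uses no XML namespaces, and that attributes on the root element do not need to be kept. The function name filter_samples and its parameters are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def filter_samples(src, dst, wanted="0.11",
                   record_tag="Sample", field_tag="segmentation"):
    """Copy only the records whose field text equals `wanted`.

    `src` may be a path or a binary file object; `dst` is an output path.
    Returns the number of records kept.
    """
    kept = 0
    context = ET.iterparse(src, events=("start", "end"))
    # The first event is the start of the root element (assumed to wrap
    # all <Sample> blocks).
    _, root = next(context)
    with open(dst, "w", encoding="utf-8") as out:
        # Simplification: the root is re-created without its attributes.
        out.write(f"<{root.tag}>\n")
        for event, elem in context:
            if event == "end" and elem.tag == record_tag:
                value = elem.findtext(field_tag, default="")
                if value.strip() == wanted:
                    elem.tail = "\n"  # one record per output line group
                    out.write(ET.tostring(elem, encoding="unicode"))
                    kept += 1
                # Drop records already handled so memory use stays bounded.
                root.clear()
        out.write(f"</{root.tag}>\n")
    return kept
```

It could be called as filter_samples("input.xml", "filtered.xml"), where both file names are placeholders. Note that ElementTree does not preserve CDATA sections when serializing: the kept text is written as ordinary escaped character data, which is equivalent for an XML parser but not byte-identical to the input. If the <![CDATA[...]]> wrappers must survive unchanged, a different tool would be needed.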