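I am trying to extract "paragraphs" from a file. The paragraphs are not separated by blank lines; instead they are separated by lines that have no text between > and <. My approach is to set the record separator RS to a line that does not contain the pattern >*<.
file.txt is: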
<text:p text:style-name="P1"/>
<text:p text:style-name="P11"/>
<text:p text:style-name="P10">1</text:p>
<text:p text:style-name="P10">2</text:p>
<text:p text:style-name="P10">3 this is line that matches</text:p>>
<text:p text:style-name="P10">4</text:p>
<text:p text:style-name="P10">5</text:p>
<text:p text:style-name="P1"/>
My attempt at the code is:
$ awk '/matches/' RS=^">"*"<" file.txt
The expected output is:
<text:p text:style-name="P10">1</text:p>
<text:p text:style-name="P10">2</text:p>
<text:p text:style-name="P10">3 this is line that matches</text:p>>
<text:p text:style-name="P10">4</text:p>
<text:p text:style-name="P10">5</text:p>
But the output is the whole file. What am I doing wrong?
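A possible reading of the problem, offered as an assumption rather than a confirmed diagnosis: once the shell strips the quotes, awk receives RS=^>*<. That is a regular expression (an anchor, zero or more ">", then "<"), not a glob for "anything between > and <", and it cannot express "a line that does not match". In gawk the ^ anchor in RS can only match at the start of the file, and POSIX leaves multi-character RS unspecified, so the file is unlikely to be split where intended. The sketch below avoids RS entirely and collects lines by hand. The names tmp, result, pat, buf and hit are made up for illustration, and it assumes one element per line, as in the sample.

```shell
# Sample input copied from the question into a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<text:p text:style-name="P1"/>
<text:p text:style-name="P11"/>
<text:p text:style-name="P10">1</text:p>
<text:p text:style-name="P10">2</text:p>
<text:p text:style-name="P10">3 this is line that matches</text:p>>
<text:p text:style-name="P10">4</text:p>
<text:p text:style-name="P10">5</text:p>
<text:p text:style-name="P1"/>
EOF

result=$(awk -v pat='matches' '
    # A line with some text between ">" and "<" belongs to the current paragraph.
    />[^<]+</ {
        buf = buf $0 "\n"
        if ($0 ~ pat) hit = 1
        next
    }
    # Any other line ends the paragraph: print it if it matched, then reset.
    {
        if (hit) printf "%s", buf
        buf = ""; hit = 0
    }
    # Flush a paragraph that runs to the end of the file.
    END { if (hit) printf "%s", buf }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

On the sample this should print only the five P10 lines, because the empty elements before and after them act as paragraph boundaries and are never added to the buffer.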
Edit:
If file.xml is
<long line of alphanumerics, slashes, single and double quotes><more or the same><and many more>
<office:text>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
<text:sequence-decl text:display-outline-level="0" text:name="Figure"/>
</text:sequence-decls>
<text:p text:style-name="P1">This is the first line</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1">This is the third line</text:p>
<text:p text:style-name="P1">and this is some more text that is to be included</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1">This is the sixth. I want it included,</text:p>
<text:p text:style-name="P1">with this line</text:p>
<text:p text:style-name="P1">and this one</text:p>
</office:text>
and I use
$ awk '/line/' RS='\n[ \t]*<[^>]*>\n' file.xml
the whole file is output, whereas I am looking for:
<text:p text:style-name="P1">This is the first line</text:p>
<text:p text:style-name="P1">This is the third line</text:p>
<text:p text:style-name="P1">and this is some more text that is to be included</text:p>
<text:p text:style-name="P1">This is the sixth. I want it included,</text:p>
<text:p text:style-name="P1">with this line</text:p>
<text:p text:style-name="P1">and this one</text:p>
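One thing worth noting about this second attempt: /line/ also matches the substring "line" inside text:display-outline-level and in the first placeholder line, so those records would be selected even if RS split the file as intended. As a hedged alternative, the two-stage sketch below first blanks out every line that has no text between ">" and "<", then relies on awk's standard paragraph mode (RS set to the empty string). The sample is an abbreviated copy of file.xml, and tmp and result are illustrative names only.

```shell
# Abbreviated copy of file.xml from the question (some sequence-decl lines omitted).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<long line of alphanumerics, slashes, single and double quotes><more or the same><and many more>
<office:text>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
</text:sequence-decls>
<text:p text:style-name="P1">This is the first line</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1">This is the third line</text:p>
<text:p text:style-name="P1">and this is some more text that is to be included</text:p>
<text:p text:style-name="P1"/>
<text:p text:style-name="P1">This is the sixth. I want it included,</text:p>
<text:p text:style-name="P1">with this line</text:p>
<text:p text:style-name="P1">and this one</text:p>
</office:text>
EOF

# Stage 1: keep lines that have text between ">" and "<", replace all others with a blank line.
# Stage 2: with RS set to the empty string, each run of non-blank lines is one record,
#          so /line/ selects whole paragraphs.
result=$(awk '/>[^<]+</ { print; next } { print "" }' "$tmp" | awk -v RS='' '/line/')

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the markup-only lines are blanked before the search runs, the "outline-level" attributes can no longer trigger a match. This assumes each element sits on its own line, as in the sample.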