AWK with an RS that does not match a pattern

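I am trying to extract "paragraphs" from a file. There are no blank lines between the paragraphs, but there are lines with no text between > and <. My approach was to set the record separator RS to lines that do not contain the pattern >*<.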

file.txt is:

  <text:p text:style-name="P1"/>
  <text:p text:style-name="P11"/>
  <text:p text:style-name="P10">1</text:p>
  <text:p text:style-name="P10">2</text:p>
  <text:p text:style-name="P10">3 this is line that matches</text:p>>
  <text:p text:style-name="P10">4</text:p>
  <text:p text:style-name="P10">5</text:p>
  <text:p text:style-name="P1"/>

My attempt at the code is:

$ awk '/matches/' RS=^">"*"<" file.txt

The expected output is:

  <text:p text:style-name="P10">1</text:p>
  <text:p text:style-name="P10">2</text:p>
  <text:p text:style-name="P10">3 this is line that matches</text:p>>
  <text:p text:style-name="P10">4</text:p>
  <text:p text:style-name="P10">5</text:p>

But the output is the whole file. What am I doing wrong?
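
For context: RS describes the text that separates records, and as far as I know awk has no way to express "any line that does not match a pattern" as an RS. Inside a regular expression, `^` is an anchor rather than a negation. Below is a minimal sketch of a line-by-line alternative, assuming a separator is any line with no text between `>` and `<`. The helper name `extract_paragraphs` and the temporary file are illustrative only and do not come from the question.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  <text:p text:style-name="P1"/>
  <text:p text:style-name="P11"/>
  <text:p text:style-name="P10">1</text:p>
  <text:p text:style-name="P10">2</text:p>
  <text:p text:style-name="P10">3 this is line that matches</text:p>>
  <text:p text:style-name="P10">4</text:p>
  <text:p text:style-name="P10">5</text:p>
  <text:p text:style-name="P1"/>
EOF

# extract_paragraphs PATTERN FILE (hypothetical helper):
# collect consecutive lines that carry text, and print a collected
# paragraph only if at least one of its lines matched PATTERN.
extract_paragraphs() {
    awk -v pat="$1" '
        # A line without any text between ">" and "<" ends the paragraph.
        !/>[^<>]+</ {
            if (found) printf "%s", para
            para = ""; found = 0
            next
        }
        { para = para $0 "\n" }      # accumulate the current paragraph
        $0 ~ pat { found = 1 }       # remember that this paragraph matched
        END { if (found) printf "%s", para }
    ' "$2"
}

result=$(extract_paragraphs 'matches' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With the sample above this should print the five P10 lines and nothing else. The default RS (newline) is kept on purpose, so the sketch does not depend on regular-expression support in RS, which not every awk implementation provides.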


Edit:

If file.xml is

<long line of alphanumerics, slashes, single and double quotes><more or the same><and many more>
      <office:text>
      <text:sequence-decls>
        <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Figure"/>
      </text:sequence-decls>
      <text:p text:style-name="P1">This is the first line</text:p>
      <text:p text:style-name="P1"/>
      <text:p text:style-name="P1">This is the third line</text:p>
      <text:p text:style-name="P1">and this is some more text that is to be included</text:p>
      <text:p text:style-name="P1"/>
      <text:p text:style-name="P1">This is the sixth. I want it included,</text:p>
      <text:p text:style-name="P1">with this line</text:p>
      <text:p text:style-name="P1">and this one</text:p>
    </office:text>

and I use

$ awk '/line/' RS='\n[ \t]*<[^>]*>\n' file.xml

the whole file is printed, whereas I am looking for:

      <text:p text:style-name="P1">This is the first line</text:p>
      <text:p text:style-name="P1">This is the third line</text:p>
      <text:p text:style-name="P1">and this is some more text that is to be included</text:p>
      <text:p text:style-name="P1">This is the sixth. I want it included,</text:p>
      <text:p text:style-name="P1">with this line</text:p>
      <text:p text:style-name="P1">and this one</text:p>
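
One detail in this second sample is that `/line/` also matches inside tags, for example "outline" in `text:display-outline-level` and "long line" in the first tag. Testing the pattern against the raw line can therefore select lines that carry no text at all. The following is only a sketch under that observation: it strips the tags from a copy of each line, treats a line with nothing left as a paragraph boundary, and matches the pattern against the remaining text. The names `paragraphs_by_text` and `emit` are hypothetical.

```shell
# Recreate file.xml from the question in a temporary file.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<long line of alphanumerics, slashes, single and double quotes><more or the same><and many more>
      <office:text>
      <text:sequence-decls>
        <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
        <text:sequence-decl text:display-outline-level="0" text:name="Figure"/>
      </text:sequence-decls>
      <text:p text:style-name="P1">This is the first line</text:p>
      <text:p text:style-name="P1"/>
      <text:p text:style-name="P1">This is the third line</text:p>
      <text:p text:style-name="P1">and this is some more text that is to be included</text:p>
      <text:p text:style-name="P1"/>
      <text:p text:style-name="P1">This is the sixth. I want it included,</text:p>
      <text:p text:style-name="P1">with this line</text:p>
      <text:p text:style-name="P1">and this one</text:p>
    </office:text>
EOF

# paragraphs_by_text PATTERN FILE (hypothetical helper):
# match PATTERN against the text content only, never against tag names.
paragraphs_by_text() {
    awk -v pat="$1" '
        # Print the collected paragraph if it matched, then reset the state.
        function emit() {
            if (found) printf "%s", para
            para = ""; found = 0
        }
        {
            text = $0
            gsub(/<[^>]*>/, "", text)        # drop tags from a copy of the line
        }
        text ~ /^[ \t]*$/ { emit(); next }   # nothing left: paragraph boundary
        {
            para = para $0 "\n"              # keep the original line for output
            if (text ~ pat) found = 1
        }
        END { emit() }
    ' "$2"
}

xml_result=$(paragraphs_by_text 'line' "$xml")
printf '%s\n' "$xml_result"
rm -f "$xml"
```

On the sample this should print the six lines that contain text, since each of the three paragraphs has at least one line whose text matches "line". This is not a real XML parser; it assumes one element per line, as in the sample.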
