Select text between two patterns using awk or sed

I want to select the text between two patterns, as shown below.

This is the input:

Blalala
'Omfoem From 
balanf PAT1 This is the
text that I want
to get PAT2: apples
Whatever: oranges

This is the output I want:

This is the
text that I want
to get

I tried this script:

awk '/^.*PAT1/{flag=1; next} /.*PAT2/{flag=0} flag' file1.txt

But it only outputs the following:

text that I want

I am missing the parts of the text that sit on the same lines as the patterns.

I am using OS X.

Answer 1

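A GNU* awk variant: make PAT2 the record separator RS and PAT1 the field separator FS, then print the last field $NF. The NF>1 test ensures that only records that actually contain PAT1 are printed, not the leftover text after the last RS.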

awk 'BEGIN{RS="PAT2"; FS="PAT1"}NF>1{print $NF}' file1
 This is the
text that I want
to get 

 This is another text that I want
to get DONE

*As noted by @EdMorton

Answer 2
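For reference, here is a minimal sketch of the same RS/FS idea run against the sample input from the question. It assumes an awk that treats a multi-character RS as a separator (gawk and mawk do; POSIX leaves this unspecified). The variable names input and result are only illustrative.

```shell
# Sample input from the question, held in a variable so the sketch is self-contained.
input="Blalala
'Omfoem From
balanf PAT1 This is the
text that I want
to get PAT2: apples
Whatever: oranges"

# RS="PAT2" ends each record at PAT2; FS="PAT1" splits the record at PAT1.
# NF>1 keeps only records that contain PAT1, and $NF is the text after it.
result=$(printf '%s\n' "$input" |
  awk 'BEGIN { RS = "PAT2"; FS = "PAT1" } NF > 1 { print $NF }')

printf '%s\n' "$result"
```

Note that the spaces around the markers are kept: the result starts with the space that followed PAT1 and ends with the space that preceded PAT2.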

With GNU sed, ugly as it is, I think this solves the problem:

sed -e 's/PAT1/\nPAT1\n/' -e 's/PAT2/\nPAT2\n/' file | sed -n '/PAT1/,/PAT2/{//!p}'

Take PAT1 and PAT2 and put a newline before and after each of them:

sed -e 's/PAT1/\nPAT1\n/' -e 's/PAT2/\nPAT2\n/'

Blalala
'Omfoem From 
balanf 
PAT1
 This is the
text that I want
to get 
PAT2
: apples

Print the text between PAT1 and PAT2:

sed -n '/PAT1/,/PAT2/{//!p}'

 This is the
text that I want
to get 
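The {//!p} part of the second stage may deserve a closer look. The sketch below compares a plain range print with the //!p form, starting from the intermediate text shown above (shortened to the relevant lines). It assumes GNU sed, where an empty regex // reuses the last regex that was tried; the variable names are only illustrative.

```shell
# Output of the first sed stage (relevant lines only): markers on their own lines.
split=$(printf 'balanf \nPAT1\n This is the\ntext that I want\nto get \nPAT2\n: apples')

# Plain range: the PAT1 and PAT2 marker lines are printed as well.
with_markers=$(printf '%s\n' "$split" | sed -n '/PAT1/,/PAT2/p')

# Inside the range, // stands for whichever range address was tested last
# (PAT1 on the first line, PAT2 afterwards), so //!p skips both marker lines.
without_markers=$(printf '%s\n' "$split" | sed -n '/PAT1/,/PAT2/{//!p;}')

printf '%s\n---\n%s\n' "$with_markers" "$without_markers"
```

The first stage relies on \n in the replacement producing a newline, which is why the answer is labelled as GNU sed.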

Answer 3

Using any awk in any shell on every UNIX box:

$ awk 'sub(/.*PAT1 */,""){f=1} f{if ( sub(/ *PAT2.*/,"") ) f=0; print}' file
This is the
text that I want
to get

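The above works for the sample input you provided. If you have other input in a different format that it does not work for (e.g. nested start/end strings, or a start string that follows an end string on the same line), then please edit your question to show that.

Answer 4

With GNU grep(1):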
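The one-liner above can be read step by step. The following is the same logic spread over several lines with comments, run against the sample input from the question; the flag is renamed from f to inside purely for readability, and the shell variable names are illustrative.

```shell
# Sample input from the question.
input="Blalala
'Omfoem From
balanf PAT1 This is the
text that I want
to get PAT2: apples
Whatever: oranges"

result=$(printf '%s\n' "$input" | awk '
  # Delete everything up to and including PAT1 and any spaces after it.
  # sub() returns 1 only on a line that contains PAT1, which turns printing on.
  sub(/.*PAT1 */, "") { inside = 1 }

  inside {
    # Delete PAT2 and everything after it; if that succeeds this is the last line.
    if (sub(/ *PAT2.*/, "")) inside = 0
    print
  }
')

printf '%s\n' "$result"
```

Because both sub() calls edit the current line before it is printed, the partial lines next to PAT1 and PAT2 are kept, which is exactly what the script in the question was dropping.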

grep -zoP "(?s)(?<=PAT1 )(.*)(?= PAT2)" file

Test

$ cat file
Blalala
'Omfoem From
balanf PAT1 This is the
text that I want
to get PAT2: apples
Whatever: oranges

$ grep -zoP "(?s)(?<=PAT1 )(.*)(?= PAT2)" file
This is the
text that I want
to get

From the grep(1) man page:

-z, --null-data
  Treat the input as a set of lines, each terminated by  a  zero  byte  (the  ASCII NUL  
  character) instead  of  a  newline.  Like the -Z or --null option, this option can be 
  used with commands like sort -z to process arbitrary file names.

-o, --only-matching
   Print  only  the  matched  (non-empty) parts of a matching line, with each such part 
   on a separate output line.

-P, --perl-regexp
   Interpret PATTERN as a Perl regular expression (PCRE, see below).  This is highly 
   experimental and grep -P may warn of unimplemented features.

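Regex explanation:

(?s) activates PCRE_DOTALL, which means that . matches any character, including a newline.

The positive lookbehind assertion (?<=PAT1 ) and the positive lookahead assertion (?= PAT2) are not part of the match, so only the capture group (.*) is printed.

Caveats of this solution:

As @bushman said, this only works if each of the two patterns occurs exactly once in the file.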
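The caveat comes from .* being greedy: it runs from the first PAT1 to the last PAT2. A possible workaround, sketched here under the assumption of GNU grep built with PCRE support, is the lazy quantifier .*? so that each match stops at the nearest PAT2. The two-block input is made up for illustration, and nested patterns are still not handled.

```shell
# Hypothetical input with two PAT1 ... PAT2 blocks.
input="a PAT1 one PAT2 b
c PAT1 two
three PAT2 d"

# With -z each match is terminated by a NUL byte; tr turns those into newlines.
greedy=$(printf '%s\n' "$input" |
  grep -zoP '(?s)(?<=PAT1 ).*(?= PAT2)' | tr '\0' '\n')

lazy=$(printf '%s\n' "$input" |
  grep -zoP '(?s)(?<=PAT1 ).*?(?= PAT2)' | tr '\0' '\n')

printf 'greedy:\n%s\n\nlazy:\n%s\n' "$greedy" "$lazy"
```

The greedy form returns one match that swallows the text between the two blocks, while the lazy form returns the two blocks separately.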
