I want to select the text between two patterns, as shown below.
This is the input:
Blalala
'Omfoem From
balanf PAT1 This is the
text that I want
to get PAT2: apples
Whatever: oranges
This is the output I want:
This is the
text that I want
to get
I tried this script:
awk '/^.*PAT1/{flag=1; next} /.*PAT2/{flag=0} flag' file1.txt
But it only outputs the following:
text that I want
I am missing the parts of the text that are on the same lines as the patterns.
I am using OSX.
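Answer 1
A GNU* awk variant: make PAT2 the record separator RS and PAT1 the field separator FS, then print the last field $NF, using NF>1 to make sure the output is not just the result of a repeated RS (that is, only records that actually contain PAT1 are printed):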
awk 'BEGIN{RS="PAT2"; FS="PAT1"}NF>1{print $NF}' file1
This is the
text that I want
to get
This is another text that I want
to get DONE
* As noted by @EdMorton.
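POSIX leaves multi-character RS values unspecified, so the one-liner above may behave differently with the awk shipped on OSX. The following is a minimal sketch of the same idea using only split(), which any POSIX awk provides: read the whole input into one string, split it on the end marker, then split each chunk on the start marker and print the last piece. The function name between_rs_fs and its argument order are assumptions made for illustration, not part of the answer.

```shell
#!/bin/sh
# Sketch: emulate RS=end marker / FS=start marker with POSIX split().
# between_rs_fs is a hypothetical helper name.
# Usage: between_rs_fs START END [FILE...]
# Note: separators longer than one character are treated as regular expressions.
between_rs_fs() {
    fs=$1 rs=$2
    shift 2
    awk -v fs="$fs" -v rs="$rs" '
        { buf = buf $0 "\n" }                 # slurp the whole input into one string
        END {
            n = split(buf, rec, rs)           # "records": chunks separated by the end marker
            for (k = 1; k <= n; k++) {
                m = split(rec[k], fld, fs)    # "fields": pieces separated by the start marker
                if (m > 1) print fld[m]       # same test as NF>1{print $NF}
            }
        }' "$@"
}
```

It could be called as between_rs_fs 'PAT1 ' ' PAT2' file1, where the spaces inside the quoted markers are an assumption chosen to trim the blanks around the extracted text.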
Answer 2
With GNU sed, although it is ugly, I think this solves it:
sed -e 's/PAT1/\nPAT1\n/' -e 's/PAT2/\nPAT2\n/' file | sed -n '/PAT1/,/PAT2/{//!p}'
Find PAT1 and PAT2 and add a newline before and after each of them:
sed -e 's/PAT1/\nPAT1\n/' -e 's/PAT2/\nPAT2\n/'
Blalala
'Omfoem From
balanf
PAT1
This is the
text that I want
to get
PAT2
: apples
Print the text between PAT1 and PAT2:
sed -n '/PAT1/,/PAT2/{//!p}'
This is the
text that I want
to get
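The \n in the replacement text and the {//!p} grouping rely on GNU sed behaviour, so this pipeline may not run unchanged with the sed shipped on OSX. Here is a rough sketch of the same two-step idea in POSIX awk: first put each marker on its own line, then print the lines strictly between the marker lines. The function name between_markers is hypothetical, and swallowing the spaces around the markers is an added assumption to match the desired output.

```shell
#!/bin/sh
# Sketch of the two-step approach with awk instead of GNU sed.
# between_markers is a hypothetical helper name.
# Usage: between_markers [FILE...]   (reads standard input if no file is given)
between_markers() {
    # Step 1: surround every PAT1/PAT2 (plus adjacent spaces) with newlines.
    # Step 2: print only the lines strictly between a PAT1 line and a PAT2 line.
    awk '{ gsub(/ *(PAT1|PAT2) */, "\n&\n"); print }' "$@" |
        awk '/PAT1/ { f = 1; next } /PAT2/ { f = 0 } f'
}
```

The second awk command is essentially the script from the question; it works here because step 1 guarantees that no wanted text shares a line with a marker.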
Answer 3
Using any awk in any shell on every UNIX box:
$ awk 'sub(/.*PAT1 */,""){f=1} f{if ( sub(/ *PAT2.*/,"") ) f=0; print}' file
This is the
text that I want
to get
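The above works for the sample input you provided. If you have other input in a different format that it does not work for (e.g. nested start/end strings, or a start string following the end string on the same line), then please edit your question to show that.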
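For some of the input variations mentioned above, such as several start/end pairs or a start string that follows an end string on the same line, one possible approach is to scan each line with index() and substr() instead of sub(). The sketch below is not the answer's method: extract_between is a hypothetical helper name, the markers are treated as literal strings rather than regular expressions, and nested pairs are still not handled.

```shell
#!/bin/sh
# Sketch: print every stretch of text between a start and an end marker,
# including ranges that begin or end in the middle of a line.
# extract_between is a hypothetical helper name.
# Usage: extract_between START END [FILE...]
extract_between() {
    startstr=$1 endstr=$2
    shift 2
    awk -v startstr="$startstr" -v endstr="$endstr" '
    {
        line = $0
        for (;;) {
            if (!inside) {
                i = index(line, startstr)
                if (i == 0) break                          # no start marker left on this line
                line = substr(line, i + length(startstr))  # drop text up to and including it
                inside = 1
            }
            j = index(line, endstr)
            if (j == 0) { print line; break }              # range continues on the next line
            print substr(line, 1, j - 1)                   # text before the end marker
            line = substr(line, j + length(endstr))        # keep scanning the remainder
            inside = 0
        }
    }' "$@"
}
```

For the sample file it could be called as extract_between 'PAT1 ' ' PAT2' file, with the spaces included in the markers so that they are not printed.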
Answer 4
With GNU grep(1):
grep -zoP "(?s)(?<=PAT1 )(.*)(?= PAT2)" file
Test:
$ cat file
Blalala
'Omfoem From
balanf PAT1 This is the
text that I want
to get PAT2: apples
Whatever: oranges
$ grep -zoP "(?s)(?<=PAT1 )(.*)(?= PAT2)" file
This is the
text that I want
to get
From the grep(1) man page:
-z, --null-data
    Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. Like the -Z or --null option, this option can be used with commands like sort -z to process arbitrary file names.
-o, --only-matching
    Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-P, --perl-regexp
    Interpret PATTERN as a Perl regular expression (PCRE, see below). This is highly experimental and grep -P may warn of unimplemented features.
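Explanation of the regex:
(?s) activates PCRE_DOTALL, which means that . matches any character, including a newline.
The positive lookbehind assertion (?<=PAT1 ) and the positive lookahead assertion (?= PAT2) are not part of the match, so only the capture group (.*) is printed.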
Caveat for this solution:
As @bushman pointed out, this only works if each of the two patterns occurs exactly once in the file.