Delete text from pattern 1 up to and including the second match of pattern 2?

Question 1

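This is the exact opposite of "How to print lines between pattern 1 and the second match of pattern 2?"

With sed you could do something like: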

sed -n '/PATTERN1/,$!{         # if not in this range
p;d                            # print and delete
}
/PATTERN2/!d                   # delete if it doesn't match PATTERN2
x;//!d                         # exchange and then, again, delete if no match
: do                           # label "do" (executed only after the 2nd match)
n;p                            # get the next line and print
b do' infile                   # go to label "do"

Or, in one line (on a GNU setup):

sed -n '/PATTERN1/,$!{p;d;};/PATTERN2/!d;x;//!d;: do;n;p;b do' infile

Of course, this is easier with awk and a counter. I'll leave that to you as an exercise...
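For readers who want to check their attempt at that exercise, here is one possible sketch of the awk-with-a-counter approach. It is not the answerer's own solution; the function name `drop_until_second` and the variables `start`, `end`, `seen` and `left` are made up for illustration, and the patterns are passed as plain awk regexes.

```shell
# Hypothetical helper: print every line except the block that runs from the
# first match of the start regex through the second match of the end regex.
drop_until_second() {
    # $1 = start regex, $2 = end regex, $3 = input file
    awk -v start="$1" -v end="$2" '
        !seen && $0 ~ start { seen = 1; left = 2 }  # first start match: begin dropping
        !left                                       # print only while the counter is 0
        left && $0 ~ end { left-- }                 # count down on each end match
    ' "$3"
}
```

Called as `drop_until_second 'PATTERN1' 'PATTERN2' infile`, it should behave like the sed script above: lines before PATTERN1 are printed, and printing resumes on the line after the second PATTERN2 match.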

Question 2

Simple and straightforward with awk:

$ awk '/<!--START OF FILE -->/ {a=2}; !a; /x x x x x x x/ && a {a--}' < data

I need everything
from this point
...

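It just prints when a is 0, and decrements a whenever it sees x x x ....

Or, to start from the actual beginning of the file rather than from the pattern, change the first block to BEGIN {a=2}.

Note that your sample input has an empty line after the second x x x..., and it stays in the output if we stop deleting lines at that x x x... line.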
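To make the difference between the two variants concrete, here is a small sketch that wraps each one-liner in a shell function. The function names are hypothetical; the awk programs are the ones shown above, reading from a file argument instead of stdin.

```shell
# Hypothetical wrappers around the two awk one-liners discussed above.
after_second_from_marker() {
    # Counting starts at the marker line; $1 is the input file.
    awk '/<!--START OF FILE -->/ {a=2}; !a; /x x x x x x x/ && a {a--}' "$1"
}

after_second_from_top() {
    # Counting starts at the first line of the file; $1 is the input file.
    awk 'BEGIN {a=2}; !a; /x x x x x x x/ && a {a--}' "$1"
}
```

If the file has lines before the marker, the first function prints them (a is still 0 there), while the second suppresses everything up to and including the second x x x... line.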

Question 3

grep -Pz '(?s)<!--START OF FILE(.*?x x x x x x x){2}\K.*' input.txt

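Explanation

grep -Pz
- -P: interpret the pattern as a Perl-compatible regular expression (PCRE).
- -z: treat input lines as NUL-terminated rather than newline-terminated, so input.txt is processed as one big line.
(?s)<!--START OF FILE(.*?x x x x x x x){2}\K.*
- (?s): turn on "dot matches newline" for the rest of the regex.
- .*?: non-greedy match.
- {2}: number of repetitions of the preceding group.
- \K: omit everything matched so far from the final matched string.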
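One caveat worth sketching: \K only trims what grep reports as the match, so it has a visible effect only when grep prints matches (-o) rather than whole records. The variant below adds -o and strips the trailing NUL that -z emits. This is my adaptation and not part of the command above; it assumes GNU grep built with PCRE support, and the function name is hypothetical.

```shell
# Hypothetical helper: print what follows the second "x x x x x x x" after
# the marker. Assumes GNU grep with PCRE support (-P).
after_second_grep() {
    # -o prints only the matched part; tr strips the trailing NUL that -z adds.
    grep -Pzo '(?s)<!--START OF FILE(.*?x x x x x x x){2}\K.*' "$1" | tr -d '\0'
}
```

Note that the output starts with the newline that ended the second x x x... line, so it begins with one more blank line than the awk version prints.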

Question 4

This snippet:

# Utility functions: print-as-echo, print-line-with-visual-space.
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
pl " Input data file $FILE:"
head -v -n 20 $FILE

pl " Expected output on file $E:"
head -v $E

pl " Results:"
cgrep -V -D -w '<!--START OF FILE -->' +2 +w 'x x x x x x x' 'meta' $FILE

produces:

-----
 Input data file data1:
==> data1 <==
<!--START OF FILE -->
random text
<meta> more random text </meta>
x x x x x x x 
more random text
that I dont need 
x x x x x x x

I need everything
from this point

-----
 Expected output on file expected-output1:

I need everything
from this point
onwards
...

-----
 Results:

I need everything
from this point
onwards
...

This omits (-V) a window that starts (-w) at "...START..." and ends (+w) at the second occurrence (+2) of the string "...x x...", where the window contains the string 'meta'.
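If cgrep is not at hand, the window logic described above can be roughly approximated in awk. This is only a sketch of the described behaviour and not how cgrep works internally. The function name, the buffer variables and the handling of a window that never closes (it is printed unchanged) are my assumptions.

```shell
# Hypothetical helper: drop the window from the marker through the second
# "x x x x x x x" line, but only if that window contains "meta".
omit_window() {
    # $1 = input file; the patterns are hard-coded from the example above
    awk '
        !inwin && /<!--START OF FILE -->/ { inwin = 1; n = 0; hits = 0; has_meta = 0 }
        inwin {
            buf[++n] = $0                       # buffer the window line
            if (/meta/) has_meta = 1
            if (/x x x x x x x/ && ++hits == 2) {
                # window closed: replay it only if "meta" never appeared
                if (!has_meta) for (i = 1; i <= n; i++) print buf[i]
                inwin = 0
            }
            next
        }
        { print }
        # assumption: a window that never closes is kept as-is
        END { if (inwin) for (i = 1; i <= n; i++) print buf[i] }
    ' "$1"
}
```

Run as `omit_window data1`, it prints the lines outside the window, which for the sample input is the blank line and everything after it.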

On a system like this:

OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.9 (jessie) 
bash GNU bash 4.3.30

Some details for cgrep:

cgrep   shows context of matching patterns found in files (man)
Path    : ~/executable/cgrep
Version : 8.15
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYS ...)
Home    : http://sourceforge.net/projects/cgrep/ (doc)

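Although cgrep has to be obtained and compiled, I have had no trouble doing that on 32-bit or 64-bit systems, and it is available through brew on macOS (High Sierra). Execution times are comparable to GNU grep.

Best wishes ... cheers, drl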

Answer