AWK

Answer 1

AWK

Using GNU awk or mawk:

$ awk '$1~"^"word{printf("--\n%s",$0)}' word='are' RS='--\n' infile
--
are you happy
--
are(you hungry
too

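This sets the variable `word` to the word that must appear at the start of a record, and sets `RS` (the record separator) to `--` followed by a newline (`\n`). Then, for every record whose first field starts with that word (`$1~"^"word`), it prints the formatted record: `--`, a newline, and then the exact record that was found. Each record keeps the newline that precedes the next `--`, so the `printf` format does not need to add one. A multi-character `RS` is not specified by POSIX, which is why GNU awk or mawk is needed.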
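To see the command work end to end, here is a minimal sketch that writes a small sample file and runs the answer's awk command on it. The sample input is made up (it only assumes paragraphs separated by lines containing just `--`), and `awk` must be GNU awk or mawk.

```shell
# Hypothetical sample input: paragraphs separated by lines containing only "--".
infile=$(mktemp)
cat > "$infile" <<'EOF'
--
are you happy
--
how are you
--
are(you hungry
too
--
is it late
EOF

# The answer's command; needs GNU awk or mawk because RS has two characters.
out=$(awk '$1~"^"word{printf("--\n%s",$0)}' word='are' RS='--\n' "$infile")
printf '%s\n' "$out"
rm -f "$infile"
```

Only the two paragraphs whose first word starts with `are` are printed, each preceded by its `--` marker.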

GREP

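Using GNU grep with `-P` (PCRE), `-z` (input records end in NUL rather than newline, so the whole file is searched as one multi-line string) and `-o` (print only the matched text; without it, grep prints the entire file as soon as anything matches). The `--` ends option processing, because the patterns themselves start with `--`:

grep -Pzo -- '--\nare(?:[^\n]*\n)+?(?=--|\Z)' infile
grep -Pzo -- '(?s)--\nare.*?(?=\n--|\Z)\n' infile
grep -Pzo -- '(?s)--\nare(?:(?!\n--).)*\n' infile

Each match is followed by a NUL byte; pipe the output through `tr -d '\0'` if that gets in the way.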
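One way to check that the three patterns behave the same is to run them on one input. This sketch assumes GNU grep built with PCRE support and a made-up sample file; it uses `-o` so only the matches are printed, and `tr -d '\0'` to drop the NUL byte that `-z` writes after each match.

```shell
# Hypothetical sample input (not the question's original file).
infile=$(mktemp)
cat > "$infile" <<'EOF'
--
are you happy
--
how are you
--
are(you hungry
too
--
is it late
EOF

# -o prints only the matched text; with -z every match ends in a NUL byte.
opt1=$(grep -Pzo -- '--\nare(?:[^\n]*\n)+?(?=--|\Z)' "$infile" | tr -d '\0')
opt2=$(grep -Pzo -- '(?s)--\nare.*?(?=\n--|\Z)\n' "$infile" | tr -d '\0')
opt3=$(grep -Pzo -- '(?s)--\nare(?:(?!\n--).)*\n' "$infile" | tr -d '\0')

printf '%s\n' "$opt1"
[ "$opt1" = "$opt2" ] && [ "$opt2" = "$opt3" ] && echo "all three options agree"
rm -f "$infile"
```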

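Description

In the explanations below, the PCRE option `(?x)` is used to add (extensive) explanatory comments and whitespace inline with the actual, working regex. In `x` mode the engine ignores whitespace and everything from `#` to the end of the line, so if the comments (and most of the whitespace) are removed, the resulting string is still the same regex. This allows a regex to be documented in detail inside working code, which makes maintenance much easier.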

Option 1 regex `(?x)--\nare(?:[^\n]*\n)+?(?=--|\Z)`

(?x)   # match the remainder of the pattern with the following
       # effective flags: x
       #      x modifier: extended. Spaces and text after a # 
       #      in the pattern are ignored
--     # matches the characters -- literally (case sensitive)
\n     # matches a line-feed (newline) character (ASCII 10)
are    # matches the characters are literally (case sensitive)
(?:    #      Non-Capturing Group (?:[^\n]*\n)+?
[^\n]  #           matches non-newline characters
*      #           Quantifier — Matches between zero and unlimited times, as
       #           many times as possible, giving back as needed (greedy)
\n     #           matches a line-feed (newline) character (ASCII 10)
)      #      Close the Non-Capturing Group
+?     # Quantifier — Matches between one and unlimited times, as
       # few times as possible, expanding as needed (lazy)
       # (a non-capturing group is used because the matched lines
       # do not need to be captured separately)
(?=    # Positive Lookahead (?=--|\Z)
       # Assert that the Regex below matches
       #      1st Alternative --
--     #           matches the characters -- literally (case sensitive)
|      #      2nd Alternative \Z
\Z     #           \Z asserts position at the end of the string, or before
       #           the line terminator right at the end of the 
       #           string (if any)
)      #      Closing the lookahead.

Option 2 regex `(?sx)--\nare.*?(?=\n--|\Z)\n`

(?sx)  # match the remainder of the pattern with the following eff. flags: sx
       #        s modifier: single line. Dot matches newline characters
       #        x modifier: extended. Spaces and text after a # in 
       #        the pattern are ignored
--     # matches the characters -- literally (case sensitive)
\n     # matches a line-feed (newline) character (ASCII 10)
are    # matches the characters are literally (case sensitive)
.*?    # matches any character, including newline (s flag)
       #        Quantifier — Matches between zero and unlimited times,
       #        as few times as possible, expanding as needed (lazy).
(?=    # Positive Lookahead (?=\n--|\Z)
       # Assert that the Regex below matches
       #        1st Alternative \n--
\n     #               matches a line-feed (newline) character (ASCII 10)
--     #               matches the characters -- literally.
|      #        2nd Alternative \Z
\Z     #               \Z asserts position at the end of the string, or
       #               before the line terminator right at
       #               the end of the string (if any)
)      # Close the lookahead parenthesis.
\n     #        matches a line-feed (newline) character (ASCII 10)

Option 3 regex `(?xs)--\nare(?:(?!\n--).)*\n`

(?xs)  # match the remainder of the pattern with the following eff. flags: xs
       # modifier x : extended. Spaces and text after a # in the pattern are ignored
       # modifier s : single line. Dot matches newline characters
--     # matches the characters -- literally (case sensitive)
\n     # matches a line-feed (newline) character (ASCII 10)
are    # matches the characters are literally (case sensitive)
(?:    # Non-capturing group (?:(?!\n--).)
(?!    #      Negative Lookahead (?!\n--)
       #           Assert that the Regex below does not match
\n     #                matches a line-feed (newline) character (ASCII 10)
--     #                matches the characters -- literally
)      #      Close Negative lookahead
.      #      matches any character, including newline (s flag)
)      # Close the Non-Capturing group.
*      # Quantifier — Matches between zero and unlimited times, as many
       # times as possible, giving back as needed (greedy)
\n     # matches a line-feed (newline) character (ASCII 10)

sed

$ sed -nEe 'bend
            :start  ;N;/^--\nare/!b
            :loop   ;/^--$/!{p;n;bloop}
            :end    ;/^--$/bstart'           infile
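The same control flow may be easier to follow when written one command per line with comments. This sketch keeps the answer's labels and commands (dropping `-E`, which the script does not need), assumes GNU sed, and runs on a made-up sample file.

```shell
# Hypothetical sample input.
infile=$(mktemp)
cat > "$infile" <<'EOF'
--
are you happy
--
how are you
--
are(you hungry
too
--
is it late
EOF

script=$(mktemp)
cat > "$script" <<'EOF'
# Every cycle first checks whether the line is a paragraph marker.
b end
:start
# Marker found: append the first line of the paragraph.
N
# Not an "are" paragraph: end the cycle without printing (-n).
/^--\nare/!b
:loop
# Print and fetch the next line until a "--" line is reached.
/^--$/!{
  p
  n
  b loop
}
:end
/^--$/b start
EOF

out=$(sed -n -f "$script" "$infile")
printf '%s\n' "$out"
rm -f "$infile" "$script"
```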

Answer 2

Using GNU awk or mawk:

$ awk -v word="are" -v RS='--\n' -v ORS='--\n' '$1 ~ "^" word "[[:punct:]]?"' file
are you happy
--
are(you hungry
too
--

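This sets both the input and the output record separators to `--` followed by a newline. The first word of each paragraph is then found in `$1`. We match it against the given word, possibly followed by punctuation, and if they match, the paragraph is printed. Strictly speaking, because the punctuation is optional and the end of `$1` is not anchored, the test is equivalent to `$1 ~ "^" word`, so a first word such as `area` would match too.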

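Note that in the output the paragraph markers are placed at the end of each paragraph instead of at the beginning, because we use `ORS` to output them.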
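The sketch below runs the command on a made-up sample that includes an `area code` paragraph on purpose, next to a stricter variant that is not part of the answer: it accepts the word only when it is the whole first field or is immediately followed by punctuation. GNU awk or mawk (with POSIX character classes) is assumed.

```shell
# Hypothetical sample input; the "area code" paragraph is there on purpose.
infile=$(mktemp)
cat > "$infile" <<'EOF'
--
are you happy
--
area code
--
are(you hungry
too
EOF

# The answer's command, unchanged.
loose=$(awk -v word="are" -v RS='--\n' -v ORS='--\n' '$1 ~ "^" word "[[:punct:]]?"' "$infile")

# Stricter variant (an illustration, not from the answer).
strict=$(awk -v word="are" -v RS='--\n' -v ORS='--\n' '$1 == word || $1 ~ "^" word "[[:punct:]]"' "$infile")

printf '%s\n' "== answer ==" "$loose" "== stricter ==" "$strict"
rm -f "$infile"
```

With the answer's test, `area code` is printed as well; the stricter variant leaves it out.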

Using a sed script:

:top
/^--/!d;                   # This is not a new paragraph, delete
N;                         # Append next line
/^--\nare[[:punct:]]?/!d;  # This is not a paragraph we want, delete
:record
n;                         # Output line, get next
/^--/!brecord;             # Not yet done with this record, branch to :record
btop;                      # Branch to :top

Running it:

$ sed -E -f script.sed file
--
are you happy
--
are(you hungry
too

Or, as a one-liner that uses the shell variable `$word`:

sed -E -e ':t;/^--/!d;N;' \
       -e "/^--\n$word[[:punct:]]?/!d" \
       -e ':r;n;/^--/!br;bt' file
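The one-liner can be wrapped in a small shell function so the word becomes an argument. The function name `paragraphs_starting_with` is hypothetical, the sample input is made up, and GNU sed is assumed. Because `$word` is pasted into the regex, characters such as `.` or `*` in it are treated as regex syntax, not literally.

```shell
# Hypothetical helper: print the "--"-delimited paragraphs of file $2 whose
# first line starts with the word $1 (the answer's one-liner, unchanged).
paragraphs_starting_with() {
    word=$1
    sed -E -e ':t;/^--/!d;N;' \
           -e "/^--\n$word[[:punct:]]?/!d" \
           -e ':r;n;/^--/!br;bt' "$2"
}

# Hypothetical sample input.
infile=$(mktemp)
cat > "$infile" <<'EOF'
--
are you happy
--
how are you
--
are(you hungry
too
--
is it late
EOF

out=$(paragraphs_starting_with are "$infile")
printf '%s\n' "$out"
rm -f "$infile"
```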

Answer 3

I know this is an old question, but why all those loops, branches and pattern juggling, when a simple

sed '/^--$/!{H;$!d;};x;/^--\nare/!d'

does the same thing in a natural way?

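`sed` is a line-by-line stream editor, so if you need multi-line content, collect the lines in the hold space with `H`. On a paragraph marker (`^--$`), `x` exchanges the buffers, and the paragraph is tested to decide whether to print it (`^--\nare` means a `--` line followed by a line starting with `are`). The same `x` has already preloaded the hold space with the paragraph marker for the next paragraph. On the last line, `$!d` does not fire, so the final paragraph is swapped in and tested as well.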
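Here is a sketch that runs the one-liner on a made-up sample and, for comparison, the same commands written one per line with comments. A POSIX sed such as GNU sed is assumed.

```shell
# Hypothetical sample input.
infile=$(mktemp)
cat > "$infile" <<'EOF'
--
are you happy
--
how are you
--
are(you hungry
too
--
is it late
EOF

# The answer's one-liner.
one=$(sed '/^--$/!{H;$!d;};x;/^--\nare/!d' "$infile")

# The same commands, one per line.
script=$(mktemp)
cat > "$script" <<'EOF'
# Content line: append it to the hold space and, unless it is the last
# line, delete it so nothing is printed yet.
/^--$/!{
  H
  $!d
}
# Swap buffers: the collected paragraph moves to the pattern space. On a
# "--" line, that marker stays in the hold space as the start of the next one.
x
# Keep (auto-print) only paragraphs whose first line starts with "are".
/^--\nare/!d
EOF
two=$(sed -f "$script" "$infile")

printf '%s\n' "$one"
rm -f "$infile" "$script"
```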

You don't need GNU tools with wild extensions, and you don't need programming skills; just `sed`.

Answer 4

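After reading your question, I also felt it should be possible to solve it with `grep` + PCRE.

Method #1 solves the problem, thanks to help from @issac.
Method #2 shows how to use an inline modifier (`(?s)`) and a negative lookahead (`(?!...)`).
My original solution (#3) works well in most cases, except for the type of input highlighted in its section below.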

grep method #1

$ grep -Pzo -- '--\nare([^\n]*\n)+?(?=--|\Z)' afile

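How it works

grep switches

-P - enable PCRE extensions
-z - treat the input as multi-line: records end in NUL instead of `\n` (newline), so the whole file is one record
-o - show only the matches

Regex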

--\nare([^\n]*\n)+?(?=--|\Z)
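- Matches a double dash, a newline and `are`, followed by one or more lines, each consisting of zero or more non-newline characters and a newline.
- The `+?` matches one or more times, but lazily (non-greedy), so it does not keep consuming lines aggressively.
- Finally, the `(?=--|\Z)` guard at the end of the block looks ahead for the next double dash `--` or the end of the file (`\Z`).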
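The difference the lazy `+?` makes shows up when a greedy `+` version is run next to it. In this sketch (GNU grep with PCRE and a made-up input assumed), the greedy pattern keeps taking whole lines up to the end of the file, where `\Z` lets the lookahead succeed, so everything from the first `--\nare` onward comes back as one match.

```shell
# Hypothetical sample input.
infile=$(mktemp)
cat > "$infile" <<'EOF'
--
are you happy
--
how are you
--
are(you hungry
too
--
is it late
EOF

lazy=$(grep -Pzo -- '--\nare([^\n]*\n)+?(?=--|\Z)' "$infile" | tr -d '\0')
greedy=$(grep -Pzo -- '--\nare([^\n]*\n)+(?=--|\Z)' "$infile" | tr -d '\0')

printf '%s\n' "== lazy (+?) ==" "$lazy" "== greedy (+) ==" "$greedy"
rm -f "$infile"
```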

grep method #2

This method uses the DOTALL inline modifier so that `.` also matches newlines (`\n`).

$ grep -Pzo -- '(?s)--\nare((?!\n--).)+\n' afile

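How it works

grep switches

-P - enable PCRE extensions
-z - treat the input as multi-line: records end in NUL instead of `\n` (newline), so the whole file is one record
-o - show only the matches

Regex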

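(?s) - inline DOTALL modifier: every `.` also matches newlines
--\nare - matches a double dash, a newline, then `are`
((?!\n--).)+\n - `.` matches characters as long as the lookahead `(?!\n--)` does not see `\n--`. The repeated block must match at least once (`+`) and be followed by a newline `\n`.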

grep method #3 (original)

This is a `grep` solution that uses the PCRE extensions (`-P`). It works for all the examples provided, but fails on input such as:

--
are
some-other-dasher

But it handles most of the cases I can imagine having to deal with.

$ grep -Pzo -- '--\nare[^\r\n]+[^-]+' afile
--
are you happy

--
are(you hungry
too

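How it works

grep switches

-P - enable PCRE extensions
-z - treat the input as multi-line: records end in NUL instead of `\n` (newline), so the whole file is one record
-o - show only the matches

Regex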

'--\nare[^\r\n]+[^-]+'
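- Matches a double dash, followed by a newline and the word `are`.
- It then continues through the rest of the line after `are`, up to the newline (`[^\r\n]+` needs at least one character there).
- Then it matches characters up to the next dash; `[^-]+` stops at any `-`, not only at a `--` marker.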
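To see that limitation in practice, this sketch (GNU grep with PCRE assumed; input made up) runs method #3 and method #2 on a paragraph whose second line contains a hyphen. Method #3 stops at the first `-`, while method #2 only stops at a newline followed by `--`.

```shell
# Hypothetical sample input with a hyphen inside the paragraph.
infile=$(mktemp)
cat > "$infile" <<'EOF'
--
are you ok
some-thing else
--
how are you
EOF

m3=$(grep -Pzo -- '--\nare[^\r\n]+[^-]+' "$infile" | tr -d '\0')
m2=$(grep -Pzo -- '(?s)--\nare((?!\n--).)+\n' "$infile" | tr -d '\0')

printf '%s\n' "== method #3 ==" "$m3" "== method #2 ==" "$m2"
rm -f "$infile"
```

Method #3 returns the paragraph cut off after `some`; method #2 returns it whole.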
