Searching for lines that contain mismatched delimiters

Answer 1

Here is another short one:

awk '{while(gsub(/{[^{}]*}/, "")){ }} /[{}]/ {exit 1}'

Or perhaps:

awk '{x=$0;while(gsub(/{[^{}]*}/, "")){ }} /[{}]/ {print FILENAME,FNR,x;nextfile}'

This removes every balanced {...} group, innermost first, and then takes some action if any { or } character is still left on the line.
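To make the idea concrete, here is a minimal sketch of the same strip-and-check loop run on three invented sample lines. The function name check_braces and the sample input are made up for illustration. The braces in the regex are written as \{ and \} here, which is only a cautious choice for this sketch in case an awk treats a bare { in a regex specially.

```shell
#!/bin/sh
# Sketch only: check_braces is a hypothetical helper name.
# It prints "line-number: text" for every line with unpaired braces.
check_braces() {
  awk '{
    line = $0
    # Each gsub pass deletes every innermost {...}; repeat until nothing changes.
    while (gsub(/\{[^{}]*\}/, "")) { }
    # Any brace still present had no partner.
    if ($0 ~ /[{}]/) print FNR ": " line
  }' "$@"
}

# Invented sample input: line 1 is balanced, lines 2 and 3 are not.
printf '%s\n' 'a{b{c}}d' 'x}y{z' '{{only}' | check_braces
```

Only the second and third sample lines should be reported, since the first one reduces to plain text once its pairs are stripped.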

Answer 2

Yes, this is possible in grep (using PCRE), and quite precisely, though the regex is not easy to read.

grep -Px '((?>[^{}]+|\{(?1)\})*)'

Or, with the input ($str) and a suitable regex ($re) defined, we can do:

$ printf '%s\n' "$str" | grep -vP "${re//[ $'\n']/}"

How does this work?

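Modern regex engines can match balanced constructs (most older regex engines cannot).

In PCRE, recursion is the key to doing this.

To match a balanced set, this structure is needed: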

b(m|(?R))*e

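where b is the start pattern ({ in your case),
e is the end pattern (} in your case),
and m is the middle pattern (something like [^{}]+ in your case). For braces, that gives: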

{([^{}]*+|(?R))*}

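But that is an unanchored match, and it recurses into the whole regex with (?R).

An anchored version (one that must match the entire line) can be obtained with grep's -x option.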
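As a rough illustration of the difference, the sketch below runs both forms on invented sample lines. It assumes a grep with PCRE support (-P), such as GNU grep. The anchored command reuses the one-liner from the top of this answer, which recurses into group 1 with (?1) rather than into the whole pattern.

```shell
#!/bin/sh
# Assumes a grep that supports -P (PCRE), e.g. GNU grep.

# Unanchored: (?R) recurses into the whole pattern, and -o prints each
# balanced group found anywhere in the line.
printf '%s\n' 'x{a{b}c}y{d}z' | grep -Po '\{(?:[^{}]++|(?R))*\}'

# Anchored: -x demands that the entire line match. Group 1 is any mix of
# non-brace runs and "{ group-1 }", so only fully balanced lines print.
printf '%s\n' 'n{a{}}nn{b{bb}}' '{a{b}' 'a}b' 'plain text' |
  grep -Px '((?>[^{}]+|\{(?1)\})*)'
```

The first command should list the two balanced groups it finds, and the second should print only the first and last sample lines.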

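A full solution that also allows text outside the braces gets a bit more complex, so it helps to write it using the Perl-regex option that ignores whitespace. The regex structure can also be changed to this (slightly slower) form: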

((m+|b(?1)e)*)

With the original structure, b(m|(?R))*e:

(?(DEFINE)(?'nonbrace'  [^{}\n]       ))  # Define a non-brace
(?(DEFINE)(?'begin'     {             ))  # Define the start text
(?(DEFINE)(?'end'       }             ))  # define the end text 
(?(DEFINE)(?'middle'    (?&nonbrace)  ))  # define the allowed text
                                          # inside the braces

(?(DEFINE)(?'nested'                            # define a nested
    ((?&begin)((?&middle)|(?&nested))*(?&end))  # pattern
  ))                                            # here

^((?&nonbrace)*+(?&nested))*+(?&nonbrace)*$     # finally, use this regex.

Or, with the alternative structure ((m+|b(?1)e)*):

(?(DEFINE)(?'nonbrace'  [^{}\n]       ))  # Define a non-brace
(?(DEFINE)(?'begin'     \{            ))  # Define the start text
(?(DEFINE)(?'end'       \}            ))  # define the end text 
(?(DEFINE)(?'middle'    (?&nonbrace)  ))  # define the allowed text
                                          # inside the braces

(?(DEFINE)(?'nested'                             # define a nested
     (  (?&middle)++  |  (?&begin)(?&nested)(?&end)  )*
))

^(?&nested)$     # finally, use this regex.

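Note that once a very long regex with many DEFINEs has been compiled by the regex engine, it runs just as fast as the shorter regex.

The added benefit is a description that is clearer to humans (or at least I hope so), at the cost of relying on fairly advanced PCRE regex features.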

Script

To put all of these ideas to use with grep (GNU grep with PCRE), use this shell (bash) example:

#!/bin/bash

str=$'
a
abc
{}
{a}
{{aa}}
{a{b}}
{a{bb}a}
{a{b{c}b}a}
n{a{}}nn{b{bb}}
\@writefile{toc}}}}{\\contentsline {section}{\\numberline {B
\@writefile{toc}{\\contentsline {section}{\\numberline {B
Previous lines contain mismatched braces. This and the next line don\'t.
\@writefile{toc}{\\contentsline {section}{\\numberline {B}}}
'

re=$'
  (?(DEFINE)(?\'nonbrace\'  [^{}\\n]      ))
  (?(DEFINE)(?\'begin\'     {             ))
  (?(DEFINE)(?\'end\'       }             ))
  (?(DEFINE)(?\'middle\'    (?&nonbrace)  ))
  (?(DEFINE)(?\'nested\'
      ((?&begin)((?&middle)|(?&nested))*(?&end))
    ))
  ^((?&nonbrace)*(?&nested))*(?&nonbrace)*$
'

printf '%s\n' "$str" | grep -P "${re//[ $'\n']/}"

This prints the lines whose braces are balanced:

a
abc
{}
{a}
{{aa}}
{a{b}}
{a{bb}a}
{a{b{c}b}a}
n{a{}}nn{b{bb}}
Previous lines contain mismatched braces. This and the next line don't.
\@writefile{toc}{\contentsline {section}{\numberline {B}}}

Test results

Finally, to get all the lines that do not match, invert the output with -v (source the script above first if you need to run the following in your current shell):

$ printf '%s\n' "$str" | grep -vP "${re//[ $'\n']/}"

\@writefile{toc}}}}{\contentsline {section}{\numberline {B
\@writefile{toc}{\contentsline {section}{\numberline {B

Answer 3

A sed translation of @rowboat's awk approach:

sed 'h; s/[^{}]//g; :1
     s/{}//g; t1
     /./!d; g'

That is:

sed '
  h; # save a copy of the line on the hold space
  s/[^{}]//g; # remove all characters but { and }
  :1
    s/{}//g; # remove the {}s (so starting with inner ones)
  # and loop until no more {} are left to remove
  t1

  /./!d; # if the pattern space does not contain any single
         # character, that means all {} were matched. Delete

  g; # otherwise retrieve the saved copy which will be printed
     # at the end of the cycle'

This is POSIX, but much slower than awk or than solutions that use Perl-like recursive regexes, such as:

grep -Pvx '((?:[^{}]++|\{(?1)\})*+)'
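For a quick check of the sed version, here is a small sketch that feeds it a few invented lines. The wrapper name show_unbalanced and the sample input are made up for this example. The sed script itself is the compact one shown above.

```shell
#!/bin/sh
# Sketch only: show_unbalanced is a hypothetical wrapper name.
# It prints the input lines whose braces do not pair up.
show_unbalanced() {
  sed 'h; s/[^{}]//g; :1
       s/{}//g; t1
       /./!d; g'
}

# Invented sample input: lines 2 and 4 are unbalanced.
printf '%s\n' '{a{b}c}' 'a}{b' 'no braces' '{open' | show_unbalanced
```

Lines whose brace skeleton collapses to nothing are deleted, so only the second and fourth sample lines should come out.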

Answer 4

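Using awk:

For each record, initialize the sum to zero.
Walk through the line character by character.
Increment the sum when a left brace is seen and decrement it when a right brace is seen.
Stop as soon as the sum drops below zero.
On leaving the for loop, whether early because of a negative sum or normally at the end of the line, exit with a nonzero status if the sum is nonzero.
Note: this approach is not the same as counting braces. Here we stop processing as soon as the sum goes negative.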

awk 'BEGIN { a["{"]=1;a["}"]=-1 }
{ for (s=i=0; i++<length();) if (0>(s += a[substr($0,i,1)])) break }
s {exit 1}' file
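To show why the early stop matters, here is a minimal sketch that labels each line instead of exiting. The function name depth_ok and the sample lines are made up for illustration. The line }{ has one brace of each kind, so a plain count would accept it, while the running sum goes negative on its first character.

```shell
#!/bin/sh
# Sketch only: depth_ok is a hypothetical helper name.
# It prints "ok: line" or "bad: line" for every input line.
depth_ok() {
  awk 'BEGIN { a["{"] = 1; a["}"] = -1 }
  {
    s = 0
    for (i = 1; i <= length($0); i++) {
      s += a[substr($0, i, 1)]   # +1 for {, -1 for }, 0 for anything else
      if (s < 0) break           # a } arrived before its matching {
    }
    verdict = (s == 0) ? "ok" : "bad"
    print verdict ": " $0
  }'
}

# Invented sample input.
printf '%s\n' '{a}{b}' '}{' '{{}' | depth_ok
```

Only the first sample line should be labelled ok: the second fails on the early stop, and the third ends with a positive sum.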

The same thing in perl:

perl -lne '
  local(%h,$^R) = qw/{ 1 } -1/;
  /(?:(?:([{}])(?{$^R+=$h{$1}})|[^{}]+)(?(?{$^R<0})(?!)))+/g;
  exit 1 if $^R;
' file

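Perl has powerful regex features, almost a mini programming language of its own. Inside the regex we run the loop, update the sum, and watch for the sum dropping below zero.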
