Removing (possibly nested) text quotes on the command line

Answer 1

If you know the input doesn't contain < or > characters, you could do:

sed '
  # replace opening quote with <
  s|\[quote=[^]]*\]|<|g
  # and closing quotes with >
  s|\[/quote\]|>|g
  :1
    # work our way from the inner quotes
    s|<[^<>]*>||g
  t1'
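
To see why the loop is needed, it may help to run the stages one at a time. This is only a sketch: mark_quotes and strip_innermost are made-up helper names, and the sample line is the one used elsewhere in this thread.

```shell
# Hypothetical helpers that split the sed program above into its two stages.
sample='text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3'

# Stage 1: turn every opening tag into < and every closing tag into >.
mark_quotes() {
  sed 's|\[quote=[^]]*\]|<|g; s|\[/quote\]|>|g'
}

# Stage 2: one pass that deletes only the pairs with no < or > inside them,
# that is, the innermost quotes at that moment.
strip_innermost() {
  sed 's|<[^<>]*>||g'
}

printf '%s\n' "$sample" | mark_quotes
printf '%s\n' "$sample" | mark_quotes | strip_innermost
printf '%s\n' "$sample" | mark_quotes | strip_innermost | strip_innermost
```

After one pass the outer pair around "outer quote 1 ... outer quote 2" is still there; the second pass removes it. The t1 loop in the original simply repeats the pass until nothing more is deleted. The spaces that surrounded each removed quote stay in place.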

If it may contain < or > characters, you can escape them using a scheme like:

sed '
  # escape < and > (and the escaping character _ itself)
  s/_/_u/g; s/</_l/g; s/>/_r/g

  <code-above>

  # undo escaping after the work has been done
  s/_r/>/g; s/_l/</g; s/_u/_/g'
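
With the <code-above> placeholder filled in, the whole thing might look like the sketch below. The function name strip_quotes_escaped is made up for illustration; the sed commands are the ones already shown.

```shell
# Hypothetical wrapper: escape, remove quotes, then undo the escaping.
strip_quotes_escaped() {
  sed '
    # escape < and > (and the escaping character _ itself)
    s/_/_u/g; s/</_l/g; s/>/_r/g

    # the quote-removal commands from the first example
    s|\[quote=[^]]*\]|<|g
    s|\[/quote\]|>|g
    :1
      s|<[^<>]*>||g
    t1

    # undo escaping after the work has been done
    s/_r/>/g; s/_l/</g; s/_u/_/g'
}

printf '%s\n' 'a <b> _c [quote=x] q > r [/quote] d' | strip_quotes_escaped
```

Escaping _ first matters: it guarantees that every _l, _r or _u seen later was produced by the escaping step, so the literal <b> and _c outside the quote should come back unchanged.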

With perl, using recursive regular expressions:

perl -pe 's@(\[quote=[^\]]*\](?:(?1)|.)*?\[/quote\])@@g'
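
The recursive pattern may be easier to read when spread out with the x flag. The sketch below is meant to be the same substitution, only reformatted and commented; strip_quotes_perl is a made-up function name, and I switched the delimiters from s@...@@ to s{...}{} for the multi-line layout.

```shell
# Hypothetical wrapper around the recursive substitution, written with /x.
strip_quotes_perl() {
  perl -pe '
    s{
      (                     # group 1: one complete quote
        \[quote=[^\]]*\]    # opening tag
        (?:
            (?1)            # a nested quote, matched by re-entering group 1
          | .               # or any other single character
        )*?                 # as few of these as possible
        \[/quote\]          # closing tag that ends this quote
      )
    }{}gx'
}
```

Piping a line through strip_quotes_perl should remove each top-level quote together with everything nested inside it, because the nested quotes are consumed by (?1) before the outer closing tag is looked for.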

Or even, as you mention:

perl -pe 's@(\[quote=.*?\](?:(?1)|.)*?\[/quote\])@@g'

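With perl, you can handle multiline input by adding the -0777 option (for quotes that span several lines, also add the s flag to the substitution so that . matches newlines). With sed, you'd need to prefix the code with: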

:0
$!{
  N;b0
}

so as to load the whole input into the pattern space.
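
Put together with the first sed program, that could look like this. It is a sketch only, with strip_quotes_multiline as a made-up name, and it still assumes the input contains no < or > characters.

```shell
# Hypothetical wrapper: slurp all lines into the pattern space, then remove quotes.
strip_quotes_multiline() {
  sed '
    :0
    $!{
      N;b0
    }
    s|\[quote=[^]]*\]|<|g
    s|\[/quote\]|>|g
    :1
      s|<[^<>]*>||g
    t1'
}

printf '%s\n' 'before [quote=a] line one' 'line two [/quote] after' |
  strip_quotes_multiline
```

Because [^<>] also matches the newline that N put in the pattern space, a quote that starts on one line and ends on another is removed in one piece.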

Answer 2

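I checked this and it works for me. You may want to choose a different temporary pattern than foobar. Without the placeholder, the greedy match would delete everything between the first and last tags, leaving only text part 1 text part 3.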

sed -e 's/\/quote\]/foobar\]/3' -e 's/\[.*\/quote\]//' -e 's/\[.*foobar]//' testfile

Instead of passing testfile, you can also pipe the input in, for example from cat.
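
As a sketch of the piped form, here is the same command reading from standard input. The function name strip_with_placeholder is made up, and the sample line is the one from the question. Note that the /3 is tied to this layout: it assumes the third closing tag is the one that ends the last quote.

```shell
# Hypothetical wrapper: same three expressions, reading stdin instead of testfile.
strip_with_placeholder() {
  sed -e 's/\/quote\]/foobar\]/3' -e 's/\[.*\/quote\]//' -e 's/\[.*foobar]//'
}

sample='text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3'
printf '%s\n' "$sample" | strip_with_placeholder
```

The first expression renames the third closing tag to [foobar], the second deletes from the first [ up to the last remaining /quote], and the third deletes the final quote up to the placeholder.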

Answer 3

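A small script that increments a counter variable at each opening quote and decrements it at each closing quote. While the counter is greater than 0, the text fragments are skipped.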

#!/bin/bash

# disable pathname expansion
set -f    
cnt=0
for i in $(<"$1"); do
        # start quote
        if [ "${i##[quote=}" != "$i" ] && [ "${i: -1}" = "]" ]; then
                ((++cnt))
        elif [ "$i" = "[/quote]" ]; then
                ((--cnt))
        elif [ $cnt -eq 0 ]; then
                echo -n "$i "
        fi
done
echo

Output:

$ cat q1
text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
$ ./parse.sh q1
text part 1 text part 2 text part 3
$ cat q2
text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [foo] [/quote] outer quote 2 [/quote] text part 2 [quote=foo-bar] next quote [/quote] text part 3
$ ./parse.sh q2
text part 1 text part 2 text part 3

Answer 4

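You can do this with POSIX sed, as detailed here. Note that this solution works for both of the inputs you showed. The limitation is that the input must not be multiline, because we use newlines as markers to carry out the transformation.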

$ sed -e '
      :top
      /\[\/quote]/!b
      s//\
&/
      s/\[quote=/\
\
&/

     :loop
        s/\(\n\n\)\(\[quote=.*\)\(\[quote=.*\n\)/\2\1\3/
     tloop

     s/\n\n.*\n\[\/quote]//
     btop
 '  input.txt
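
The newline markers are easier to see if only the marker-placing commands are run, without the loop or the deletion. A sketch, with place_markers as a made-up name:

```shell
# Hypothetical helper: only the two marker-inserting commands from the script above.
# The continuation lines must stay unindented, or the indentation would
# become part of the replacement text.
place_markers() {
  sed -e '
    /\[\/quote]/!b
    s//\
&/
    s/\[quote=/\
\
&/'
}

printf '%s\n' 'text part 1 [quote=foo] outer quote 1 [quote=bar] inner quote [/quote] outer quote 2 [/quote] text part 2' |
  place_markers
```

The output is split at the markers: an empty line (the double newline) sits in front of the first [quote= and a single line break in front of the first [/quote]. The loop in the full script then moves the double newline forward to the last [quote= before that closing tag, so the deletion only removes the innermost quote.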
