SED: complex text deletion and pattern matching

Answer 1

If you can use perl, here is one way to delete every <connection ... </connection> block that contains state="wreck":

$ cat file.txt
blah blah
<connection ...
... state="wreck" ...
</connection>
blah blah
<connection ...
... state="wreck" ...
</connection>
blah blah
blah blah
<connection ...
... state="another" ...
</connection>
blah blah
<connection ...
... state="wreck" ...
</connection>
blah blah

$ perl -0 -pe 's#<connection(?:(?!</connection>).)*state="wreck"(?:(?!</connection>).)*</connection>##gs' file.txt
blah blah

blah blah

blah blah
blah blah
<connection ...
... state="another" ...
</connection>
blah blah

blah blah
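Since the title asks about sed, here is a rough sed sketch for comparison. It is not the answerer's method and assumes each "<connection" and "</connection>" tag starts on its own line; the input variable is just a copy of file.txt above. The idea: on a line containing "<connection", keep appending lines with N until the closing tag is in the pattern space, then delete the whole block if it mentions state="wreck".

```shell
#!/bin/sh
# Copy of the sample file.txt shown in the answer.
input='blah blah
<connection ...
... state="wreck" ...
</connection>
blah blah
<connection ...
... state="wreck" ...
</connection>
blah blah
blah blah
<connection ...
... state="another" ...
</connection>
blah blah
<connection ...
... state="wreck" ...
</connection>
blah blah'

# On a line with "<connection", loop (label a) appending the next line (N)
# until the pattern space also contains "</connection>", then delete the
# accumulated block if it contains state="wreck"; otherwise it is printed.
result=$(printf '%s\n' "$input" | sed '/<connection/{
:a
/<\/connection>/!{
N
ba
}
/state="wreck"/d
}')

printf '%s\n' "$result"
```

Unlike the perl one-liner, sed deletes whole lines, so the removed blocks leave no blank lines behind. The flip side is that any text sharing a line with the opening or closing tag is deleted too.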

Explanation:

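-0      # set the input record separator to the null character; a file with no NUL bytes is then read as one single record (effectively slurp mode; -0777 is the strict slurp form)
-pe     # -p: loop over the input records and print each one after the code runs; -e: the code to run follows on the command line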

The regex:

s#                      : substitute; # is the regex delimiter
<connection             : matched literally
(?:                     : start of a non-capturing group
    (?!</connection>)   : negative lookahead: </connection> must not start at this position
    .                   : any character, including newline because of the s flag
)*                      : repeat the group 0 or more times
state="wreck"           : matched literally
(?:                     : start of a non-capturing group
    (?!</connection>)   : negative lookahead: </connection> must not start at this position
    .                   : any character, including newline because of the s flag
)*                      : repeat the group 0 or more times
</connection>           : matched literally
##gs                    : replace with the empty string; g = global (every match), s = dot also matches newline
