I need to remove all HTML tags, such as <p style="text-align: center;">, from inside the <p class="glovo"></p> tag, except for <em> and </em>.
Example:
<p class="glovo">In these <p style="text-align: center;"> situations we may be forgetting to really <em>bend</em> at our practice and <em>sweat</em> at it.</p>
must become:
<p class="glovo">In these situations we may be forgetting to really <em>bend</em> at our practice and <em>sweat</em> at it.</p>
I use this generic formula:
REGION-START(?=(?:(?!REGION-FINAL).)*?FIND REGEX)(?=(?:(?!REGION-FINAL).)).+?REGION-FINAL\R?
REGION-START = <p class="glovo">
REGION-FINAL = </p>
FIND REGEX = <(?!/)[^>]*[^/]>(?!<em>|</em>)
So my final regex becomes:
FIND:
<p class="glovo">(?=(?:(?!</p>).)*?<(?!/)[^>]*[^/]>(?!<em>|</em>))(?=(?:(?!</p>).)).+?</p>\R?
REPLACE BY: (LEAVE EMPTY)
The problem is that my regex selects the entire <p class="glovo">...</p> element, not just the tags inside it. Can anyone help me?
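A minimal Python sketch of the symptom, using the sample line from the example above. Lookaheads only assert and consume nothing, so the only consuming parts of the pattern are the opening tag and .+?</p>, which is why the match spans the whole region. The trailing \R? is dropped here because Python's re module does not support it; PATTERN, SAMPLE and whole_region_matched are illustrative names, not part of the original question.

```python
import re

# The question's pattern, minus the trailing \R? (not supported by Python's re).
PATTERN = re.compile(
    r'<p class="glovo">'
    r'(?=(?:(?!</p>).)*?<(?!/)[^>]*[^/]>(?!<em>|</em>))'  # asserts only
    r'(?=(?:(?!</p>).))'                                   # asserts only
    r'.+?</p>'                                             # consumes up to </p>
)

SAMPLE = (
    '<p class="glovo">In these <p style="text-align: center;"> situations '
    'we may be forgetting to really <em>bend</em> at our practice and '
    '<em>sweat</em> at it.</p>'
)

match = PATTERN.search(SAMPLE)

# True when the match covers the complete region rather than only the inner tag.
whole_region_matched = match is not None and match.group(0) == SAMPLE
print(whole_region_matched)
```

Replacing that match with an empty string would therefore delete the whole line, not just the inner <p style="..."> tag.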
Answer 1
- Ctrl+H
- Find what:
(?:<p class="glovo">|\G).*?\K<(?!/?em>).*?>(?=.*</p>)
- Replace with:
LEAVE EMPTY
- Check Wrap around
- Select Regular expression
- Replace all
Explanation:
(?: # non-capturing group
<p class="glovo"> # literally
| # OR
\G # restart from last match position
) # end group
.*? # 0 or more any character, not greedy
\K # forget all we have seen until this position
< # literally <
(?!/?em>) # not followed by em> or /em>, so <em> and </em> are kept
.*? # 0 or more any character, not greedy
> # literally >
(?=.*</p>) # positive lookahead, make sure we have </p> somewhere after
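The same clean-up can be sketched outside Notepad++. Python's built-in re module supports neither \K nor \G, so this sketch swaps in a different, two-step technique: first locate each <p class="glovo">...</p> region, then strip every tag except <em> and </em> from its inner text only. The names REGION, OTHER_TAG and strip_inner_tags are hypothetical, and collapsing the leftover double space is an assumption based on the expected output shown in the question. Like the regex above, it assumes each region sits on a single line and is not a substitute for a real HTML parser.

```python
import re

# One <p class="glovo">...</p> region on a single line:
# group 1 = opening tag, group 2 = inner text, group 3 = closing tag.
REGION = re.compile(r'(<p class="glovo">)(.*?)(</p>)')

# Any tag that is not <em> or </em>.
OTHER_TAG = re.compile(r'<(?!/?em>)[^>]*>')


def strip_inner_tags(text):
    """Remove all tags except <em>/</em> inside each glovo paragraph."""

    def clean(match):
        opening, inner, closing = match.groups()
        inner = OTHER_TAG.sub('', inner)
        # Assumption: collapse the double space left where a tag was removed.
        inner = re.sub(r' {2,}', ' ', inner)
        return opening + inner + closing

    return REGION.sub(clean, text)


sample = (
    '<p class="glovo">In these <p style="text-align: center;"> situations '
    'we may be forgetting to really <em>bend</em> at our practice and '
    '<em>sweat</em> at it.</p>'
)
print(strip_inner_tags(sample))
```

Text outside a <p class="glovo"> region is left untouched, because the substitution only runs on what REGION matches.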
Screenshot (before):
Screenshot (after):