我有这个标签:
<span class="text_obisnuit2">* Notă:</span>John Wells - <em>My Dreams</em>, Albatros Books, 1986.</p>
和这个:
<span class="text_obisnuit1">* Notă:</span>Mariah Carey - <em>Lovers on the road</em>, BackStreet Books, 1965.</p>
所以,我想找到那些特定的 html 标签<span class="text_obisnuit2">
包含以下单词(字符串): Albatros
和 <em>
和</em>
(第一行)
答案1
这是一个很简单的方法,但它需要在标签后面加上“Albatros” <em>
(演示):
(<span class="text_obisnuit2">).*<em>.*<\/em>.*Albatros.*
接下来的那个不关心它们的顺序(演示):
(<span class="text_obisnuit2">).*(<em>.*<\/em>.*Albatros.*|Albatros.*<em>.*<\/em>.*)
这是另一种变体,其中后面的数字无关紧要text_obisnuit
,整个span
标签被捕获为第一组(演示):
(<span class="text_obisnuit\d+">.*<\/span>).*(<em>.*<\/em>.*Albatros.*|Albatros.*<em>.*<\/em>.*)
所有正则表达式都假设条目在文件中各自占一行。也许检测<p>
和</p>
作为边界更有意义,但为此我们需要从输入文件中摘录更大的内容。
答案2
- Ctrl+F
- 找什么:
<span class="text_obisnuit2">(?=.*?<em>.*?</em>)(?=.*?\bAlbatros\b).*$
- 查看 环绕
- 查看 正则表达式
- 取消选中
. matches newline
- Find All in Current Document
解释:
<span class="text_obisnuit2"> # literally
(?= # positive lookahead, make sure we have after:
.*? # 0 or more any character but newline, not greedy
<em> # literally open em tag
.*? # 0 or more any character but newline, not greedy
</em> # literally close em tag
) # end lookahead
(?= # positive lookahead, make sure we have after:
.*? # 0 or more any character but newline, not greedy
\bAlbatros\b # Albatros with word boundaries
) # end lookahead
.* # 0 or more any character but newline
$ # end of line
截屏: