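I have this regular expression, which finds only HTML meta tags that contain at least 3 of these words:
<meta name="description" content=.*(( the | that | of ).*){3,}.*>
Problem:
I have two similar lines. Both contain the same words, except that in the second line the word "the" is in a different place. Why does my regex find only the second line and not the first? How can I change the regex so that it finds both lines?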
<meta name="description" content="the mystery of the art that seeks its meaning.">
<meta name="description" content="the mystery of art that seeks the its meaning.">
Answer 1
For a search like this, you have to use positive lookaheads:
- Ctrl+F
- Find what:
<meta name="description" content="(?=[^">]*?\bthe\b)(?=[^">]*?\bthat\b)(?=[^">]*?\bof\b)[^">]*">
- Check Wrap around
- Check Regular expression
- Find All in Current Document
Explanation:
<meta name="description" content=" # literally
(?= # positive lookahead, make sure we have after:
[^">]*? # 0 or more any character that is not " or >
\b # word boundary
the # the word the
\b # word boundary
) # end lookahead
(?=[^">]*?\bthat\b) # same for the word that
(?=[^">]*?\bof\b) # same for the word of
[^">]* # 0 or more any character that is not " or >
"> # literally
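The lookahead pattern can also be checked outside Notepad++. Below is a minimal sketch using Python's re module, which is an assumption here (the answer itself uses Notepad++, whose regex engine may differ in details, though these constructs should behave the same). The pattern is written without a literal space after \bof\b, and the third test line is a hypothetical one added for contrast.

```python
import re

# Lookahead pattern from the answer: each lookahead scans the content
# attribute (anything except " or >) for one required whole word.
PATTERN = re.compile(
    r'<meta name="description" content="'
    r'(?=[^">]*?\bthe\b)'   # the word "the" must appear
    r'(?=[^">]*?\bthat\b)'  # the word "that" must appear
    r'(?=[^">]*?\bof\b)'    # the word "of" must appear
    r'[^">]*">'
)

lines = [
    '<meta name="description" content="the mystery of the art that seeks its meaning.">',
    '<meta name="description" content="the mystery of art that seeks the its meaning.">',
    # Hypothetical line, not from the question: it lacks the word "of".
    '<meta name="description" content="the mystery that seeks its meaning.">',
]

results = [bool(PATTERN.search(line)) for line in lines]
print(results)  # [True, True, False]
```

Note that this pattern requires each of the three words to appear at least once, which is slightly different from asking for at least 3 occurrences of any of them.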
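Answer 2
The first line contains the words "of" and "the" right next to each other, and you are searching for the three words with a space before and after each one. The single space between "of" and "the" is consumed by the match for " of ", so " the " can no longer match at that position, and the first line yields only two matches instead of three. Try inserting another word between "of" and "the" to see the difference.
Don't put literal spaces in the regex, as in ... WORD ...
Use word boundaries instead: ...\bWORD\b...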
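The effect of the shared space can be seen by listing the individual matches. This is a rough sketch in Python's re module (an assumption, since the question is about Notepad++); the names spaced and bounded are made up for illustration.

```python
import re

first = '<meta name="description" content="the mystery of the art that seeks its meaning.">'
second = '<meta name="description" content="the mystery of art that seeks the its meaning.">'

# Alternation from the question: every word needs its own space on both sides.
spaced = re.compile(r' the | that | of ')
# Same words with word boundaries, which do not consume any characters.
bounded = re.compile(r'\b(?:the|that|of)\b')

spaced_first = spaced.findall(first)
spaced_second = spaced.findall(second)
bounded_first = bounded.findall(first)
bounded_second = bounded.findall(second)

print(spaced_first)    # [' of ', ' that ']
print(spaced_second)   # [' of ', ' that ', ' the ']
print(bounded_first)   # ['the', 'of', 'the', 'that']
print(bounded_second)  # ['the', 'of', 'that', 'the']
```

With literal spaces, the first line produces only two matches, so {3,} cannot be satisfied. With \b, both lines produce four.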
Answer 3
Yes, I found another solution:
<meta name="description" content=.*(\b(the|that|of)\b.*){3,}.*>
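As a quick check of this word-boundary version, here is a small sketch in Python's re module (an assumption; the thread itself targets Notepad++). The third line is a hypothetical example with only two of the listed words, added to show a non-match.

```python
import re

# Pattern from this answer: at least three whole-word occurrences of
# "the", "that" or "of" somewhere after content=.
PATTERN = re.compile(
    r'<meta name="description" content=.*(\b(the|that|of)\b.*){3,}.*>'
)

lines = [
    '<meta name="description" content="the mystery of the art that seeks its meaning.">',
    '<meta name="description" content="the mystery of art that seeks the its meaning.">',
    # Hypothetical line, not from the question: only "the" and "of" occur.
    '<meta name="description" content="the art of seeking meaning.">',
]

results = [bool(PATTERN.search(line)) for line in lines]
print(results)  # [True, True, False]
```

Unlike the lookahead approach, this counts occurrences, so a line that repeats one of the words three times would also match.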