Regex: select everything in a line except the tags


I have lines like these:

<li><a href="love-and-attitude.html" title="Love and Attitude">Love and Attitude (24)</a></li>
<li><a href="paint-and-gain.html" title="Paint And Gain">Paint And Gain (15)</a></li>
<li><a href="mother-and-father.html" title="Mother And Father">Mother And Father (19)</a></li>

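I want to use a regular expression to select only the text between the opening tag (the one carrying the title attribute) and the closing tag. After applying the regex, I should get the desired output: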

Love and Attitude (24)

Paint And Gain (15)

Mother And Father (19)

Answer 1

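Caveat: this will not work if the <li> tag carries attributes or if there is any text between <li> and <a>. In that case, you have to use a parser instead.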
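For the parser route mentioned in the caveat, here is a minimal sketch using Python's standard `html.parser` module. The class name `LinkTextExtractor`, the function `extract_link_texts`, and the sample line (a variation of the question's first line with an attribute on `<li>` and a nested `<b>` tag) are illustrative assumptions, not part of the original answer.

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collect the text found inside each <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside_link = False  # True while between <a ...> and </a>
        self._parts = []           # text fragments of the current link
        self.texts = []            # one entry per finished link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside_link = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "a" and self._inside_link:
            self._inside_link = False
            self.texts.append("".join(self._parts).strip())

    def handle_data(self, data):
        # Text outside <a> elements is ignored.
        if self._inside_link:
            self._parts.append(data)


def extract_link_texts(html):
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.texts


# Assumed input: attributes on <li> and markup inside the link text.
sample = '<li class="item"> <a href="x.html" title="X">Love and <b>Attitude</b> (24)</a></li>'
print(extract_link_texts(sample))
```

Because the parser tracks tags rather than characters, attributes on `<li>` and nested markup inside the link do not affect the extracted text.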

  • Ctrl+H
  • Find what: ^\h*<li><a[^>]+>([^<]+).+$
  • Replace with: $1
  • UNCHECK Match case
  • CHECK Wrap around
  • CHECK Regular expression
  • UNCHECK . matches newline
  • Replace all

Explanation:

^           # beginning of line
  \h*       # 0 or more horizontal spaces
  <li><a    # literally
  [^>]+     # 1 or more any character that is not >
  >         # literally >
  (         # start group 1
    [^<]+   # 1 or more any character that is not <
  )         # end group 1
  .+        # 1 or more any character
$           # end of line

Replacement:

$1          : content of group 1 (i.e. the text you want)

Result for the given example:

Love and Attitude (24)
Paint And Gain (15)
Mother And Father (19)
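
The same find-and-replace can be sketched outside the editor with Python's `re` module. Two adaptations are assumptions of this sketch rather than part of the answer: `\h` is written as `[ \t]` because Python's `re` has no `\h` escape, and `$1` becomes `\1` in the replacement string. `re.MULTILINE` makes `^` and `$` match at each line, and `re.IGNORECASE` mirrors the unchecked "Match case" option.

```python
import re

# Pattern from the "Find what" field, with \h replaced by [ \t].
PATTERN = re.compile(
    r"^[ \t]*<li><a[^>]+>([^<]+).+$",
    re.MULTILINE | re.IGNORECASE,
)

html = """\
<li><a href="love-and-attitude.html" title="Love and Attitude">Love and Attitude (24)</a></li>
<li><a href="paint-and-gain.html" title="Paint And Gain">Paint And Gain (15)</a></li>
<li><a href="mother-and-father.html" title="Mother And Father">Mother And Father (19)</a></li>
"""

# \1 inserts the content of group 1, i.e. the link text.
result = PATTERN.sub(r"\1", html)
print(result)
```

Since `.` does not match a newline by default, each substitution stays within a single line, just as with ". matches newline" left unchecked.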
