Extracting specific text from a document using Notepad++

Answer 1

Ctrl+H
Find what: <url>\s+<loc>(\S+?)</loc>.+?</url>
Replace with: $1
Check "Wrap around"
Check "Regular expression"
Check ". matches newline"
Replace all

Explanation:

<url>       # literally
  \s+       # 1 or more whitespace characters, including line breaks
  <loc>     # literally
  (\S+?)    # group 1, 1 or more non-whitespace characters, not greedy
  </loc>    # literally
  .+?       # 1 or more of any character, not greedy
</url>      # literally

Replacement:

$1          # content of group 1, the URL

Result for the given example:

https://example.com/example0.html
https://example.com/example1.html
https://example.com/example2.html
https://example.com/example3.html
https://example.com/example4.html
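
For anyone who wants to check the pattern outside Notepad++, here is a minimal Python sketch of the same find-and-replace. The input is not shown in this post, so `SAMPLE` is a made-up sitemap fragment, and `extract_urls` is a hypothetical helper name. `re.DOTALL` stands in for the ". matches newline" option.

```python
import re

# Hypothetical sitemap fragment; the real input file is not shown in the post.
SAMPLE = """<url>
  <loc>https://example.com/example0.html</loc>
  <lastmod>2020-01-01</lastmod>
  <priority>0.5</priority>
</url>
<url>
  <loc>https://example.com/example1.html</loc>
  <lastmod>2020-01-02</lastmod>
  <priority>0.5</priority>
</url>
"""

# re.DOTALL plays the role of Notepad++'s ". matches newline" checkbox,
# letting .+? run across the lines between </loc> and </url>.
PATTERN = re.compile(r"<url>\s+<loc>(\S+?)</loc>.+?</url>", re.DOTALL)


def extract_urls(text: str) -> str:
    """Replace every <url>...</url> block with the content of its <loc>."""
    # Python's re.sub writes the group reference as \1 rather than $1.
    return PATTERN.sub(r"\1", text)


result = extract_urls(SAMPLE)
print(result)
```

Because both quantifiers are non-greedy, each match stops at the first `</url>` it reaches, so neighbouring blocks are replaced one at a time rather than being swallowed together.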

Answer 2

There may be a simpler way, and I don't have access to Notepad++ right now, but you can try the following:

Search: <url>\n\s+<loc>(.*)<\/loc>\n\s.*\n\s.*\n<\/url>

Replace: \1

Source: regexr.com/46rin
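
A quick way to try this second pattern is a small Python sketch like the one below. The sample blocks are invented for illustration (the original input is not shown here), and Python's `re` module is used in place of Notepad++. The pattern spells out each line explicitly, so it only matches when exactly two lines follow the `<loc>` line.

```python
import re

# Same pattern as in the answer; "." does not cross line breaks here.
PATTERN = re.compile(r"<url>\n\s+<loc>(.*)<\/loc>\n\s.*\n\s.*\n<\/url>")

# Hypothetical block with exactly two lines after <loc>.
TWO_LINES = (
    "<url>\n"
    "  <loc>https://example.com/example0.html</loc>\n"
    "  <lastmod>2020-01-01</lastmod>\n"
    "  <priority>0.5</priority>\n"
    "</url>\n"
)

# Same block with a third line added before </url>.
THREE_LINES = TWO_LINES.replace(
    "</url>", "  <changefreq>daily</changefreq>\n</url>"
)

matched = PATTERN.sub(r"\1", TWO_LINES)
unmatched = PATTERN.sub(r"\1", THREE_LINES)

print(matched)                   # the bare URL followed by a newline
print(unmatched == THREE_LINES)  # True: the pattern did not match
```

If the number of lines inside each `<url>` block varies, the first answer's approach (`.+?` with ". matches newline" enabled) is the more tolerant of the two.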
