Script for removing certain span elements from an HTML file

Question 1

Perl can do this, even when a span crosses a line break.

Save the following to a file (I'll call it example.html):

<p>Here is some <span>foo bar</span> example text.</p>
<p>Some text even <span>foo
bar</span> spans across line breaks.</p>

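Then try it. The -0777 switch makes Perl read the whole file as a single string, and the s modifier lets . match newlines, which is why the match can cross line breaks: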

$ perl -0777 -pe 's/<span.*?<\/span>//gs' example.html
<p>Here is some  example text.</p>
<p>Some text even  spans across line breaks.</p>

Question 2

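If your HTML is well-formed XML, you can use an XML processing tool such as xmlstarlet. The XPath expression below selects every span inside html whose class attribute is exactly foo, and -d deletes those elements. Assuming the file is original.html: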

xmlstarlet ed -O -d '/html//span[@class = "foo"]' original.html

Output:

<html>
  <head>
    <title>hello world</title>
  </head>
  <body>
lorem ipsum

alpha beta
  </body>
</html>
