Search for a string in HTML files and print that string together with each file's <title> tag

Question

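Since you are working with HTML documents, I would suggest using a tool that performs structured document queries rather than treating the input as plain text. For example, assuming somedir contains the sample documents above: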

$ ls somedir
file1.html  file2.html  file3.html

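Then, following this answer, you can use the --html switch of the xmllint parser to find every node whose body element contains "the character string to be found", and print the title element of its head: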

$ find somedir/ -name '*.html' -exec xmllint --html --xpath '
    //*[body[contains(.,"the character string to be found")]]/head/title
  ' {} \;
<title>Title of website 3</title>
XPath set is empty
<title>Title of website 1</title>

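Note that for files where the XPath query matches nothing, xmllint prints a message to standard error, but it also exits with a non-zero status. You can discard the former and use the latter to print the search string conditionally: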

$ find somedir/ -name '*.html' -exec xmllint --html --xpath '
    //*[body[contains(.,"the character string to be found")]]/head/title
  ' {} 2>/dev/null \; -printf 'the character string to be found\n'
<title>Title of website 3</title>
the character string to be found
<title>Title of website 1</title>
the character string to be found

Or, if you want to print the actual body text in which the string was found, you can conditionally run a second query:

$ find somedir/ -name '*.html' -exec xmllint --html --xpath '
    //*[body[contains(.,"the character string to be found")]]/head/title
  ' {} 2>/dev/null \; -exec xmllint --html --xpath '
    //*[body[contains(.,"the character string to be found")]]/body/text()
  ' {} \;
<title>Title of website 3</title>

the character string to be found
  
<title>Title of website 1</title>

the character string to be found

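(Note that line breaks in the body text are not stripped.) If you have a tool that supports XPath 2.0 or later, such as Xidel, you can use the concat() function to combine the matched elements in a single invocation (although I have not found a way to make it output one element as an HTML tag and the other as plain text):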

$ find somedir/ -name '*.html' -exec ./xidel --silent --xpath '
    //*[body[contains(.,"the character string to be found")]]/concat(./head/title, codepoints-to-string(10), ./body)
  ' {} \;
Title of website 3
the character string to be found
Title of website 1
the character string to be found
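
The conditional output in the second and third commands relies on find treating -exec ... \; as a test that is true only when the command exits with status 0. The following is a minimal sketch of that behaviour alone. It assumes a throwaway directory with two hypothetical files, and it uses grep -q purely as a stand-in for xmllint so that it runs without libxml2 installed; it is not a replacement for the structured query.

```shell
# Hypothetical demo directory and files, created only for this illustration.
demo=$(mktemp -d)
printf '<html><head><title>A</title></head><body>needle</body></html>\n' > "$demo/a.html"
printf '<html><head><title>B</title></head><body>hay</body></html>\n' > "$demo/b.html"

# -exec ... \; evaluates to true only when the command exits with status 0,
# so the following -print runs only for files in which grep found the string.
# (grep -q is an assumption standing in for the xmllint call.)
matches=$(find "$demo" -name '*.html' -exec grep -q 'needle' {} \; -print)
printf '%s\n' "$matches"

# Clean up the temporary directory.
rm -rf "$demo"
```

Only the path of a.html should be listed, because the grep call on b.html exits with a non-zero status and find therefore skips the actions that follow it.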

Answer 1