curl wget - how to ignore HTML tags and headers

How can I get only the page content that I see in a browser? I don't need the headers or any HTML markup. Example: http://www.linfo.org/cat.html. I just want the content. Please help.

Answer 1

If you just want to dump the page in printable form, you can use a text-mode browser such as lynx, w3m or elinks. Each of these browsers has a "-dump" option.

Here is the beginning of that page as dumped by lynx (lynx -dump http://www.linfo.org/cat.html):

   [1]LINFO

                               The cat Command

   cat is one of the most frequently used [2]commands on [3]Unix-like
   [4]operating systems. It has three related functions with regard to
   text files: displaying them, combining copies of them and creating new
   ones.

   cat's general syntax is

     cat [options] [filenames] [-] [filenames]

   The square brackets indicate that the enclosed items are optional.
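If no text browser is installed, a rough alternative is to strip the tags yourself. The following is a minimal sketch, not part of the original answer, using Python's standard-library html.parser and urllib.request modules. It assumes the page is served with a charset in its Content-Type header (falling back to UTF-8), and the helper names TextExtractor and html_to_text are hypothetical. Because urlopen returns only the response body, the HTTP headers are not printed.

```python
import sys
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text nodes that are not inside script/style.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the text nodes of an HTML document, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


if __name__ == "__main__" and len(sys.argv) > 1:
    with urllib.request.urlopen(sys.argv[1]) as resp:
        # Assumption: fall back to UTF-8 when the server sends no charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        print(html_to_text(resp.read().decode(charset, errors="replace")))
```

Saved as, say, html2text_sketch.py, it would be run as python3 html2text_sketch.py http://www.linfo.org/cat.html. Unlike lynx -dump it does no layout or link numbering; it only prints the text nodes in document order.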