Getting the complete data of a web page

I am using busybox tools and I want to get all the http links in a web page. I save the example link page with curl or wget, but it saves the page as html. How can I do this with the curl or wget command?

example webpage = http://www.turanevdekorasyon.com/wp-includes/test/ 

The following data was saved in text format with the Firefox browser.

Index of /wp-includes/test/

Name <http://www.turanevdekorasyon.com/wp-includes/test/?ND>                                                                             Last modified <http://www.turanevdekorasyon.com/wp-includes/test/?MA>         Size <http://www.turanevdekorasyon.com/wp-includes/test/?SA>  Description  <http://www.turanevdekorasyon.com/wp-includes/test/?DA>

------------------------------------------------------------------------
up Parent Directory <http://www.turanevdekorasyon.com/wp-includes/>                                                                 28-May-2019 02:15        -       
[CMP] v1.0.zip <http://www.turanevdekorasyon.com/wp-includes/test/v1.0.zip>                                                                         28-May-2019 02:15       4k       
[CMP] v1.1.zip <http://www.turanevdekorasyon.com/wp-includes/test/v1.1.zip>                                                                         28-May-2019 02:15       4k       
[CMP] v1.2.zip <http://www.turanevdekorasyon.com/wp-includes/test/v1.2.zip>                                                                         28-May-2019 02:15       4k       

------------------------------------------------------------------------
Proudly Served by LiteSpeed Web Server at www.turanevdekorasyon.com Port 80

Answer 1

I suggest using Chromium's File | Save As feature and saving the web page in MHT format, after enabling the experimental "Save Page as MHTML" option by opening chrome://flags/#save-page-as-mhtml in the browser.

Answer 2

What is the point of using curl or wget? Use lynx:

lynx -dump 'www.example.com'

It will output all the visible and hidden links.
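
If lynx is not available in a busybox-only environment, something similar can be approximated with the tools busybox usually ships. This is only a rough sketch, assuming your busybox build includes wget, grep with -o, and sed; the URL is the example page from the question:

wget -qO- 'http://www.turanevdekorasyon.com/wp-includes/test/' \
  | grep -o 'href="[^"]*"' \
  | sed 's/^href="//; s/"$//'

This prints the value of every href attribute in the page source; piping the result through a further grep '^http' would keep only the absolute links.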
