Downloading a large website with wget

2024-6-12

I'm trying to mirror a very large website, but wget never seems to finish properly. I'm using the following command:

wget -r -l inf -nc -w 0.5 {the-site}
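
For reference, here is the same command written as a Bash array with each flag annotated. This is a minimal sketch: the helper name show_original_cmd and the example URL are hypothetical, and the function only prints the command, so nothing is downloaded.

```shell
#!/usr/bin/env bash
# The command from the question, with each flag annotated.
# show_original_cmd is a hypothetical helper: it only prints the command
# and never runs wget.
show_original_cmd() {
  local site="$1"
  local args=(
    -r        # recursive retrieval: follow links found in pages
    -l inf    # no limit on recursion depth
    -nc       # --no-clobber: skip files that already exist locally
    -w 0.5    # wait 0.5 seconds between retrievals
    "$site"
  )
  echo "wget ${args[*]}"
}
```

Usage (hypothetical URL): show_original_cmd 'https://example.com/' prints the full command line. The -nc flag is what produces the "already there; not retrieving" messages in the log.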

I end up with most of the site, but not all of it. The content doesn't change often enough to make timestamping worthwhile.

After running overnight, it printed the following:

File `{filename}.html' already there; not retrieving.
File `{filename}.html' already there; not retrieving.
File `{filename}.html' already there; not retrieving.
File `{filename}.html' already there; not retrieving.
Killed

Does anyone know what is going on and how I can fix it?

Answer 1

Have you tried the -m option?
It is shorthand for:

-N -r -l inf --no-remove-listing
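
Applied to the question's command, a minimal sketch of the -m variant might look like this. Because -m switches on -N, and wget refuses to combine -N with -nc (it reports that it can't timestamp and not clobber old files at the same time), -nc is dropped and only the 0.5 s wait is kept. The function name mirror_site, the DRY_RUN switch and the example URL are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
# Mirror a site with wget's -m (--mirror) shorthand.
# -m already implies -N (timestamping), and wget will not combine -N with
# -nc, so -nc from the original command is left out here.
# mirror_site and DRY_RUN are hypothetical names used for illustration.
mirror_site() {
  local site="$1"
  local cmd=(wget -m -w 0.5 "$site")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Only show the command that would run.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage (hypothetical URL): DRY_RUN=1 mirror_site 'https://example.com/' prints the command; drop DRY_RUN=1 to actually start the mirror.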

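You can also limit the download to a smaller set of files by starting from a deeper URL, and stop wget from climbing into parent directories with: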

-np
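
A minimal sketch combining -m with -np, assuming the part of the site you want lives under a single directory. The trailing slash on the start URL matters: the wget manual notes that for HTTP it relies on the slash to tell a directory from a file when applying --no-parent. The names mirror_subtree and DRY_RUN and the URL https://example.com/docs/ are hypothetical.

```shell
#!/usr/bin/env bash
# Mirror only one branch of a large site: start from a deeper URL and pass
# -np (--no-parent) so wget never ascends above that directory.
# mirror_subtree, DRY_RUN and the example URL are hypothetical.
mirror_subtree() {
  # Keep the trailing slash so wget treats the last path part as a directory.
  local start_url="$1"
  local cmd=(wget -m -np -w 0.5 "$start_url")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Only show the command that would run.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running it once per subdirectory splits one huge crawl into smaller jobs that can be restarted independently.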
