wget fails: connection timed out

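I am using the command below to mirror a website.

When it tries to reach sun.com, the connection times out.

I would like wget to exclude sun.com so that it can move on to the next URL.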

The problem

$ wget --recursive --page-requisites --adjust-extension --span-hosts --convert-links --restrict-file-names=windows http://pt.jikos.cz/garfield/
.
.
2021-08-09 03:28:28 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]

2021-08-09 03:28:30 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]
.


Location: https://packages.debian.org/robots.txt [following]
--2021-08-09 03:28:33--  https://packages.debian.org/robots.txt
Connecting to packages.debian.org (packages.debian.org)|128.0.10.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24 [text/plain]
Saving to: ‘packages.debian.org/robots.txt’

packages.debian.org 100%[===================>]      24  --.-KB/s    in 0s

2021-08-09 03:28:34 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]

Loading robots.txt; please ignore errors.
--2021-08-09 03:28:34--  http://wwws.sun.com/robots.txt
Resolving wwws.sun.com (wwws.sun.com)... 137.254.16.75
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:28:56--  (try: 2)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:29:19--  (try: 3)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:29:43--  (try: 4)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:30:08--  (try: 5)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:30:34--  (try: 6)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:31:01--  (try: 7)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

Expected: wget saves the whole site without getting stuck on timeouts; if a connection does time out, wget skips that host and carries on.

Answer 1

Please read the manual section on the "risks" of using the --span-hosts (-H) option and on how to limit those risks by adding restrictions:
https://www.gnu.org/software/wget/manual/wget.html#Spanning-Hosts

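The --span-hosts or -H option turns on host spanning, thus allowing Wget's recursive run to visit any host referenced by a link. Unless sufficient recursion-limiting criteria are applied, these foreign hosts will typically link to yet more hosts, and so on until Wget ends up pulling in much more data than you intended.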

...

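Limit spanning to certain domains: -D
The -D option allows you to specify the domains that will be followed, thus limiting the recursion to hosts that belong to these domains.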
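
As a rough illustration of the -D approach, the sketch below adds a domain allow-list to the command from the question. The choice of jikos.cz as the only allowed domain and the variable names ALLOWED and cmd_line are assumptions made for this example. The script only prints the command unless RUN=1 is set, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: restrict host spanning to an allow-list of domains.
# ALLOWED is a hypothetical variable; use a comma-separated list for more domains.
ALLOWED="jikos.cz"

# Same options as in the question, plus --domains (the long form of -D).
set -- --recursive --page-requisites --adjust-extension \
       --span-hosts --domains="$ALLOWED" \
       --convert-links --restrict-file-names=windows \
       http://pt.jikos.cz/garfield/

# Show the command for review; nothing is downloaded by default.
cmd_line="wget $*"
echo "$cmd_line"

# Set RUN=1 in the environment to actually start the download.
if [ "${RUN:-0}" = "1" ]; then
    wget "$@"
fi
```

With an allow-list like this, wget would not contact sun.com at all, but it would also skip page requisites hosted on any other domain that is not listed.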

...

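Keep downloads off certain domains: --exclude-domains
If there are domains you specifically want to exclude, you can do so with --exclude-domains, which accepts the same type of arguments as -D but excludes all the listed domains.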
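
Applied to the command from the question, excluding the host that times out might look like the sketch below. It assumes sun.com is the only domain that needs to be skipped; the variable names EXCLUDED and cmd_line are made up for this example. As written it only prints the command, and it runs wget only when RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: keep host spanning on, but skip the domain that times out.
# EXCLUDED is a hypothetical variable; separate several domains with commas.
EXCLUDED="sun.com"

# Same options as in the question, plus --exclude-domains.
set -- --recursive --page-requisites --adjust-extension \
       --span-hosts --exclude-domains="$EXCLUDED" \
       --convert-links --restrict-file-names=windows \
       http://pt.jikos.cz/garfield/

# Show the command for review; nothing is downloaded by default.
cmd_line="wget $*"
echo "$cmd_line"

# Set RUN=1 in the environment to actually start the download.
if [ "${RUN:-0}" = "1" ]; then
    wget "$@"
fi
```

Domains are matched against the end of the host name, so listing sun.com should also cover wwws.sun.com from the log above. If other hosts time out as well, wget's --tries and --connect-timeout options limit how long each one is retried.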
