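I would like to download the Tour of Heroes app and tutorial web pages (https://v13.angular.io/tutorial) for offline viewing. I tried wget on Linux Mint 20.3 Cinnamon. I suspect a robots.txt file is causing the retry problem. Please help. Thanks.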
$ wget --limit-rate=200k --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla https://v13.angular.io/tutorial
Both --no-clobber and --convert-links were specified, only --convert-links will be used.
--2022-07-17 08:09:50-- https://v13.angular.io/tutorial
Resolving v13.angular.io (v13.angular.io)... 151.101.65.195, 151.101.1.195
Connecting to v13.angular.io (v13.angular.io)|151.101.65.195|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘v13.angular.io/tutorial.html’
2022-07-17 08:09:50 (1.51 MB/s) - ‘v13.angular.io/tutorial.html’ saved [16907]
--2022-07-17 08:09:50-- https://v13.angular.io/assets/opensearch.xml
Length: 545 [application/xml]
Saving to: ‘v13.angular.io/assets/opensearch.xml’
v13.angular.io/assets/opensearch.xml 100%[===============================================================================>] 545 --.-KB/s in 0s
2022-07-17 08:09:50 (18.2 MB/s) - ‘v13.angular.io/assets/opensearch.xml’ saved [545/545]
... snipped for brevity
$ Retrying.
Retrying.: command not found
$
$ Resolving www.adjust.com (www.adjust.com)... 178.162.216.219
bash: syntax error near unexpected token `('
$ Connecting to www.adjust.com (www.adjust.com)|178.162.216.219|:443... connected.
bash: syntax error near unexpected token `('
$ HTTP request sent, awaiting response... 200 OK
HTTP: command not found
$ Length: 186 [text/plain]
Length:: command not found
$ Saving to: ‘www.adjust.com/robots.txt’
Saving: command not found
$
$ www.adjust.com/robots.txt 100%[===============================================================================>] 186 --.-KB/s in 0s
www.adjust.com/robots.txt: line 1: Sitemap:: command not found
www.adjust.com/robots.txt: line 2: User-agent:: command not found
www.adjust.com/robots.txt: line 3: Disallow:: command not found
www.adjust.com/robots.txt: line 4: Disallow:: command not found
www.adjust.com/robots.txt: line 5: Disallow:: command not found
www.adjust.com/robots.txt: line 6: Disallow:: command not found
$
$ 2022-07-17 08:33:01 (9.95 MB/s) - ‘www.adjust.com/robots.txt’ saved [186/186]
$ wget --version
GNU Wget 1.20.3 built on linux-gnu.
Answer 1
This particular robots.txt file causes problems for wget because wget cannot parse the links in it. This may simply be a bug in wget.
The problem was solved by adding the argument --reject robots.txt to the wget command so that it ignores the file.
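However, not every downloaded page will behave offline the way it does in its original environment, so success is only assured for simpler web pages.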
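As a rough sketch of the fix described above, the command from the question could be rerun with --reject robots.txt appended. Two details here are my assumptions rather than part of the answer: --no-clobber is left out because wget's own warning in the question says it is ignored when --convert-links is given, and the RUN variable is a hypothetical safety switch so the script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: the question's wget options plus --reject robots.txt.
URL="https://v13.angular.io/tutorial"

# --no-clobber is omitted (assumption): wget reports it is ignored
# whenever --convert-links is also specified.
WGET_OPTS="--limit-rate=200k --convert-links --random-wait -r -p -E -e robots=off -U mozilla --reject robots.txt"

# Show the full command so it can be reviewed before anything is downloaded.
echo "wget $WGET_OPTS $URL"

# RUN is a hypothetical switch for this sketch: download only when RUN=1.
if [ "${RUN:-0}" = "1" ]; then
  # WGET_OPTS is deliberately unquoted so it splits into separate options.
  wget $WGET_OPTS "$URL"
fi
```

Saved under any file name, for example mirror.sh, it can be started with RUN=1 sh mirror.sh once the printed command looks right. Whether the mirrored tutorial then works offline still depends on how much the pages rely on scripts and remote resources.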