Downloading the Tour of Heroes app and tutorial web pages with robots.txt

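I would like to download the Tour of Heroes app and tutorial web pages (https://v13.angular.io/tutorial) for offline viewing. I tried wget on Linux Mint 20.3 Cinnamon. I suspect that a robots.txt file is causing the retry problem. Please help. Thank you.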

$ wget --limit-rate=200k --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla https://v13.angular.io/tutorial
Both --no-clobber and --convert-links were specified, only --convert-links will be used.
--2022-07-17 08:09:50--  https://v13.angular.io/tutorial
Resolving v13.angular.io (v13.angular.io)... 151.101.65.195, 151.101.1.195
Connecting to v13.angular.io (v13.angular.io)|151.101.65.195|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘v13.angular.io/tutorial.html’

2022-07-17 08:09:50 (1.51 MB/s) - ‘v13.angular.io/tutorial.html’ saved [16907]

--2022-07-17 08:09:50--  https://v13.angular.io/assets/opensearch.xml
Length: 545 [application/xml]
Saving to: ‘v13.angular.io/assets/opensearch.xml’
v13.angular.io/assets/opensearch.xml    100%[===============================================================================>]     545  --.-KB/s    in 0s      

2022-07-17 08:09:50 (18.2 MB/s) - ‘v13.angular.io/assets/opensearch.xml’ saved [545/545]



... snipped for brevity



$ Retrying.
Retrying.: command not found
$ 
$ Resolving www.adjust.com (www.adjust.com)... 178.162.216.219
bash: syntax error near unexpected token `('
$ Connecting to www.adjust.com (www.adjust.com)|178.162.216.219|:443... connected.
bash: syntax error near unexpected token `('
$ HTTP request sent, awaiting response... 200 OK
HTTP: command not found
$ Length: 186 [text/plain]
Length:: command not found
$ Saving to: ‘www.adjust.com/robots.txt’
Saving: command not found
$ 
$ www.adjust.com/robots.txt               100%[===============================================================================>]     186  --.-KB/s    in 0s      
www.adjust.com/robots.txt: line 1: Sitemap:: command not found
www.adjust.com/robots.txt: line 2: User-agent:: command not found
www.adjust.com/robots.txt: line 3: Disallow:: command not found
www.adjust.com/robots.txt: line 4: Disallow:: command not found
www.adjust.com/robots.txt: line 5: Disallow:: command not found
www.adjust.com/robots.txt: line 6: Disallow:: command not found
$ 
$ 2022-07-17 08:33:01 (9.95 MB/s) - ‘www.adjust.com/robots.txt’ saved [186/186]


$ wget --version
GNU Wget 1.20.3 built on linux-gnu.

Answer 1

This particular robots.txt file causes problems for wget because wget cannot parse the links in it. This may simply be a bug in wget.

The problem was solved by adding the --reject robots.txt parameter to the wget command, so that wget ignores the file.
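
As a rough sketch of that fix, the command from the question could be wrapped as shown below with --reject robots.txt added. The function name mirror_site and the DRY_RUN variable are hypothetical helpers introduced only for illustration. --no-clobber is left out because wget itself reports in the log above that it is ignored when --convert-links is given. This has not been verified against the live site.

```shell
#!/bin/sh
# Hypothetical wrapper around the wget command from the question,
# with --reject robots.txt added so robots.txt files are not kept.
mirror_site() {
  # If DRY_RUN is set to a non-empty value, print the command
  # instead of running it (illustrative convention, not a wget feature).
  ${DRY_RUN:+echo} wget \
    --limit-rate=200k --convert-links --random-wait \
    -r -p -E -e robots=off -U mozilla \
    --reject robots.txt \
    "$1"
}

# Preview the command without touching the network.
DRY_RUN=1 mirror_site https://v13.angular.io/tutorial

# To perform the real download, call it without DRY_RUN:
# mirror_site https://v13.angular.io/tutorial
```

Printing the command first makes it easy to check the option list before starting a long recursive download.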

However, not every downloaded page will work the same way as in its original environment, so success is only guaranteed for simpler web pages.
