我在基于 Ubuntu 20.04 的 LEMP 服务器上运行 wordpress 网站。我启用了 pagespeed 插件,为了强制它缓存我的网站,我使用另一个盒子中的 wget 来镜像该网站。但是,当从第二个盒子使用 wget 时,它会在第一页(index.html)停止下载,并显示错误
/tmp/ramdisk/www.example.com/index.html 中发现 nofollow 属性。不会跟踪此页面上的任何链接以下是我正在使用的 wget 命令和返回结果:
wget -m -p -E -k -P /tmp/ramdisk/ https://www.example.com
--2022-05-17 16:41:40-- https://www.example.com/
Resolving www.example.com (www.example.com)... 1**.2*.1**.*
Connecting to www.example.com (www.example.com)|1**.2*.1**.*|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘/tmp/ramdisk/www.example.com/index.html’
www.example.com/index.html [ <=> ] 130.71K 210KB/s in 0.6s
Last-modified header missing -- time-stamps turned off.
2022-05-17 16:41:42 (210 KB/s) - ‘/tmp/ramdisk/www.example.com/index.html’ saved [133848]
nofollow attribute found in /tmp/ramdisk/www.example.com/index.html. Will not follow any links on this page
FINISHED --2022-05-17 16:41:42--
Total wall clock time: 2.0s
Downloaded: 1 files, 131K in 0.6s (210 KB/s)
Converting links in /tmp/ramdisk/www.example.com/index.html... 135.
42-93
Converted links in 1 files in 0.004 seconds.
我怎样才能找到 nofollow 属性并将其删除,以便 wget 可以完全下载我的网站?
答案1
作为记录在这里你可以通过添加参数告诉 wget 忽略 no-follow 属性-e robots=off