I want to archive a message board, and I am doing this with wget and the options --page-requisites, --span-hosts, --convert-links and --no-clobber.
The problem is that using --convert-links disables --no-clobber. For every topic page, wget re-downloads the site skin, scripts and icons (to keep them up to date).
Is there a way to stop wget from downloading files that already exist locally, point the link references at those local copies, and download only the files that are not yet in the filesystem?
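For reference, the invocation looks roughly like this (the forum URL is just a placeholder, and --recursive is only implied by crawling every topic page):
$ wget --recursive --page-requisites --span-hosts --convert-links --no-clobber http://forum.example.com/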
Answer 1
I believe that if you include the -N switch, it will force wget to use time-stamping.
-N
--timestamping
    Turn on time-stamping.
With this switch, wget will only download files that do not already exist locally or that are newer on the server than the local copy.
Example
Downloading robots.txt, which does not yet exist locally:
$ wget -N http://google.com/robots.txt
--2014-06-15 21:18:16-- http://google.com/robots.txt
Resolving google.com (google.com)... 173.194.41.9, 173.194.41.14, 173.194.41.0, ...
Connecting to google.com (google.com)|173.194.41.9|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.google.com/robots.txt [following]
--2014-06-15 21:18:17-- http://www.google.com/robots.txt
Resolving www.google.com (www.google.com)... 173.194.46.83, 173.194.46.84, 173.194.46.80, ...
Connecting to www.google.com (www.google.com)|173.194.46.83|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘robots.txt’
[ <=> ] 7,608 --.-K/s in 0s
2014-06-15 21:18:17 (359 MB/s) - ‘robots.txt’ saved [7608]
Trying again, now with the local robots.txt file in place:
$ wget -N http://google.com/robots.txt
--2014-06-15 21:18:19-- http://google.com/robots.txt
Resolving google.com (google.com)... 173.194.41.8, 173.194.41.9, 173.194.41.14, ...
Connecting to google.com (google.com)|173.194.41.8|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.google.com/robots.txt [following]
--2014-06-15 21:18:19-- http://www.google.com/robots.txt
Resolving www.google.com (www.google.com)... 173.194.46.82, 173.194.46.83, 173.194.46.84, ...
Connecting to www.google.com (www.google.com)|173.194.46.82|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Server file no newer than local file ‘robots.txt’ -- not retrieving.
Notice that the second time around, wget did not retrieve the file again.
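Applied to the archiving command from the question, that means swapping --no-clobber for -N, roughly like this (the forum URL is again a placeholder):
$ wget -N --recursive --page-requisites --span-hosts --convert-links http://forum.example.com/
With -N, wget should compare each page requisite's Last-Modified timestamp against the local copy and skip the download when the local file is already up to date, so the skin, scripts and icons are not fetched anew for every topic page.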