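Is there a way, from the terminal, to update a website that I have already copied for offline viewing? I downloaded everquest.allakhazam.com and am curious, because the site is updated regularly. I would rather not repeat the whole download every time, since it takes a while.
Also, I am new to Linux of any kind and not very experienced with the terminal, so please be gentle. XD
Thanks in advance!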
Answer 1
wget -N http://www.yoururl.com/
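where www.yoururl.com is the URL you want to revisit, should do this nicely. The -N switch asks the server for the last-modified date. If the local file is newer, the remote file is not re-fetched. If the remote file is newer, wget fetches it as usual. Note that you need to start wget in the same directory where you originally ran it.
A note on limitations, quoted from man wget: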
If a file is downloaded more than once in the same directory,
Wget's behavior depends on a few options, including -nc. In
certain cases, the local file will be clobbered, or overwritten,
upon repeated download. In other cases it will be preserved.
When running Wget without -N, -nc, -r, or -p, downloading the same
file in the same directory will result in the original copy of file
being preserved and the second copy being named file.1. If that
file is downloaded yet again, the third copy will be named file.2,
and so on. (This is also the behavior with -nd, even if -r or -p
are in effect.) When -nc is specified, this behavior is
suppressed, and Wget will refuse to download newer copies of file.
Therefore, "no-clobber" is actually a misnomer in this
mode---it's not clobbering that's prevented (as the numeric
suffixes were already preventing clobbering), but rather the
multiple version saving that's prevented.
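Depending on your situation, you may also need the -r (recursive) and -l (depth level) switches. For more information on the available switches and options, see man wget.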
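As a rough sketch of how those switches combine for refreshing a whole mirror: the directory, the depth of 5, and the helper name update_mirror below are illustrative assumptions, not values from the answer.

```shell
# Sketch only: MIRROR_DIR, SITE_URL and DEPTH are placeholder assumptions.
MIRROR_DIR="${MIRROR_DIR:-$HOME/mirrors}"
SITE_URL="${SITE_URL:-http://www.yoururl.com/}"
DEPTH="${DEPTH:-5}"

update_mirror() {
    # -N compares against local files, so run from the directory
    # where the first download was started.
    cd "$MIRROR_DIR" || return 1
    # -r follows links recursively, -l limits how deep, -N skips files
    # that are not newer on the server.
    # Setting WGET="echo wget" prints the command instead of running it.
    ${WGET:-wget} -N -r -l "$DEPTH" "$SITE_URL"
}
```

Calling update_mirror refreshes the copy in place. Running WGET="echo wget" update_mirror first is a harmless way to see the exact command before touching the network.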
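If wget does not work for you:
An alternative to wget is httrack, which can also mirror a website and keep the mirror updated.
httrack can be installed by enabling the Universe repository and then installing it through the Software Center, or from the command line with:
sudo apt-get update && sudo apt-get install httrack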
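The answer does not show how httrack is invoked, so the following is only a sketch under assumptions: the URL and directory are placeholders, the function names are hypothetical, and it relies on httrack's -O (output path) and --update options as described in its manual page.

```shell
# Sketch only: SITE_URL and MIRROR_DIR are placeholder assumptions.
SITE_URL="${SITE_URL:-http://www.yoururl.com/}"
MIRROR_DIR="${MIRROR_DIR:-$HOME/mirrors/yoururl}"

httrack_mirror() {
    # First run: create the mirror under MIRROR_DIR.
    ${HTTRACK:-httrack} "$SITE_URL" -O "$MIRROR_DIR"
}

httrack_update() {
    # Later runs: refresh the existing mirror from inside its directory.
    cd "$MIRROR_DIR" || return 1
    ${HTTRACK:-httrack} --update
}
```

Setting HTTRACK="echo httrack" before either call prints the command instead of running it, which is a safe way to check it first.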
Sources for wget:
https://superuser.com/questions/283481/how-do-i-properly-set-wget-to-download-only-new-files
man wget
http://www.editcorp.com/Personal/Lars_Appel/wget/wget_5.html
Sources for httrack:
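Answer 2
From what I have read, the command to use is wget -N site.com. It also sounds like you can run wget -S site.com, which prints the server's response headers, to check the last-modified date. -N then compares that date with the local copy and updates the file if the server's version is newer than the "old" one.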
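Answer 3
wget supports this with the --timestamping option (also known as -N). It sets the modification time of each downloaded file to the value of the Last-Modified HTTP header.
When you download the file again, wget sends an If-Modified-Since header, and the server may respond with 304 Not Modified instead of resending the file.
If you try this with http://www.jasny.net, you see: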
$ wget --timestamping http://www.jasny.net
--2017-04-06 22:56:37-- http://www.jasny.net/
Resolving www.jasny.net (www.jasny.net)... 151.101.36.133
Connecting to www.jasny.net (www.jasny.net)|151.101.36.133|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18433 (18K) [text/html]
Saving to: ‘index.html’
index.html
2017-04-06 22:56:37 (1,15 MB/s) - ‘index.html’ saved [18433/18433]
Then, the second time:
$ wget --timestamping http://www.jasny.net
--2017-04-06 22:56:38-- http://www.jasny.net/
Resolving www.jasny.net (www.jasny.net)... 151.101.36.133
Connecting to www.jasny.net (www.jasny.net)|151.101.36.133|:80... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘index.html’ not modified on server. Omitting download.
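Unfortunately, everquest.allakhazam.com does not send a Last-Modified header, so --timestamping has no effect. The server also does not honor the If-Modified-Since header.
Without server support for this feature, there is no choice but to download the whole site every time.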
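To check for yourself whether a server sends Last-Modified before relying on -N, something like the following sketch may help. The helper names are hypothetical, and it assumes the server answers wget's --spider request with the same headers it sends for a normal download, which is not guaranteed.

```shell
# Sketch only: helper names are illustrative assumptions.
has_last_modified() {
    # Reads response headers on stdin; succeeds if Last-Modified is present.
    grep -qi '^[[:space:]]*Last-Modified:'
}

check_timestamp_support() {
    # -S prints the response headers (on stderr); --spider skips the download.
    if wget -S --spider "$1" 2>&1 | has_last_modified; then
        echo "$1 sends Last-Modified, so wget -N can skip unchanged files"
    else
        echo "$1 sends no Last-Modified header, so wget -N will re-download"
    fi
}
```

For example, check_timestamp_support http://www.jasny.net/ tests the site used in the transcript above.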