I am trying to figure out how to save a web page together with all of its related files, for example: http://docs.oasis-open.org/ubl/os-UBL-2.0/xsd/
I want to save every file in that directory, somewhat like a crawler but more restricted, and if possible from within Firefox.
Answer 1
Strangely, the answer was somehow deleted.
It was as follows:
Or see https://www.gnu.org/software/wget/manual/html_node/Directory_002dBased-Limits.html
‘-np’ ‘--no-parent’ ‘no_parent = on’
The simplest, and often very useful way of limiting directories is disallowing retrieval of the links that refer to the hierarchy above than the beginning directory, i.e. disallowing ascent to the parent directory/directories.
The ‘--no-parent’ option (short ‘-np’) is useful in this case. Using it guarantees that you will never leave the existing hierarchy.
Suppose you issue Wget with:
wget -r --no-parent http://somehost/~luzer/my-archive/
You may rest assured that none of the references to /~his-girls-homepage/ or /~luzer/all-my-mpegs/ will be followed. Only the archive you are interested in will be downloaded. Essentially, ‘--no-parent’ is similar to ‘-I /~luzer/my-archive’, only it handles redirections in a more intelligent fashion.
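Applied to the directory from the question, a minimal sketch of the same idea might look like this (the -nH, --cut-dirs, and -R flags are tidying assumptions on my part, not part of the quoted manual text):
# -r recurses, --no-parent keeps the crawl below .../xsd/,
# -nH drops the hostname directory, --cut-dirs=2 drops ubl/os-UBL-2.0/,
# and -R "index.html*" skips the server's auto-generated directory listings.
wget -r --no-parent -nH --cut-dirs=2 -R "index.html*" http://docs.oasis-open.org/ubl/os-UBL-2.0/xsd/
With those options the files should land under a local xsd/ directory rather than a docs.oasis-open.org/ubl/os-UBL-2.0/xsd/ tree.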
Note that, for HTTP (and HTTPS), the trailing slash is very important to ‘--no-parent’. HTTP has no concept of a “directory”—Wget relies on you to indicate what's a directory and what isn't. In ‘http://foo/bar/’, Wget will consider ‘bar’ to be a directory, while in ‘http://foo/bar’ (no trailing slash), ‘bar’ will be considered a filename (so ‘--no-parent’ would be meaningless, as its parent is ‘/’).
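For the URL in the question, that distinction would play out as below (a sketch of the behavior the quoted passage describes; I have not verified it against this exact server):
# Trailing slash: 'xsd' is a directory, so --no-parent keeps the crawl inside it.
wget -r --no-parent http://docs.oasis-open.org/ubl/os-UBL-2.0/xsd/
# No trailing slash: 'xsd' is treated as a filename, so --no-parent would only
# confine the crawl to its containing directory, /ubl/os-UBL-2.0/.
wget -r --no-parent http://docs.oasis-open.org/ubl/os-UBL-2.0/xsd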