Wget: Selectively download files recursively?

Question

This command downloads only images and movies from the given website:

wget -nd -r -P /save/location -A jpeg,jpg,bmp,gif,png,mov "http://www.somedomain.com"

According to the wget manual:

-nd prevents the creation of a directory hierarchy (i.e. no directories).

-r enables recursive retrieval. See Recursive Download for more information.

-P sets the directory prefix where all files and directories are saved to.

-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma-separated list (as seen above). See Types of Files for more information.
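To make the "strings and patterns" rule concrete, here is a minimal sketch that approximates how an -A list is checked against a file name. It assumes the documented wget rule: an entry containing `*`, `?`, `[` or `]` is treated as a pattern, and any other entry as a suffix, matched case-sensitively unless `--ignore-case` is given. The function `wget_accepts` is a hypothetical helper for illustration only, not part of wget.

```shell
# Hypothetical helper (not part of wget): approximates how wget matches an
# -A accept list against a file name. Entries containing *, ?, [ or ] are
# treated as shell-style patterns; all other entries as suffixes.
# Matching is case-sensitive, like wget without --ignore-case.
wget_accepts() {
    rest="$1,"    # comma-separated accept list, e.g. "jpeg,jpg,png"
    name=$2       # file name to check, e.g. "photo.jpg"
    while [ -n "$rest" ]; do
        entry=${rest%%,*}        # next element of the list
        rest=${rest#*,}          # drop it from the remaining list
        [ -n "$entry" ] || continue
        case $entry in
            *\**|*\?*|*\[*|*\]*)
                # wildcard present: treat the entry as a pattern
                case $name in $entry) return 0 ;; esac ;;
            *)
                # plain string: accept if the name ends with it
                case $name in *"$entry") return 0 ;; esac ;;
        esac
    done
    return 1
}
```

For example, `wget_accepts "jpeg,jpg,bmp,gif,png,mov" index.html` fails, which is why index pages are discarded when html is not in the list.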

If you want to download only a subfolder (and what lies below it), add the --no-parent flag, as in this command:

wget -r -l1 --no-parent -P /save/location -A jpeg,jpg,bmp,gif,png,mov "http://www.somedomain.com"

-r: recursive retrieving
-l1: sets the maximum recursion depth to be 1
--no-parent: does not ascend to the parent directory; only downloads from the specified directory and the hierarchy below it
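The --no-parent rule can be sketched as a simple prefix check. This is an approximation under stated assumptions: for HTTP, wget treats everything up to the last slash of the start URL as its "directory", so the trailing slash matters, and the sketch assumes both arguments are full URLs that include a path (at least `http://host/`) with identical scheme and host spelling. `below_start_dir` is a hypothetical name used only for illustration.

```shell
# Hypothetical helper (not part of wget): approximates the --no-parent check.
# A link is followed only if it lies in the start URL's directory or below.
# The "directory" is everything up to the last slash, so a start URL ending
# in "/" keeps its last component, while one without it loses that component.
# Assumes full URLs with a path and the same scheme/host spelling.
below_start_dir() {
    start=$1
    candidate=$2
    base=${start%/*}/          # strip the last path component, keep the slash
    case $candidate in
        "$base"*) return 0 ;;  # inside the start directory or a subdirectory
        *) return 1 ;;         # would ascend to a parent or leave the tree
    esac
}
```

With a start URL ending in `/roms/`, only links under `.../roms/` pass; dropping that trailing slash would widen the allowed area to the parent directory.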

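As for index.html pages: once the -A flag is given, wget keeps only the listed file types. If html is not in the accept list, the page is not kept: wget still fetches it so it can follow the links it contains, then deletes it and prints a message like this in the terminal: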

Removing /save/location/default.htm since it should be rejected.

wget can download specific file types (jpg, jpeg, png, mov, avi, mpeg, etc.) when those files are linked from the given URL. For example:

Suppose we want to download the .zip and .chd files from the archive.org page used in the command below.

That page lists both folders and .zip files (scroll to the end). Now suppose we run this command:

wget -r --no-parent -P /save/location -A chd,zip "https://archive.org/download/MAME0.139_MAME2010_Reference_Set_ROMs_CHDs_Samples/roms/"

This command downloads the .zip files, but for the .chd files it only creates empty folders.

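To download the .chd files, we need to collect the names of those empty folders and convert them into their actual URLs. Then we put all the URLs of interest into a text file, file.txt, and finally feed that file to wget, as follows: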

wget -r --no-parent -P /save/location -A chd,zip -i file.txt

This command then fetches all the .chd files.
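The folder-to-URL step described above can be sketched as a small shell function. `build_url_list` is a hypothetical helper, not part of wget. It assumes the first run used -r and -P without -nd or -nH, so wget saved everything under `<prefix>/<host>/<path>/`. It also assumes https URLs, folder names that need no URL escaping, and a `find` that supports `-empty` (GNU and BSD find do; it is not POSIX).

```shell
# Hypothetical helper (not part of wget): turn the empty folders left by a
# recursive wget run into a URL list for "wget -i".
# Assumes the default layout <prefix>/<host>/<path>/ (no -nd, no -nH),
# https URLs, and folder names that need no percent-escaping.
build_url_list() {
    prefix=${1%/}   # directory passed to wget -P, without a trailing slash
    out=$2          # output file, later fed to wget -i
    # -mindepth 1 skips the prefix itself; -empty is a GNU/BSD find extension
    find "$prefix" -mindepth 1 -type d -empty | sort | while IFS= read -r dir; do
        rel=${dir#"$prefix"/}          # e.g. archive.org/download/.../roms/game
        # trailing slash so that --no-parent treats the entry as a directory
        printf 'https://%s/\n' "$rel"
    done > "$out"
}

# Usage (not run here): build_url_list /save/location file.txt
```

Each line of the resulting file.txt points at one former empty folder, so the second wget command can recurse into it and pick up the .chd files.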

Answer 1