How to download specific folders from a URL

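Answer 1

You can use the -i flag, which makes wget read the list of URLs to download from a file. So if you have a file.txt with the following contents:

https://physionet.org/physiobank/database/challenge/2018/training/tr03-0005
https://physionet.org/physiobank/database/challenge/2018/training/tr03-0029

then running wget other_options -i file.txt downloads only those two folders (here, the first two). Note that with -i you do not need to pass a URL on the command line, because each URL is read from the file.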
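A minimal sketch of that workflow: it writes the two URLs into file.txt and wraps the wget call in a function. The function name download_list is hypothetical, and the recursive options shown in place of other_options (-r -np -nH --cut-dirs=5 -e robots=off) are an assumption borrowed from the one-liner in the third answer; substitute whatever options you already use.

```shell
# Write the two folder URLs into a list file, one URL per line.
cat > file.txt <<'EOF'
https://physionet.org/physiobank/database/challenge/2018/training/tr03-0005
https://physionet.org/physiobank/database/challenge/2018/training/tr03-0029
EOF

# Hypothetical helper: fetch every URL listed in the file given as $1.
# The options stand in for "other_options" and are an assumption.
download_list() {
    wget -r -np -nH --cut-dirs=5 -e robots=off -i "$1"
}

# Uncomment to start the download:
# download_list file.txt
```

Keeping the list in a file makes it easy to add or remove folders later without touching the wget command itself.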

Answer 2

Reading man wget, you will find:

   -X list
   --exclude-directories=list
       Specify a comma-separated list of directories you wish to exclude from download.
       Elements of list may contain wildcards.
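As a rough illustration of -X, the sketch below builds the comma-separated list from a few folder names and shows how it might be passed to wget. The two excluded folders are only examples, and writing each entry as a path from the server root is an assumption about how the list should be spelled for this site; check the result against a small test download before relying on it.

```shell
# Path of the training directory on the server (taken from the URLs above).
base_path="/physiobank/database/challenge/2018/training"

# Join the folders to skip into one comma-separated list.
exclude=""
for d in tr03-0005 tr03-0029; do
    exclude="${exclude:+$exclude,}${base_path}/${d}"
done
echo "$exclude"

# Uncomment to download everything except the excluded folders:
# wget -r -np -nH --cut-dirs=5 -e robots=off -X "$exclude" "https://physionet.org${base_path}/"
```

Since list elements may contain wildcards, a single pattern can replace many explicit entries when the folders to skip share a common prefix.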

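Answer 3

Wget has no option (that I remember or could find) for limiting a download in this way.

For this specific case, however, you can use the shell to parse the subdirectory names out of the index page, stop at the limit, and fetch each one individually:

# Where `n` is the limit we want
n=50
c=0
base="https://physionet.org/physiobank/database/challenge/2018/training/"
# Pull the href of every "tr..." link out of the directory index
for f in $(curl -s "$base" | grep '^<a href="tr' | sed 's/.*"\(.*\)".*/\1/'); do
    # Stop once n subdirectories have been fetched
    if [ "$c" -ge "$n" ]; then break; fi
    wget -r -np -nH --cut-dirs=5 -R "index.html*,.mat" -e robots=off "${base}${f}"
    c=$((c + 1))
done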
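One possible variation replaces the counter with head and hands the resulting list to wget -i, combining this approach with the first answer. The function name first_n_dirs is hypothetical, and the sketch assumes, as the grep pattern above does, that each subdirectory appears on its own line of the index page as an <a href="tr..."> link.

```shell
# Hypothetical helper: read index HTML on stdin and print the first $1
# subdirectory names found in lines starting with <a href="tr
first_n_dirs() {
    grep '^<a href="tr' | sed 's/.*"\(.*\)".*/\1/' | head -n "$1"
}

# Quick check against sample index lines (the HTML layout is an assumption).
printf '%s\n' \
    '<a href="tr03-0005/">tr03-0005/</a>' \
    '<a href="tr03-0029/">tr03-0029/</a>' \
    '<a href="tr03-0052/">tr03-0052/</a>' | first_n_dirs 2

# Uncomment to build a URL list for the first 50 folders and download it:
# base="https://physionet.org/physiobank/database/challenge/2018/training/"
# curl -s "$base" | first_n_dirs 50 | sed "s|^|$base|" > file.txt
# wget -r -np -nH --cut-dirs=5 -R "index.html*,.mat" -e robots=off -i file.txt
```

With the list written to file.txt, an interrupted download can be restarted from the same file without fetching and parsing the index page again.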
