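Some URLs look like this:
/foo/bar
That is, they have no extension, unlike this:
/foo/bar.txt
With an extension it is easy:
wget -r -A .txt http://asdf.com
Without one, I'm not sure how to select the files. Basically, some files (PDFs or others) live at extensionless paths such as /0du8qj8quqjc9, or possibly even /download.php?pdf=124u0cje8u. The question is how to download such files only when they match a given MIME type. For example, something along the lines of:
wget -r --accept-mime text/plain,application/pdf http://asdf.com
I'm wondering whether something like this can be done.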
Answer 1
Wget2 already has this feature :-)
`--filter-mime-type    Specify a list of mime types to be saved or ignored`
### `--filter-mime-type=list`
Specify a comma-separated list of MIME types that will be downloaded. Elements of list may contain wildcards.
If a MIME type starts with the character '!' it won't be downloaded, this is useful when trying to download
something with exceptions. For example, download everything except images:
wget2 -r https://<site>/<document> --filter-mime-type=*,\!image/*
It is also useful to download files that are compatible with an application of your system. For instance,
download every file that is compatible with LibreOffice Writer from a website using the recursive mode:
wget2 -r https://<site>/<document> --filter-mime-type=$(sed -r '/^MimeType=/!d;s/^MimeType=//;s/;/,/g' /usr/share/applications/libreoffice-writer.desktop)
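As of today, Wget2 has not been released yet, but it will be soon. Debian stable already ships an alpha version.
See https://gitlab.com/gnuwget/wget2 for more information. You can post questions/comments directly to [email protected].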
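If Wget2 is not available on your system, a rough workaround is to ask the server for each URL's Content-Type first and download only on a match. The sketch below is just that, a sketch: the function names `mime_accepted` and `fetch_if_mime` are made up for illustration, it assumes `curl` and classic `wget` are installed, and it assumes the server answers HEAD requests with an honest Content-Type header. It does not recurse; you would need to supply the list of URLs yourself.

```shell
#!/bin/sh

# Return 0 if MIME type $1 matches any pattern in the comma-separated list $2.
# Patterns may contain shell wildcards, e.g. "image/*".
# (Hypothetical helper, not part of wget or wget2.)
mime_accepted() {
    mime=${1%%;*}                      # drop parameters such as "; charset=utf-8"
    mime=$(printf '%s' "$mime" | tr -d ' \r' | tr 'A-Z' 'a-z')
    old_ifs=$IFS
    IFS=,
    set -f                             # keep patterns from globbing against local files
    for pattern in $2; do
        case $mime in
            $pattern) IFS=$old_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 1
}

# Download URL $1 only if its reported Content-Type matches list $2.
# (Hypothetical helper; relies on the server supporting HEAD requests.)
fetch_if_mime() {
    # -I sends a HEAD request, -L follows redirects,
    # -w prints the Content-Type of the final response.
    ctype=$(curl -sIL -o /dev/null -w '%{content_type}' "$1") || return 1
    if mime_accepted "$ctype" "$2"; then
        # --content-disposition lets wget use the server-suggested file name,
        # which helps with URLs like /download.php?pdf=...
        wget --content-disposition "$1"
    fi
}
```

For example, `fetch_if_mime 'http://asdf.com/0du8qj8quqjc9' 'text/plain,application/pdf'` would fetch that path only if the server reports it as plain text or PDF. This costs one extra request per URL, so where Wget2 is installed its built-in `--filter-mime-type` is the better choice.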