How to fetch files by MIME type with wget

Some URLs look like this:

/foo/bar

That is, they have no extension, unlike this:

/foo/bar.txt

With an extension it is easy:

wget -r -A .txt http://asdf.com

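Without one, though, I'm not sure how to fetch the files. Basically, some files (PDFs or others) live at extensionless paths such as /0du8qj8quqjc9, or perhaps even /download.php?pdf=124u0cje8u. The question is how to download such files only when they match a given MIME type. For example, something like this: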

wget -r --accept-mime text/plain,application/pdf http://asdf.com

I'm wondering whether anything like that is possible.

Answer 1

Wget2 already has this feature :-)

--filter-mime-type    Specify a list of mime types to be saved or ignored
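
Applied to the example in the question, the option might be used roughly as follows. This is only a sketch: the variable names are illustrative, the MIME types and host are the ones from the question, and the command is printed rather than executed so it can be inspected first.

```shell
# MIME types the question wants to keep (taken from the question's example).
accept_types="text/plain,application/pdf"
# Host from the question; replace with the real site.
site="http://asdf.com"

# Build the wget2 command line as a string (dry run, nothing is downloaded).
cmd="wget2 -r --filter-mime-type=$accept_types $site"
echo "$cmd"
```

Running the printed command requires wget2 to be installed; plain wget does not have this option according to the answer.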

### `--filter-mime-type=list`

Specify a comma-separated list of MIME types that will be downloaded.  Elements of list may contain wildcards.
If a MIME type starts with the character '!' it won't be downloaded, this is useful when trying to download
something with exceptions. For example, download everything except images:

  wget2 -r https://<site>/<document> --filter-mime-type=*,\!image/*

It is also useful to download files that are compatible with an application of your system. For instance,
download every file that is compatible with LibreOffice Writer from a website using the recursive mode:

  wget2 -r https://<site>/<document> --filter-mime-type=$(sed -r '/^MimeType=/!d;s/^MimeType=//;s/;/,/g' /usr/share/applications/libreoffice-writer.desktop)
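
To see what the sed expression above produces, here is a small sketch that runs it against a made-up .desktop file. The helper name `mime_list_from_desktop` and the sample file contents are assumptions for illustration only. One deliberate difference from the manual's one-liner: because a MimeType line usually ends with `;`, the plain substitution leaves a trailing comma, and this sketch strips it.

```shell
# Hypothetical helper (not part of wget2): turn the MimeType= line of a
# .desktop file into a comma-separated list.
mime_list_from_desktop() {
  # Drop every line except MimeType=, remove the key, turn ';' into ',',
  # then strip the trailing comma left by the final ';'.
  sed -E '/^MimeType=/!d; s/^MimeType=//; s/;/,/g; s/,$//' "$1"
}

# Sample input standing in for a real .desktop file (contents are made up).
sample=$(mktemp)
printf '%s\n' '[Desktop Entry]' 'Name=Example' \
  'MimeType=text/plain;application/pdf;' > "$sample"

mime_list=$(mime_list_from_desktop "$sample")
echo "$mime_list"
rm -f "$sample"

# With wget2 installed, the list could then be passed along, for example:
#   wget2 -r --filter-mime-type="$mime_list" https://<site>/<document>
```

Quoting `"$mime_list"` keeps the shell from word-splitting or glob-expanding the list before wget2 sees it.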

As of today, Wget2 has not been released yet, but it will be soon. Debian stable already ships an alpha version.

See https://gitlab.com/gnuwget/wget2 for more information. You can post questions/comments directly to [email protected].
