I have to download some files that follow the URL pattern https://www.server.org/data/2021/{i}
where {i} ranges from 1 to 1000.
The files are PDFs, and each one has its own name, for example:
https://www.server.org/data/2021/1 ====> document_about_feature.pdf
https://www.server.org/data/2021/2 ====> "activity report.pdf"
https://www.server.org/data/2021/3 ====> "2021-financial analysis from bla bla.pdf"
...
Currently I download these files with wget as follows:
#!/bin/bash
for i in {1..1000}
do
URL="https://www.server.org/data/2021/${i}"
wget -nv --content-disposition "${URL}"
done
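This is the only way I have found to keep the original file names when downloading (without the --content-disposition flag, the files are named 1.pdf, 2.pdf, 3.pdf, etc.). So it works, but I would like to keep the link between each file and its i
by adding a prefix (or suffix) to the file name: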
https://www.server.org/data/2021/1 ====> 1_document_about_feature.pdf
https://www.server.org/data/2021/2 ====> "2_activity report.pdf"
https://www.server.org/data/2021/3 ====> "3_2021-financial analysis from bla bla.pdf"
...
How can I do this properly (Ubuntu 22.04.3)?
Version: GNU Wget 1.21.2 built on linux-gnu.
Answer 1
I would probably just download and then rename; something like:
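#!/bin/bash
# Create the scratch directory inside the current directory
# (so that the final mv is a rename on the same filesystem)
tmpdirname="$(mktemp -d -p . "fetch_XXXXXXX")" || exit 1
# With nullglob, a glob that matches nothing expands to nothing, not to a literal "*"
shopt -s nullglob
for i in {1..1000}; do
  URL="https://www.server.org/data/2021/${i}"
  # I'm almost certain you'll want file names 0001,…,0999,1000; not 1…1000
  formatted="$(printf '%04d' "${i}")"
  dirname="${tmpdirname}/${formatted}"
  mkdir "${dirname}"
  pushd "${dirname}" > /dev/null || exit 1
  wget -nv --content-disposition "${URL}"
  # Globs are not expanded in a plain assignment (filename=* stores "*"), so use an array
  files=( * )
  popd > /dev/null || exit 1
  if (( ${#files[@]} == 1 )); then
    mv -- "${dirname}/${files[0]}" "${formatted}_${files[0]}"
  else
    echo "${URL}: expected exactly one file, got ${#files[@]}" >&2
  fi
  rm -rf -- "${dirname}"
done
rmdir "${tmpdirname}"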
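If you would rather not change directories at all, wget's -P (--directory-prefix) option can save straight into the scratch directory. The helper below is a hypothetical sketch, not part of the answer above: the name fetch_one is made up, and it assumes your wget places the Content-Disposition name inside the -P directory (recent versions do; check on yours).

```shell
#!/bin/bash
# Hypothetical helper: fetch item $1 with wget -P instead of pushd/popd,
# then rename the single saved file to "<zero-padded index>_<original name>"
# in the current directory.
fetch_one() {
  local i="$1" base="https://www.server.org/data/2021"
  local formatted dir
  local -a files
  formatted="$(printf '%04d' "$i")"
  # Scratch directory next to the final files, so mv is a same-filesystem rename
  dir="$(mktemp -d -p . "fetch_${formatted}_XXXXXX")" || return 1
  # -P makes wget save the downloaded file inside $dir
  wget -nv --content-disposition -P "$dir" "${base}/${i}"
  shopt -s nullglob   # empty directory -> empty array, not a literal "*"
  files=( "$dir"/* )
  shopt -u nullglob   # back to bash's default
  if (( ${#files[@]} != 1 )); then
    echo "fetch_one ${i}: expected 1 file, got ${#files[@]}" >&2
    rm -rf -- "$dir"
    return 1
  fi
  # ${files[0]##*/} strips the directory part, leaving the server's file name
  mv -- "${files[0]}" "${formatted}_${files[0]##*/}"
  rmdir -- "$dir"
}

# Usage: for i in {1..1000}; do fetch_one "$i"; done
```

Returning non-zero on a failed or ambiguous download lets the calling loop decide whether to stop or simply log and continue.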
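Personally, I recall wget being buggy around --content-disposition, at least according to its man page, which describes that support as experimental and known to suffer from a few bugs.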
These days I lean towards curl¹, so I would replace wget -nv --content-disposition "${URL}" with curl --remote-name --remote-header-name:
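#!/bin/bash
# Create the scratch directory inside the current directory
# (so that the final mv is a rename on the same filesystem)
tmpdirname="$(mktemp -d -p . "fetch_XXXXXXX")" || exit 1
# With nullglob, a glob that matches nothing expands to nothing, not to a literal "*"
shopt -s nullglob
for i in {1..1000}; do
  URL="https://www.server.org/data/2021/${i}"
  # I'm almost certain you'll want file names 0001,…,0999,1000; not 1…1000
  formatted="$(printf '%04d' "${i}")"
  dirname="${tmpdirname}/${formatted}"
  mkdir "${dirname}"
  pushd "${dirname}" > /dev/null || exit 1
  # --fail: on HTTP errors, exit non-zero instead of saving the error page
  curl --fail --remote-name --remote-header-name -- "${URL}"
  # Globs are not expanded in a plain assignment (filename=* stores "*"), so use an array
  files=( * )
  popd > /dev/null || exit 1
  if (( ${#files[@]} == 1 )); then
    mv -- "${dirname}/${files[0]}" "${formatted}_${files[0]}"
  else
    echo "${URL}: expected exactly one file, got ${#files[@]}" >&2
  fi
  rm -rf -- "${dirname}"
done
rmdir "${tmpdirname}"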
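With curl, the scratch directory can be skipped entirely: --write-out '%{filename_effective}' prints the name curl actually wrote the body to, so you can rename that file right away. This is a sketch assuming curl 7.26 or newer (Ubuntu 22.04 ships a much newer one); the function name fetch_one_curl is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: download item $1 with curl -O -J, let curl report the
# file name it used, then rename it to "<zero-padded index>_<original name>".
fetch_one_curl() {
  local i="$1" base="https://www.server.org/data/2021"
  local formatted saved
  formatted="$(printf '%04d' "$i")"
  # --silent: no progress meter; --show-error: still print errors on stderr
  # --fail: exit non-zero on HTTP errors instead of saving the error page
  # --write-out: after the transfer, print the effective output file name
  saved="$(curl --silent --show-error --fail --remote-name --remote-header-name \
    --write-out '%{filename_effective}' -- "${base}/${i}")" || return 1
  # Guard against an empty or missing name before renaming
  if [[ -z "$saved" || ! -f "$saved" ]]; then
    echo "fetch_one_curl ${i}: no file saved" >&2
    return 1
  fi
  mv -- "$saved" "${formatted}_${saved##*/}"
}

# Usage: for i in {1..1000}; do fetch_one_curl "$i"; done
```

One caveat: with --remote-header-name, curl refuses to overwrite an existing file of the same name in the current directory, so a leftover file from an earlier run makes that call fail rather than silently clobbering it.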
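¹ Good testing, very modern protocol support, proper reuse of TLS sessions (unlike wget), which allows faster connections at lower server load, and not having logic bugs: these can all be valid arguments...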