403 Forbidden error when downloading recursively with wget, but not for single files

I'm trying to download a directory with a recursive wget command:

wget -m -nH --cut-dirs=5 https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/   

This works for some files, but it also produces a series of 403 Forbidden errors, for example:

--2023-06-13 08:43:51--  https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200710_S_V03.lbl
Reusing existing connection to data.darts.isas.jaxa.jp:443.
HTTP request sent, awaiting response... 403 Forbidden
2023-06-13 08:43:51 ERROR 403: Forbidden.

However, if I download one of these files on its own, it works:

wget -m -nH --cut-dirs=5 https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200710_S_V03.lbl

--2023-06-13 09:06:44--  https://data.darts.isas.jaxa.jp/pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200710_S_V03.lbl
Resolving data.darts.isas.jaxa.jp (data.darts.isas.jaxa.jp)... 133.74.198.108
Connecting to data.darts.isas.jaxa.jp (data.darts.isas.jaxa.jp)|133.74.198.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1382 (1.3K)
Saving to: ‘ck/SEL_M_200710_S_V03.lbl’

ck/SEL_M_200710_S_V03.lb 100%[================================>]   1.35K  --.-KB/s    in 0s      

2023-06-13 09:06:44 (18.3 MB/s) - ‘ck/SEL_M_200710_S_V03.lbl’ saved [1382/1382]

FINISHED --2023-06-13 09:06:44--
Total wall clock time: 0.7s
Downloaded: 1 files, 1.3K in 0s (18.3 MB/s)

What I've tried:

  • -e robots=off
  • --user-agent=Mozilla/5.0
  • --trust-server-names
  • Inspecting the request headers for a single file in Chrome DevTools. There are no cookies or referer that I can identify:
GET /pub/pds3/sln-l-spice-6-v1.0/slnsp_1000/data/ck/SEL_M_200711_D_V03.BC HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: data.darts.isas.jaxa.jp
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
sec-ch-ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
sec-ch-ua-mobile: ?0

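By the way, these URLs come from the Data ARchives and Transmission System (DARTS), which archives high-level data products obtained by JAXA (Japan Aerospace Exploration Agency) space science missions. It is meant for public download of these data products, so there should not be any authentication requirements.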
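Since individual downloads succeed, one possible stopgap (a sketch, not a fix for the underlying cause) is to pull the URLs that returned 403 out of the recursive run's log and fetch each one with a separate wget invocation. Note that in the logs above the failing request reports "Reusing existing connection", while the working single download opens a new connection; a per-URL retry likewise starts fresh each time. This assumes the recursive run was logged to a file with wget's `-o` option; the log file name and the helper function names below are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names (extract_403_urls, retry_individually) are hypothetical.
# Assumes the recursive run was logged with: wget -m -nH --cut-dirs=5 -o wget.log <URL>

# Print each URL whose request ended in "ERROR 403: Forbidden" in a wget log.
extract_403_urls() {
  awk '
    /^--[0-9-]+ [0-9:]+--  / { url = $NF }   # request line: remember its URL
    /ERROR 403: Forbidden/   { print url }   # failure line: emit the last URL seen
  ' "$1" | sort -u
}

# Re-download each failed URL with its own wget process (a fresh connection),
# using the same options as the single-file command from the question.
retry_individually() {
  local url
  while IFS= read -r url; do
    wget -m -nH --cut-dirs=5 "$url"
  done < <(extract_403_urls "$1")
}
```

Usage, assuming the log is `wget.log`: run `retry_individually wget.log` after the recursive download finishes.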
