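I want to download a PDF file from a password-protected website. To do so, I run:
wget --auth-no-challenge --http-user="username" --http-password="password" "url_to_pdf"
It appears to connect to the server correctly and download the document, since I get the following response: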
--2022-09-02 13:14:51-- https://moodle.lmu.de/pluginfile.php/1568574/mod_label/intro/ex2_2022.pdf
Resolving moodle.lmu.de (moodle.lmu.de)... 129.187.255.141, 2001:4ca0:0:103::81bb:ff8d
Connecting to moodle.lmu.de (moodle.lmu.de)|129.187.255.141|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://moodle.lmu.de/user/policy.php [following]
--2022-09-02 13:14:52-- https://moodle.lmu.de/user/policy.php
Reusing existing connection to moodle.lmu.de:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘ex2_2022.pdf.1’
ex2_2022.pdf.1 [ <=> ] 75.64K --.-KB/s in 0.1s
2022-09-02 13:14:52 (531 KB/s) - ‘ex2_2022.pdf.1’ saved [77453]
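The problem is that when I go to that directory in Windows File Explorer and try to open the file with Adobe Acrobat, I get the following error:
I am using Windows 10 with WSL and Ubuntu 18.04 LTS.
Answer 1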
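If you look closely at the Wget output, you can see that the request for the "PDF" file was redirected to a web page (https://moodle.lmu.de/user/policy.php), and that page is what got saved. Look at this line in the output: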
Length: unspecified [text/html]
You are downloading an HTML file, so it is no wonder Adobe Acrobat cannot read it. If you don't believe it, open the file in Notepad; you will most likely see HTML code!
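The same check can be done from the WSL shell instead of Notepad. The following is only a minimal sketch: it relies on the fact that a real PDF begins with the bytes %PDF-, and the function name is_pdf is a hypothetical helper introduced here, not part of wget or any standard tool.

```shell
#!/bin/sh
# is_pdf FILE (hypothetical helper): succeeds only if FILE starts with
# the PDF signature "%PDF-". An HTML page saved under a .pdf name fails.
is_pdf() {
    [ "$(head -c 5 -- "$1" 2>/dev/null)" = "%PDF-" ]
}

# Example use with the file name from the wget log above:
# is_pdf ex2_2022.pdf.1 || echo "ex2_2022.pdf.1 is not a PDF"
```

Running file ex2_2022.pdf.1 should give a similar answer, typically reporting the file as an HTML document rather than a PDF document.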