Trying to download a zip file from a URL

I'm having some trouble downloading a zip file from a server URL with wget and curl. Neither works as expected. For example:

wget  [email protected] --password=secret!  "https://msi.domain.com/admin/ui/feedbackCSV/organizationReports/sjf3j45345345bsdf?reports[]=pageviews& reports[]=searches&from=Nov 12 2020&to=Nov 19 2020/client_Technical_Publications_stats.zip"

I got a 400 response, which is a Bad Request error, so I thought perhaps I needed to encode the URL.

I tried:

wget  [email protected] --password=secret! 
"https%3A%2F%2Fmsi.domain.com%2Fadmin%2Fui%2FfeedbackCSV%2ForganizationReports%2Fsjf3j45345345bsdf%3Freports%5B%5D%3Dpageviews%26%20reports%5B%5D%3Dsearches%26from%3DNov%2012%202020%26to%3DNov%2019%202020%2Fclient_Technical_Publications_stats.zip"

After a few seconds, it creates a file named 2Fsjf3j45345345bsdf?reports[]=pageviews& reports[]=searches&from=Nov 12 2020&to=Nov 19 2020%2Fclient_Technical_Publications_stats.zip, and when I open that file I see that it contains a bunch of HTML.

If I copy and paste my encoded URL into a browser, I can download the actual zip file.

I used the Chrome developer tools to inspect the request and found the headers below.
**Request headers**
:authority: msi.domain.com
:method: GET
:path: /admin/ui/feedbackCSV/organizationReports/2Fsjf3j45345345bsdf?reports%5B%5D=pageviews&%20reports%5B%5D=searches&from=Nov%2012%202020&to=Nov%2019%202020/
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cookie: tdid=azJ6MW45bSUyQlRxMXhnMWlRdFlJbDJRJTNEJTNEOklyMk14M2tYVVpvSG9HUTRtJTJGZlFKdyUzRCUzRA; _ga=GA1.2.1783666257.1605728974; JSESSIONID=9C50C09FEE87F3CCF7701FC7C3F0F326; AWSALB=v4wU9BVN7zdWf0YrbhfTrsTRGXyV0x5VtFVhxHDMco7vIWs8SfIDrU9db00EbaakDwmEdE2pXltZSswTiEF/K069JdH6vMr4RvNYYpsSbsPUTVuUt/5NkLHTJEJd; AWSALBCORS=v4wU9BVN7zdWf0YrbhfTrsTRGXyV0x5VtFVhxHDMco7vIWs8SfIDrU9db00EbaakDwmEdE2pXltZSswTiEF/K069JdH6vMr4RvNYYpsSbsPUTVuUt/5NkLHTJEJd
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: none
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36

**Response headers**

cache-control: no-cache, no-store, max-age=0, must-revalidate
content-disposition: attachment; filename=client_Technical_Publications_stats.zip
content-security-policy: default-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval' data: blob:; connect-src 'self'; frame-src 'self'; frame-ancestors 'self'; img-src 'self' data: blob:; style-src 'self' 'unsafe-inline'; font-src 'self';
content-type: application/zip
date: Wed, 23 Dec 2020 17:22:55 GMT
expires: 0
feature-policy: geolocation 'self';midi 'none';sync-xhr 'self';microphone 'none';camera 'none';magnetometer 'none';gyroscope 'none';speaker 'self';fullscreen 'self';payment 'none';
pragma: no-cache
referrer-policy: no-referrer-when-downgrade
server: nginx/1.17.2
set-cookie: AWSALB=rYgsUIfM2f/cfdgdfgrf2SxsgFRrq58s0ChVFPOR7/zBzYdwb4/cRZYggtSXybifpD/J/0mBxH5kUIwVoDboy+KM8C3wN8o0HjUGCAjBg9qVIv2XA/r; Expires=Wed, 30 Dec 2020 17:22:55 GMT; Path=/
set-cookie: AWSALBCORS=rYgsUIfM2f/cQc9w1vbcvcvblQrf2SxsgFRrq58s0ChVFPOR7/zBzYdwb4/cRZYggtSXybifpD/J/0mBxH5kUIwVoDboy+KM8C3wN8o0HjUGCAjBg9qVIv2XA/r; Expires=Wed, 30 Dec 2020 17:22:55 GMT; Path=/; SameSite=None; Secure
status: 200
strict-transport-security: max-age=31536000; includeSubDomains
td-service: admin
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block

**Command**

wget --header='Accept-Encoding: gzip, deflate,br' \
     --header='content-type: application/zip'\
     --header='Accept-Language: en-US,en;q=0.9'\
     --header='sec-fetch-mode: navigate'\
     --header='upgrade-insecure-requests: 1'\
     --header='sec-fetch-dest:document'\
     --header='sec-fetch-mode:navigate'\
     --header='scheme:https' \
     --user-agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36' \
     [email protected] \
     --password=secret! \
     -O ctp_stats.zip \
     "https://msi.domain.com/admin/ui/feedbackCSV/organizationReports/2Fsjf3j45345345bsdf?reports%5B%5D=pageviews&%20reports%5B%5D=searches&from=Nov%2012%202020&to=Nov%2019%202020/" 

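I passed the same headers in my command and got a 200 response code with the following output: Length: unspecified [text/html] Saving to: “ctp_stats.zip”. When I open the zip file, I see an empty file.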

Answer 1

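Since you are passing credentials to the web server that generates this report, you may also need to pass some additional request header information, for example: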

wget --header='Content-Type: text/plain' \
     --header='Accept-Encoding: gzip, deflate' \
     --user-agent='Mozilla/5.0 (Windows; MSIE 5; Windows ME; en-US)' \
     [email protected] \
     --password=secret! \
     -O ctp_stats.zip \
     "https://msi.domain.com/admin/ui/feedbackCSV/organizationReports/sjf3j45345345bsdf?reports[]=pageviews&reports[]=searches&from=Nov 12 2020&to=Nov 19 2020/client_Technical_Publications_stats.zip"

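You may want to use the developer tools in your favorite browser to look at the HTTP headers sent when this file is requested, so that you can fill in the appropriate values.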
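
The request headers captured in the question include a JSESSIONID cookie, which suggests (though the post does not confirm it) that the browser download is authorized by a login session rather than by --user/--password. A minimal sketch of replaying that session with wget follows. The cookie value is a placeholder, the query string is written without the stray space and the trailing path segment as an assumption, and the DRY_RUN switch is a hypothetical convenience for inspecting the command without contacting the server.

```shell
#!/bin/sh
# Placeholder: copy the real value from the browser's developer tools.
SESSION_COOKIE='JSESSIONID=REPLACE_ME'

# Assumption: same report as in the question, query values percent-encoded.
URL='https://msi.domain.com/admin/ui/feedbackCSV/organizationReports/sjf3j45345345bsdf?reports%5B%5D=pageviews&reports%5B%5D=searches&from=Nov%2012%202020&to=Nov%2019%202020'

fetch_report() {
  # Build the wget command line as this function's positional parameters.
  set -- wget --header="Cookie: $SESSION_COOKIE" -O ctp_stats.zip "$URL"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Only show what would be executed.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 fetch_report` prints the command, and plain `fetch_report` performs the download. Session cookies expire, so this is mainly useful for checking whether authentication is what differs between the browser and wget.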

For more information on the different kinds of headers wget can send, see the relevant page of the wget manual.
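
Apart from headers, the encoding attempt in the question percent-encoded the entire string, including the scheme and the slashes that wget needs in literal form to parse the URL. A sketch of the alternative, encoding only the individual query components, is shown here. The `urlencode` helper is hypothetical (it is not part of wget or curl), it assumes ASCII input, and the trailing `/client_Technical_Publications_stats.zip` segment from the question is left out as an assumption.

```shell
#!/bin/sh
# Percent-encode one query-string component (ASCII input assumed).
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;                    # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;            # everything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

base='https://msi.domain.com/admin/ui/feedbackCSV/organizationReports/sjf3j45345345bsdf'
key=$(urlencode 'reports[]')
query="$key=pageviews&$key=searches&from=$(urlencode 'Nov 12 2020')&to=$(urlencode 'Nov 19 2020')"
url="$base?$query"
printf '%s\n' "$url"
```

The resulting `$url` keeps `https://`, the host, `?`, `&` and `=` literal, and can be passed to wget in double quotes.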

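Although it wasn't part of the question, the command also includes the -O option to write the file to ctp_stats.zip, so you don't end up with a very long file name. If it's of no value to you, feel free to use, modify, or remove it.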
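
Because -O saves whatever the server returns under a .zip name, an HTML page (as reported in the question with [text/html]) can end up in ctp_stats.zip. A small sketch for checking the result is shown below. The `is_zip` helper is hypothetical, and it relies only on the fact that zip archives begin with the two ASCII bytes "PK".

```shell
#!/bin/sh
# Return success if the file starts with the zip signature bytes "PK".
is_zip() {
  [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Print a short verdict for a downloaded file.
check_download() {
  if is_zip "$1"; then
    printf '%s looks like a zip archive\n' "$1"
  else
    printf '%s is not a zip archive\n' "$1"
  fi
}
```

For example, `check_download ctp_stats.zip` after the wget run. If the file is not a zip, viewing its first lines with `head -n 20 ctp_stats.zip` may show whether the server sent a login or error page instead, which the original post does not state.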
