I have a simple command to fetch a login page and all of its dependencies:
wget --post-data='user=user&password=password' --page-requisites https://…/login
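The server log shows the following (abbreviated for obvious reasons):
- POST /login 302
- GET /account 200
- POST /robots.txt 200 (should be a GET, but it succeeded, so no problem)
- POST /favicon.ico 200 (ditto)
- POST /[looong PageSpeed URL] 500 (for every CSS, JavaScript, and image file on the page)
Fetching these files with GET works fine, so the URLs are correct, but PageSpeed doesn't seem to like clients POSTing. How can I make wget
use GET for everything except the initial request?
Using GNU Wget 1.18.
Update: bug filed.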
Answer 1
From "man wget":
This example shows how to log in to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized
users:
# Log in to the server. This can be done only once.
wget --save-cookies cookies.txt \
     --post-data 'user=foo&password=bar' \
     http://example.com/auth.php
# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
     -p http://example.com/interesting/article.php
If the server is using session cookies to track user authentication, the above will not work because --save-cookies will not save them (and neither
will browsers) and the cookies.txt file will be empty. In that case use --keep-session-cookies along with --save-cookies to force saving of session
cookies.
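Applied to the question, the two steps could be wrapped in a small shell function along these lines. This is only a sketch and not part of the man page: the function name login_and_fetch, its arguments, and the cookies.txt default are assumptions made for illustration. The point is that the second wget call carries no --post-data, so the page and its requisites are requested with GET.

```shell
#!/bin/sh
# Sketch: log in once with POST, then fetch a page and its requisites with GET.
# login_and_fetch and its arguments are hypothetical names, not wget features.
#   $1  login URL that accepts the POSTed form
#   $2  page to download together with its requisites
#   $3  form body, e.g. 'user=foo&password=bar'
#   $4  cookie file (optional, defaults to cookies.txt)
login_and_fetch() {
    login_url=$1
    page_url=$2
    post_data=$3
    cookie_jar=${4:-cookies.txt}

    # Step 1: POST the credentials. --keep-session-cookies makes
    # --save-cookies write session cookies as well; the response body
    # itself is discarded.
    wget --save-cookies "$cookie_jar" --keep-session-cookies \
         --post-data "$post_data" \
         --output-document /dev/null \
         "$login_url" || return 1

    # Step 2: no --post-data here, so wget uses GET for the page and
    # for every requisite (CSS, JavaScript, images).
    wget --load-cookies "$cookie_jar" \
         --page-requisites \
         "$page_url"
}
```

A call might look like `login_and_fetch http://example.com/auth.php http://example.com/interesting/article.php 'user=foo&password=bar'`, using the placeholder URLs from the man page; substitute the real login and account URLs.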