How to download a data file from a PHP link (dynamic URL) and log in using wget


When I try to download from a Human3.6M database link, I get the following error:

user@ubuntu:/disk1/user/Human_3.6m_data$ bash download0.sh
user@ubuntu:/disk1/user/Human_3.6m_data$ --2017-12-18 23:52:10--  http://vision.imar.ro/human3.6m/filebrowser.php?download=1
Connecting to [ip address]... connected.
Proxy request sent, awaiting response... 302 Found
Location: main_login.php [following]
--2017-12-18 23:52:11--  http://vision.imar.ro/human3.6m/main_login.php
Reusing existing connection to [ip address].
Proxy request sent, awaiting response... 302 Found
Location: https://vision.imar.ro/human3.6m/main_login.php [following]
--2017-12-18 23:52:11--  https://vision.imar.ro/human3.6m/main_login.php
Connecting to [ip address]... connected.
WARNING: cannot verify vision.imar.ro's certificate, issued by ‘emailAddress=root@vision,CN=vision,OU=SomeOrganizationalUnit,O=SomeOrganization,L=SomeCity,ST=SomeState,C=--’:
   Self-signed certificate encountered.
     WARNING: certificate common name ‘vision’ doesn't match requested host name ‘vision.imar.ro’.
Proxy request sent, awaiting response... 200 OK
Length: 2600 (2.5K) [text/html]
Saving to: ‘filebrowser.php?download=1.1’

filebrowser.php?download=1.1                    100%[====================================================================================================>]   2.54K  --.-KB/s    in 0s

2017-12-18 23:52:13 (74.0 MB/s) - ‘filebrowser.php?download=1.1’ saved [2600/2600]

The download link is:

http://vision.imar.ro/human3.6m/filebrowser.php?download=1&filepath=Videos&filename=ActivitySpecific_1.tgz&downloadname=Directions

I used these Linux commands:

wget --no-check-certificate --user usr --password pswd http://vision.imar.ro/human3.6m/filebrowser.php?download=1&filepath=Videos&filename=ActivitySpecific_1.tgz&downloadname=Directions

wget --no-check-certificate --trust-server-names --user usr --password pswd -O Directions http://vision.imar.ro/human3.6m/filebrowser.php?download=1&filepath=Videos&filename=ActivitySpecific_1.tgz&downloadname=Directions

The actual file is 6 GB.

Answer 1

First, you have to quote the URL:

wget --no-check-certificate --user usr --password pswd \
'http://vision.imar.ro/human3.6m/filebrowser.php?download=1&filepath=Videos&filename=ActivitySpecific_1.tgz&downloadname=Directions'

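Otherwise the shell treats each & as a background operator: the wget command is cut off at the first &, and you will also see job-control messages like these: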

[1] 20618
[2] 20619
[1]-  Done                    filepath=Videos
$ 
[2]+  Done                    filename=ActivitySpecific_1.tgz
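To see exactly what the shell hands to wget, here is a small sketch using a hypothetical helper, `show_args`, that prints each argument it receives in brackets. The unquoted case runs in a child shell (`bash -c`) so the background job it creates cannot outlive the script. Note that `?` is also a glob character, which is one more reason to quote URLs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every argument it receives on its own line,
# wrapped in brackets, so you can see how the shell split the command line.
show_args() { printf '[%s]\n' "$@"; }

url='http://vision.imar.ro/human3.6m/filebrowser.php?download=1&filepath=Videos&filename=ActivitySpecific_1.tgz&downloadname=Directions'

# Quoted: the whole URL, every & included, arrives as ONE argument.
show_args "$url"
# prints: [http://vision.imar.ro/human3.6m/filebrowser.php?download=1&filepath=Videos&filename=ActivitySpecific_1.tgz&downloadname=Directions]

# Unquoted (inside a child shell): the first & ends the command and runs it
# in the background, and "filepath=Videos" becomes a separate variable
# assignment, so only the text before the & reaches show_args.
bash -c 'show_args() { printf "[%s]\n" "$@"; }
show_args http://vision.imar.ro/human3.6m/filebrowser.php?download=1&filepath=Videos
wait'
# prints: [http://vision.imar.ro/human3.6m/filebrowser.php?download=1]
```

The same splitting happens to wget, which is why it only ever requested filebrowser.php?download=1, as the first line of the log shows.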

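Second, that small file is most likely the login page itself, not the data: the log above shows the request being redirected to main_login.php, and what gets saved is a 2.5K text/html response.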
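Before trying to unpack anything, it is worth checking what was actually saved. A minimal sketch, assuming only that the real download is a gzip-compressed tarball (.tgz), whose first two bytes are always 1f 8b; an HTML login page does not start with those bytes. The function name `looks_like_tgz` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE starts with the gzip magic
# bytes 1f 8b. Every .tgz (gzip-compressed tar) archive does; an HTML
# login page saved by mistake does not.
looks_like_tgz() {
  local magic
  # Read the first two bytes and render them as hex without spaces.
  magic=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# Example: check the file that wget saved in the question.
f='filebrowser.php?download=1.1'
if [ -f "$f" ] && looks_like_tgz "$f"; then
  echo "$f looks like a .tgz archive"
else
  echo "$f is not a .tgz archive (probably the HTML login page)"
fi
```

Checking the content rather than the file name matters here, because the saved name comes from the URL and says nothing about what the server returned.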

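The --user and --password options usually won't work here: wget uses them for HTTP (and FTP) authentication, whereas this site logs you in through an HTML form and then tracks the session with a cookie. For a cookie-based login you need something like the following (adapted from the --post-data example in man wget):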

# Log in once and save the session cookies to cookies.txt.
wget --no-check-certificate --keep-session-cookies --save-cookies cookies.txt \
--post-data 'username=foo&password=bar' \
'https://vision.imar.ro/human3.6m/checklogin.php'

# Now grab the page or pages we care about.
wget --no-check-certificate --load-cookies cookies.txt \
'https://vision.imar.ro/human3.6m/filebrowser.php?download=1&filepath=Videos&filename=ActivitySpecific_1.tgz&downloadname=Directions'
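The two commands above can be wrapped into a small reusable script. This is a minimal sketch, assuming (as the answer does) that the login form posts fields named username and password to checklogin.php; the real field names should be checked in the HTML of main_login.php. The function names `h36m_login` and `h36m_download` and the `WGET` dry-run override are hypothetical. Also note that --post-data sends its string as-is, so a password containing &, =, + or % would have to be percent-encoded first.

```shell
#!/usr/bin/env bash
# Sketch: log in once, then download with the saved session cookie.
# ASSUMPTION: the field names "username"/"password" and the checklogin.php
# endpoint are taken from the answer above, not from any site documentation.
BASE='https://vision.imar.ro/human3.6m'
COOKIES='cookies.txt'
WGET=${WGET:-wget}   # set WGET=echo to print the commands instead (dry run)

# $1 = user name, $2 = password (both must already be URL-encoded).
h36m_login() {
  # -O /dev/null discards the response page; the cookies still go to $COOKIES.
  "$WGET" --no-check-certificate --keep-session-cookies \
    --save-cookies "$COOKIES" --post-data "username=$1&password=$2" \
    -O /dev/null "$BASE/checklogin.php"
}

# $1 = filepath, $2 = filename, $3 = downloadname, $4 = local output file.
h36m_download() {
  # The URL is built inside double quotes, so the & characters stay intact.
  "$WGET" --no-check-certificate --load-cookies "$COOKIES" -O "$4" \
    "$BASE/filebrowser.php?download=1&filepath=$1&filename=$2&downloadname=$3"
}
```

For the file in the question, that would be h36m_login myuser mypassword followed by h36m_download Videos ActivitySpecific_1.tgz Directions ActivitySpecific_1.tgz (myuser and mypassword are placeholders). Naming the output with -O avoids the filebrowser.php?download=1... file names seen in the question's log.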
