Is it possible to download all the .jpg and .png files linked from a web page? I want to download the images linked in every post of every topic in [this forum][1]. For example, [this post][2] contains a link to [this file][3].
I have already tried wget:
wget -r -np http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread?
It copies all the HTML files of that thread, though I don't know why it goes from ...thread?comment=336 to ...thread?comment=3232 and so on, jumping from one comment to another.
Answer 1
Try with this command:
wget -P path/where/save/result -A jpg,png -r http://www.mtgsalvation.com/forums/creativity/artwork/
From the wget man page:
-A acclist --accept acclist
    Specify comma-separated lists of file name suffixes or patterns to
    accept or reject (@pxref{Types of Files} for more details).
-P prefix
    Set directory prefix to prefix. The directory prefix is the
    directory where all other files and subdirectories will be saved
    to, i.e. the top of the retrieval tree. The default is . (the
    current directory).
-r
--recursive
    Turn on recursive retrieving.
Try this:
mkdir wgetDir
wget -P wgetDir http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread?page=145
This command fetches the HTML page and puts it into wgetDir. When I tried it, I found this file there:
340782-official-digital-rendering-thread?page=145
Then, I tried this command:
wget -P wgetDir -A png,jpg,jpeg,gif -nd --force-html -r -i "wgetDir/340782-official-digital-rendering-thread?page=145"
It downloads the pictures, so it seems to work, though I don't know whether these are the images you wanted to download.
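To cover every page of the thread rather than just page 145, you could repeat those two steps in a small shell loop. This is only a sketch, assuming the thread has 149 pages and the URL pattern stays the same:

mkdir -p wgetDir
for page in $(seq 1 149); do
    # fetch the HTML of this page, then pull the images it links to
    wget -P wgetDir "http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread?page=$page"
    wget -P wgetDir -A png,jpg,jpeg,gif -nd --force-html -r -i "wgetDir/340782-official-digital-rendering-thread?page=$page"
done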
Answer 2
A small C program can do the job: it fetches each page of the thread with wget, scans the HTML for image links inside the post bodies, and downloads each image (again via wget):
#include <stdio.h>
#include <stdlib.h>     // for system()

int main ()
{
    // Markers searched for in the downloaded HTML: a post body starts after
    // "forum-post-body-content" and ends at "p-comment-notes"; image URLs
    // appear as img src="..."
    char body[] = "forum-post-body-content", notes[] = "p-comment-notes", img[] = "img src=";
    char link[512], cmd[1024] = {0}, file[32];
    int c, pos = 0, pos2 = 0, fin = 0, i, j, num = 0, found = 0;
    FILE *fp;

    for (i = 1; i <= 149; ++i)                  // one iteration per thread page
    {
        // download page i of the thread and save it as page<i>.txt
        sprintf(cmd, "wget -O page%d.txt 'http://www.mtgsalvation.com/forums/creativity/artwork/340782-official-digital-rendering-thread?page=%d'", i, i);
        system(cmd);
        sprintf(file, "page%d.txt", i);
        fp = fopen(file, "r");
        if (fp == NULL)                         // skip the page if wget failed
            continue;

        while ((c = fgetc(fp)) != EOF)
        {
            if (body[pos] == c)
            {
                if (pos == 22)                  // matched "forum-post-body-content"
                {
                    pos = 0;
                    while (fin == 0)            // scan until the end of the post body
                    {
                        c = fgetc(fp);
                        if (feof(fp))
                            break;

                        // leave the post body once "p-comment-notes" is reached
                        if (notes[pos] == c)
                        {
                            if (pos == 14)
                            {
                                fin = 1;
                                pos = -1;
                            }
                            ++pos;
                        }
                        else
                        {
                            if (pos > 0)
                                pos = 0;
                        }

                        // look for img src= inside the post body
                        if (img[pos2] == c)
                        {
                            if (pos2 == 7)      // matched "img src="
                            {
                                pos2 = 0;
                                // copy the quoted URL; it ends with ...g" (jpg, jpeg, png, gif)
                                while (found == 0 && pos2 < (int)sizeof(link) - 1)
                                {
                                    c = fgetc(fp);
                                    if (feof(fp))
                                        break;
                                    link[pos2] = c;
                                    if (pos2 > 0)
                                    {
                                        if (link[pos2 - 1] == 'g' && link[pos2] == '\"')
                                            found = 1;
                                    }
                                    ++pos2;
                                }
                                --pos2;
                                found = 0;

                                // strip the opening quote and terminate the string
                                char link2[pos2];
                                for (j = 1; j < pos2; ++j)
                                    link2[j - 1] = link[j];
                                link2[j - 1] = '\0';

                                // download the image, numbering the saved files consecutively
                                sprintf(cmd, "wget -O /home/arturo/Dropbox/Digital_Renders/%d \'%s\'", ++num, link2);
                                system(cmd);
                                pos2 = -1;
                            }
                            ++pos2;
                        }
                        else
                        {
                            if (pos2 > 0)
                                pos2 = 0;
                        }
                    }
                    fin = 0;
                }
                ++pos;
            }
            else
                pos = 0;
        }

        // close and delete the page file before moving on to the next one
        fclose(fp);
        if (remove(file))
            fprintf(stderr, "Can't remove file\n");
    }

    return 0;
}
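One possible way to run it: compile the program with a C compiler and execute it from the directory where the page files should be written. Note that the output path /home/arturo/Dropbox/Digital_Renders/ is hard-coded, so change it to an existing directory on your machine first; the file name scraper.c here is just an example:

gcc -o scraper scraper.c
./scraper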