I want to write a Python program that downloads the content of a web page and then downloads the content of the pages that the first page links to.
For example, this is the main page http://www.adobe.com/support/security/, and these are the pages I want to download: http://www.adobe.com/support/security/bulletins/apsb13-23.html and http://www.adobe.com/support/security/bulletins/apsb13-22.html
There is one specific condition I want to satisfy: it should only download the pages under bulletins, not the ones under advisories (http://www.adobe.com/support/security/advisories/apsa13-02.html).
#!/usr/bin/env python
import urllib
import re
import sys
import os
page = urllib.urlopen("http://www.adobe.com/support/security/")
page = page.read()
fileHandle = open('content', 'w')
links = re.findall(r"<a.*?\s*href=\"(.*?)\".*?>(.*?)</a>", page)
for link in links:
    sys.stdout = fileHandle
    print ('%s' % (link[0]))
    sys.stdout = sys.__stdout__
fileHandle.close()
os.system("grep -i '\/support\/security\/bulletins\/' content >> content1")
I have already extracted the bulletin links into content1, but I don't know how to download the content of those pages by feeding content1 as input.
The content1 file looks like this:
/support/security/bulletins/apsb13-23.html
/support/security/bulletins/apsb13-23.html
/support/security/bulletins/apsb13-22.html
/support/security/bulletins/apsb13-22.html
/support/security/bulletins/apsb13-21.html
/support/security/bulletins/apsb13-21.html
/support/security/bulletins/apsb13-22.html
/support/security/bulletins/apsb13-22.html
/support/security/bulletins/apsb13-15.html
/support/security/bulletins/apsb13-15.html
/support/security/bulletins/apsb13-07.html
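For reference, a minimal sketch of the missing step (untested; it assumes content1 holds one relative path per line and reuses the same Python 2 urllib module as above):
import os
import urllib
import urlparse

base = "http://www.adobe.com/support/security/"

# read the relative paths collected in content1, skipping duplicates
seen = set()
paths = []
for line in open('content1'):
    path = line.strip()
    if path and path not in seen:
        seen.add(path)
        paths.append(path)

# turn each path into an absolute URL and save the page under its own file name
for path in paths:
    url = urlparse.urljoin(base, path)  # e.g. http://www.adobe.com/support/security/bulletins/apsb13-23.html
    urllib.urlretrieve(url, os.path.basename(path))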
Answer 1
If I have understood your question correctly, the following script should do what you want:
#!/usr/bin/env python
import urllib
import re
import sys
import os
page = urllib.urlopen("http://www.adobe.com/support/security/")
page = page.read()
fileHandle = open('content', 'w')
links = re.findall(r"<a.*?\s*href=\"(.*?)\".*?>(.*?)</a>", page)
for link in links:
    sys.stdout = fileHandle
    print ('%s' % (link[0]))
    sys.stdout = sys.__stdout__
fileHandle.close()
os.system("grep -i '\/support\/security\/bulletins\/' content 2>/dev/null | head -n 3 | uniq | sed -e 's/^/http:\/\/www.adobe.com/g' > content1")
os.system("wget -i content1")
Answer 2
This question probably belongs on Stack Overflow!
But in any case, you could take a look at HTTrack for this; it does something similar, and it is open source.