Why does my crawler produce different results on Ubuntu Server?


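I am writing code to collect the URLs from the image slider on the Yahoo home page. The slider is embedded via JavaScript, so I use Selenium to simulate a user's clicks. The code, with explanatory comments, is in the Python file below. Please help me figure out what is going wrong.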


'''Ubuntu Server has no display for the Firefox browser, so I use
xvfbwrapper to create a virtual one.
I add it only on the server; for local testing I leave it out.
'''
#from xvfbwrapper import Xvfb
#with Xvfb() as xvfb:

from selenium import webdriver
import selenium.webdriver.support.ui as ui

url_path="https://sg.yahoo.com"
driver = webdriver.Firefox()
driver.get(url_path)
# Wait until the crawler finds the Carousel-Ranges class, which belongs to the
# image slider (it contains all the URLs I need).
first_result = ui.WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_class_name('Carousel-Ranges'))
# Parse the total number of URLs from the element text to size the loop.
imageRange = first_result.text
numString = imageRange.split("of ")
numInt = int(numString[1])
print(numInt)  # total number of URLs, extracted from the text

for x in range(0, ((numInt - 10) // 5) + 1):
    # Simulate the click. Each click makes the JavaScript enable 5 more URLs,
    # which I can select with "li.Cur-p a"; only 10 URLs are enabled by default.
    driver.find_element_by_css_selector("button[class*='End-0 T-0 B-0 Fz-30 Z-1']").click()
topNews = driver.find_elements_by_css_selector("li.Cur-p a")
# Print the number of URLs crawled. If it matches the number printed above,
# the crawl succeeded.
print(len(topNews))

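However, this code only works in local testing. When I add xvfbwrapper and deploy it to the server, it does not work. For example, if there are 60 URLs in total, the crawler run locally returns the same number (60, success), but run on the server it returns only 10, the default URLs. I therefore suspect the problem is caused by xvfbwrapper, but I do not know how to fix it. Can anyone help? Thanks again.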
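
To make the numbers in this example concrete, here is a minimal browser-free sketch of the arithmetic my script does. The helper names `parse_total` and `clicks_needed` are hypothetical, and the sample text "1 - 5 of 60" is only my assumption about what the carousel label looks like; the script itself relies only on the "of " separator.

```python
def parse_total(range_text):
    """Return the total from carousel text such as '1 - 5 of 60' (format assumed)."""
    return int(range_text.split("of ")[1])


def clicks_needed(total, visible=10, step=5):
    """Mirror the loop bound in the script: ((total - visible) // step) + 1."""
    return ((total - visible) // step) + 1


total = parse_total("1 - 5 of 60")  # assumed sample label text
print(total)                 # 60
print(clicks_needed(total))  # 11
```

So with 60 URLs the loop issues 11 clicks. Getting only 10 links on the server means none of those clicks revealed new items there.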
