How do I run scrapy splash on a virtual machine?

2024-10-3

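How do I run scrapy splash on a Linux virtual machine? Essentially, I have a Lua script that needs to send keys to a website to log in and then scrape it.

I have installed Docker, but I can't seem to get the scraper to work because it cannot connect to the server.

Are there any simple steps I can follow to get this running on the VM? For example, what should I install, and what should I do before running scrapy crawl spider?

As for Docker, I ran the following as administrator: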

docker run -p 8050:8050 scrapinghub/splash --max-timeout 3600

However, this runs in the foreground, and I would like it to run in the background. I can't seem to figure it out; I tried:

docker run -d 8050:8050 scrapinghub/splash --max-timeout 3600

But I just get the error:

Unable to find image '8050:8050' locally

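I believe fixing this might solve my problem, or it might not and I need to install something further. Please let me know! I really need expert guidance to solve this.

While Docker was running in the first instance, I opened another instance.

When I run the scrapy spider there, I get the following error: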

2022-02-16 02:55:26 [scrapy_splash.middleware] WARNING: Bad request to Splash: {'error': 400, 'type': 'ScriptError', 'description': 'Error happened while executing Lua script', 'info': 
{'type': 'JS_ERROR', 'js_error_type': 'TypeError', 'js_error_message': 'null is not an object (evaluating \'document.querySelector("button:nth-child(2)").getClientRects\')', 'js_error':
 'TypeError: null is not an object (evaluating \'document.querySelector("button:nth-child(2)").getClientRects\')', 'message': '[string "..."]:12: error during JS function call: \'TypeEr
ror: null is not an object (evaluating \\\'document.querySelector("button:nth-child(2)").getClientRects\\\')\'', 'source': '[string "..."]', 'line_number': 12, 'error': 'error during JS
 function call: \'TypeError: null is not an object (evaluating \\\'document.querySelector("button:nth-child(2)").getClientRects\\\')\''}}
2022-02-16 02:55:26 [scrapy.core.engine] DEBUG: Crawled (400) <GET http://instagram.com/ via http://localhost:8050/execute> (referer: None)
2022-02-16 02:55:26 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 http://instagram.com/>: HTTP status code is not handled or not allowed

The scraper runs fine on my Mac, so I must be missing an installation somewhere.

Answer 1

You have to use the following docker command:

docker run -d -p 8050:8050 scrapinghub/splash --max-timeout 3600

You forgot the -p parameter for port forwarding, which made docker think that 8050:8050 was the name of the image you were trying to run.
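To illustrate why the error names 8050:8050 as the image, here is a rough Python model of how `docker run` splits its arguments: options come first, the first token that is neither an option nor an option's value is taken as the image, and everything after it is passed to the container. This is a simplified sketch, not Docker's actual parser; the function name `split_docker_run_args` and the two small option tables are made up for illustration.

```python
# Simplified model of `docker run [OPTIONS] IMAGE [COMMAND] [ARG...]`.
# Not Docker's real parser: only the two options from this thread are known.
VALUE_OPTIONS = {"-p"}  # options that consume the next token as their value
FLAG_OPTIONS = {"-d"}   # boolean options that take no value


def split_docker_run_args(args):
    """Return (options, image, container_args) for a `docker run` argument list."""
    options = {}
    i = 0
    while i < len(args):
        token = args[i]
        if token in VALUE_OPTIONS:
            if i + 1 >= len(args):
                raise ValueError(f"option {token} needs a value")
            options[token] = args[i + 1]
            i += 2
        elif token in FLAG_OPTIONS:
            options[token] = True
            i += 1
        else:
            # First non-option token: this is what gets treated as the image.
            return options, token, args[i + 1:]
    raise ValueError("no image given")


broken = "-d 8050:8050 scrapinghub/splash --max-timeout 3600".split()
fixed = "-d -p 8050:8050 scrapinghub/splash --max-timeout 3600".split()

print(split_docker_run_args(broken))
print(split_docker_run_args(fixed))
```

In the broken form, 8050:8050 lands in the image slot and scrapinghub/splash becomes part of the command passed to the container. With -p in place, the port mapping is consumed as an option value and scrapinghub/splash is the image.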

I have never heard of scrapy, but this is a pretty basic mistake that does not need expert guidance ;)

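As for "How do I run x?": in general, that question is far too vague for a Q&A site like SuperUser. I suggest you try the above and then ask a separate question for each follow-up error you run into.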
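One way to narrow down such a follow-up question is to check first whether Splash is reachable from the VM at all. A minimal Python sketch, assuming Splash is published on localhost:8050 as in the command above and that its `/_ping` health endpoint is available; the helper name `splash_is_up` is hypothetical:

```python
import urllib.request


def splash_is_up(base_url="http://localhost:8050", timeout=5):
    """Return True if a GET to <base_url>/_ping answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/_ping", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError (including HTTPError), refused connections and
        # timeouts are all OSError subclasses.
        return False
```

If `splash_is_up()` returns False, the problem is most likely with the container or the port mapping. If it returns True and the crawl still fails, as in the 400 ScriptError shown in the question, the request did reach Splash and the Lua script itself is the next thing to look at.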
