How can I make wget download a CGI file that is behind robots.txt?

Question 1

wget --user-agent=Mozilla \
  "http://aok.heavengames.com/cgi-bin/aokcgi/display.cgi?action=t&fn=22"

Question 2

From the wget manual on gnu.org:

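If you know what you are doing and really wish to turn off robot exclusion, set the robots variable to "off" in your .wgetrc. You can achieve the same effect from the command line with the -e switch, e.g. "wget -e robots=off url...".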
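
A minimal sketch of the two routes the manual describes, assuming a POSIX shell and GNU Wget. So that the example does not touch your real ~/.wgetrc, it writes the setting to a temporary file and points Wget at that file through the WGETRC environment variable. Using a temporary file is an assumption of this sketch; the manual itself simply says to edit .wgetrc. The wget calls are left as comments because they need network access.

```shell
#!/bin/sh
url="http://aok.heavengames.com/cgi-bin/aokcgi/display.cgi?action=t&fn=22"

# Route 1: a startup file with the robots variable set to off.
# A temporary file stands in for ~/.wgetrc so the real one is not modified.
rcfile="$(mktemp)"
printf 'robots = off\n' > "$rcfile"
cat "$rcfile"

# WGETRC makes Wget read this file instead of ~/.wgetrc:
#   WGETRC="$rcfile" wget "$url"

# Route 2: the same setting given on the command line; -e runs one
# .wgetrc-style command before the download starts:
#   wget -e robots=off "$url"
```

To make the setting permanent, put the same "robots = off" line in ~/.wgetrc instead of a temporary file.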

Question 3

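After several attempts with --user-agent and robots=off that produced no output, I checked a hex dump of what was downloaded. I finally got it working by writing the result to an HTML file, as in the command below.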

Try this:

wget --user-agent=Mozilla -e robots=off \
  "http://aok.heavengames.com/cgi-bin/aokcgi/display.cgi?action=t&fn=22" -O cgi-converted-to-htmlfile.html

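For reference, --user-agent=Mozilla and -e robots=off all go on the same line. A backslash continues a command only when it is the very last character on a line; written mid-line, as in "\ -e", it escapes the space that follows, so wget receives " -e" instead of the -e option.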
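
To see why the position of the backslash matters, this sketch asks the shell to split two versions of the command into arguments without running wget or contacting the server. It assumes a POSIX shell; "set --" just loads the words into the positional parameters so each one can be printed in [brackets].

```shell
#!/bin/sh
url="http://aok.heavengames.com/cgi-bin/aokcgi/display.cgi?action=t&fn=22"

# Broken form: "\ " in the middle of a line escapes the space, so the
# third argument becomes " -e" (leading space) rather than the -e option.
set -- wget --user-agent=Mozilla \ -e robots=off "$url"
broken_third="$3"
printf '[%s]\n' "$@"   # the third line printed is "[ -e]"

# Working form: each backslash is the last character on its line, so the
# command continues on the next line. Quoting the URL keeps "&" from being
# treated as the shell's background operator.
set -- wget --user-agent=Mozilla \
  -e robots=off \
  -O cgi-converted-to-htmlfile.html \
  "$url"
fixed_count="$#"
printf '[%s]\n' "$@"
```

In the working form wget receives seven separate words, with -e and -O each followed by their own argument.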

The option -e robots=off makes wget ignore the server's robots.txt.

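The option -O cgi-converted-to-htmlfile.html writes the downloaded document to the file cgi-converted-to-htmlfile.html. It does not convert anything: -O only chooses the output file name, and whatever display.cgi returns (here an HTML page) is saved as-is.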
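
The answer mentions checking a hex dump to confirm what was actually downloaded. A small sketch of that check, assuming a shell with head, od and grep available: looks_like_html is a hypothetical helper written for this example, and treating a "<!DOCTYPE html" or "<html" tag near the start of the file as proof of HTML is a heuristic, not a definitive test.

```shell
#!/bin/sh
# Hypothetical helper: dump the first bytes of a file and report whether
# it looks like an HTML page.
looks_like_html() {
  # Show the first 64 bytes as characters and escapes, like a quick hex dump.
  head -c 64 "$1" | od -c
  # Succeed if an HTML tag appears in the first 1024 bytes (case-insensitive).
  head -c 1024 "$1" | grep -qi -e '<!doctype html' -e '<html'
}

# Usage after the download (needs network, so left as a comment):
#   looks_like_html cgi-converted-to-htmlfile.html && echo "HTML saved"
```

An empty file or one without any HTML tag near the top makes the helper fail, which matches the "no output" symptom described in the answer.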

Good luck, I hope this is what you were looking for.
