How do I block all bots using .htaccess?

Question

All bots should be blocked with /robots.txt (not .htaccess), like this:

# cat robots.txt
User-agent: *
Disallow: /

The file needs to be in the document root and world-readable. Check it by opening it in a web browser: http://yourdomain/robots.txt should return the file's contents.
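As a complement to the browser check, the rules themselves can be evaluated offline. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed` and the sample user-agent names are assumptions made for illustration, not part of the original answer.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative agent names; any agent falls under "User-agent: *".
    for agent in ("bingbot", "Googlebot", "SomeOtherCrawler"):
        allowed = is_allowed(ROBOTS_TXT, agent, "http://yourdomain/index.html")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

This only shows how a compliant parser reads the rules; it says nothing about whether a particular bot actually honors them.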

Technically, a bot may choose not to honor this, but in practice it should. I am sure Bing does.

If for some reason this does not work (unlikely with the real Bing crawler), try:

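# cat .htaccess
# Flag any request whose User-Agent contains one of these substrings (case-insensitive).
SetEnvIfNoCase User-Agent .*bot.* search_robot
SetEnvIfNoCase User-Agent .*bing.* search_robot
SetEnvIfNoCase User-Agent .*crawl.* search_robot
# Allow,Deny evaluates Deny last, so flagged requests are refused.
# (With Order Deny,Allow, "Allow from All" would override the Deny.)
# Apache 2.2 syntax; Apache 2.4 needs mod_access_compat for these directives.
Order Allow,Deny
Allow from All
Deny from env=search_robot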

This requires the Apache module mod_setenvif to be enabled; see http://www.askapache.com/htaccess/setenvif.html
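To get a feel for which requests the three SetEnvIfNoCase patterns would flag, here is a rough Python sketch that mimics the matching: case-insensitive and unanchored, so `.*bot.*` behaves like a plain substring search for `bot`. The function name `is_search_robot` and the sample User-Agent strings are assumptions for illustration, and Python's `re` is only an approximation of Apache's regex engine.

```python
import re

# The three patterns from the .htaccess above.
PATTERNS = [r".*bot.*", r".*bing.*", r".*crawl.*"]
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def is_search_robot(user_agent: str) -> bool:
    """Return True if any pattern matches somewhere in the User-Agent."""
    return any(p.search(user_agent) for p in COMPILED)


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
        # Hypothetical browser string that merely contains "bot".
        "ExampleBrowser/1.0 (Robotics Lab Edition)",
    ]
    for ua in samples:
        print(is_search_robot(ua), ua)
```

The last sample illustrates the trade-off of such broad patterns: any client whose User-Agent happens to contain "bot", "bing", or "crawl" is refused, crawler or not.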

Answer 1