How to stop bad bots, spiders, crawlers, and harvesters

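I'm tired of these bad bots, spiders, crawlers, and harvesters. I have already configured fail2ban on my server to ban any IP that makes 250 requests (maxretry) within 5 minutes. But some IPs still get through because they never reach 250 requests within 5 minutes.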

Here is my jail.local configuration:

[http-get-dos]
enabled = true
filter = http-get-dos
logpath = /var/log/ispconfig/httpd/*/access.log
maxretry = 250
findtime = 300
#ban for 10 hours
bantime = 36000
action = iptables-multiport[name=HTTP, port="http,https", protocol=tcp]
         cloudflare-blacklist
         sendmail-whois[name=HTTP, [email protected]]

Here is the http-get-dos.conf filter file:

[Definition]

failregex = ^<HOST> -.*"(GET|POST)

ignoreregex =
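
To see which requests this filter counts, the failregex can be approximated in Python. This is only a sketch: fail2ban expands `<HOST>` itself, so the `HOST` pattern below is a simplified IPv4-only stand-in, the user-agent strings are shortened, and the HEAD line is hypothetical rather than taken from the log.

```python
import re

# Simplified stand-in for fail2ban's <HOST> tag (assumption: IPv4 only).
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"
FAILREGEX = re.compile(r"^" + HOST + r' -.*"(GET|POST)')


def matched_host(line):
    """Return the client IP if the line would count as a hit, else None."""
    m = FAILREGEX.search(line)
    return m.group("host") if m else None


SAMPLE = [
    # Page request (user agent shortened for readability).
    '220.225.127.41 - - [24/Jul/2013:00:00:19 +0800] "GET /php?page=9 HTTP/1.1" '
    '200 10897 "http://www.mysite.com/php?page=8" "Mozilla/5.0"',
    # Partial download by the "FDM 3.x" client.
    '220.225.127.41 - - [24/Jul/2013:00:00:34 +0800] '
    '"GET /sites/default/files/download/teejaygenius/e_library.zip HTTP/1.1" '
    '206 3655400 "http://www.mysite.com/sites/default/files/download/teejaygenius/" "FDM 3.x"',
    # Hypothetical line: HEAD requests are not covered by (GET|POST).
    '220.225.127.41 - - [24/Jul/2013:00:00:40 +0800] "HEAD / HTTP/1.1" 200 0 "-" "curl"',
]

hits = sum(1 for line in SAMPLE if matched_host(line))
print(f"{hits} of {len(SAMPLE)} sample lines count toward maxretry")
```

Every GET or POST counts as one hit, whether it is a page, a thumbnail, or a partial download, which is why a single page view by a normal visitor can add many hits at once.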

Most of the tutorials I found for blocking this kind of crawler assume Apache, but I run nginx, so I can't use them.

Here is a sample of the bot's log entries:

220.225.127.41 - - [24/Jul/2013:00:00:19 +0800] "GET /php?page=9 HTTP/1.1" 200 10897 "http://www.mysite.com/php?page=8" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:22 +0800] "GET /sites/default/files/download/jkev/jkev_search.zip HTTP/1.1" 200 35199 "http://www.mysite.com/sites/default/files/download/jkev/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:00:00:24 +0800] "GET /sites/default/files/styles/thumbnail/public/images/kalola/sk_3.jpg?itok=-pXuOEq2 HTTP/1.1" 200 3958 "http://www.mysite.com/php?page=9" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:24 +0800] "GET /sites/default/files/styles/thumbnail/public/images/kalola/sk_1.jpg?itok=ug6jsTPP HTTP/1.1" 200 3958 "http://www.mysite.com/php?page=9" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:24 +0800] "GET /sites/default/files/styles/thumbnail/public/images/kalola/sk_2.jpg?itok=ZPOMnJeK HTTP/1.1" 200 3958 "http://www.mysite.com/php?page=9" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:26 +0800] "GET /sites/default/files/styles/thumbnail/public/images/argie/currency.jpg?itok=hodqOr4_ HTTP/1.1" 200 7976 "http://www.mysite.com/php?page=9" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:26 +0800] "GET /sites/default/files/styles/thumbnail/public/images/localhost27/untitled.jpg?itok=uVeczDjI HTTP/1.1" 200 3136 "http://www.mysite.com/php?page=9" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:26 +0800] "GET /sites/default/files/styles/thumbnail/public/images/Oelasor/screenshot_11.jpg?itok=uu3d0GpX HTTP/1.1" 200 6674 "http://www.mysite.com/php?page=9" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:27 +0800] "GET /sites/default/files/styles/thumbnail/public/images/localhost27/member.jpg?itok=inA9ULoC HTTP/1.1" 200 4500 "http://www.mysite.com/php?page=9" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:28 +0800] "GET /php/4852/shopping-cart-checkout-using-codeigniter.html HTTP/1.1" 200 11414 "http://www.mysite.com/php?page=9" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:29 +0800] "GET /sites/default/files/styles/medium/public/images/admin/codeigniter_shopping_cart.jpg?itok=QO0YV6JP HTTP/1.1" 200 22534 "http://www.mysite.com/php/4852/shopping-cart-checkout-using-codeigniter.html" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:32 +0800] "GET /php/4846/simple-ajax-example-php.html HTTP/1.1" 200 10174 "http://www.mysite.com/php?page=9" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:00:00:34 +0800] "GET /sites/default/files/download/teejaygenius/e_library.zip HTTP/1.1" 206 3655400 "http://www.mysite.com/sites/default/files/download/teejaygenius/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:00:00:36 +0800] "GET /sites/default/files/download/Chritian/bus.zip HTTP/1.1" 206 4462491 "http://www.mysite.com/sites/default/files/download/Chritian/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:00:00:37 +0800] "GET /sites/default/files/styles/medium/public/images/kalola/sk_2.jpg?itok=1N0a__bq HTTP/1.1" 200 9693 "http://www.mysite.com/php/4846/simple-ajax-example-php.html" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"
220.225.127.41 - - [24/Jul/2013:03:03:13 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 1555432 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:03:20 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18541381 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:03:29 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 6186320 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:03:31 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 13495467 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:03:34 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 17908605 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:03:51 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 10082448 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:03:57 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 8639709 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:04:03 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 12150765 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:04:04 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 17972316 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:04:09 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18453052 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:04:23 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 777716 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:04:40 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 8033075 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:04:45 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 12935983 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:04:49 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 8262600 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:04:49 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 11598966 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:04:49 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 11249310 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:04:57 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 5969210 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:05:02 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 12978641 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:05:03 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 13390784 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:05:07 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 6124786 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:05:15 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 9962834 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:05:19 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 12021359 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:05:27 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 8432875 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:05:44 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18371964 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:05:46 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 19867749 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:05:50 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18164900 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:06:00 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 17839100 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:06:01 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18329973 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:06:11 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18651902 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:06:31 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 9858200 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:06:34 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 12914955 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:06:36 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 13315966 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:06:38 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 12804285 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:06:41 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 6043976 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:06:42 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 11900897 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:06:52 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 8293782 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:07:06 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 11582412 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:07:24 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18667357 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:07:27 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 7977266 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:07:35 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 11190040 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:07:36 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18555860 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:09 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 5932064 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:10 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 12730175 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:13 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 13208853 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:16 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 8178860 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:22 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 5896753 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:25 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 8183834 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:26 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 12671818 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:30 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18581925 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:36 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18224268 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:37 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 11761743 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:51 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 11412627 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:08:59 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 18600749 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:09:01 +0800] "GET /sites/default/files/download/mindgamez/system1.zip HTTP/1.1" 206 11129155 "http://www.mysite.com/sites/default/files/download/mindgamez/" "FDM 3.x"
220.225.127.41 - - [24/Jul/2013:03:09:14 +0800] "GET /sites/default/files/download/argie/tameraplazainn.zip HTTP/1.1" 206 7836467 "http://www.mysite.com/sites/default/files/download/argie/" "FDM 3.x"

Here is the hit count per hour:

# grep "220.225.127.41" /var/log/ispconfig/httpd/*/access.log | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":00"}' | sort -n | uniq -c
545 00:00
524 01:00
404 02:00
491 03:00
396 04:00
183 05:00

Here is the hit count per minute (around midnight), showing only minutes with more than 10 hits:

# grep "220.225.127.41 - - \[24/Jul/2013:00" /var/log/ispconfig/httpd/*/access.log | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":"$3}' | sort -nk1 -nk2 | uniq -c | awk '{ if ($1 > 10) print $0}'
33 00:00
14 00:01
12 00:03
26 00:05
15 00:10
18 00:11
22 00:13
15 00:14
14 00:15
15 00:18
21 00:19
17 00:20
15 00:23
14 00:24
17 00:25
27 00:29
15 00:30
18 00:32
14 00:52

Here is the hit count per minute (around 1 a.m.), showing only minutes with more than 10 hits:

# grep "220.225.127.41 - - \[24/Jul/2013:01" /var/log/ispconfig/httpd/*/access.log | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":"$3}' | sort -nk1 -nk2 | uniq -c | awk '{ if ($1 > 10) print $0}'
16 01:01
16 01:02
12 01:05
16 01:06
14 01:10
14 01:11
14 01:12
13 01:14
22 01:16
18 01:17
13 01:21
21 01:22
14 01:26
20 01:37
30 01:38
13 01:45
11 01:50
17 01:51
11 01:53
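
The per-minute counts above peak at about 33 hits, so no 300-second window comes close to 250. A minimal Python sketch for checking this directly against the log is shown here; it only approximates how fail2ban counts hits inside findtime, and the function names and the synthetic timestamps in the demo are illustrative assumptions.

```python
from datetime import datetime, timedelta


def parse_time(line):
    """Extract the timestamp between '[' and ']' in an access-log line."""
    stamp = line.split("[", 1)[1].split("]", 1)[0]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def max_hits_in_window(timestamps, findtime):
    """Largest number of hits falling inside any window of `findtime` seconds."""
    times = sorted(timestamps)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Slide the left edge forward until the window is shorter than findtime.
        while (t - times[start]).total_seconds() >= findtime:
            start += 1
        best = max(best, end - start + 1)
    return best


# Synthetic demo: five hits at 0, 1, 2, 10 and 11 seconds past midnight.
base = datetime(2013, 7, 24, 0, 0, 0)
demo = [base + timedelta(seconds=s) for s in (0, 1, 2, 10, 11)]
print("peak hits in any 5 s window:", max_hits_in_window(demo, 5))
print("peak hits in any 300 s window:", max_hits_in_window(demo, 300))
```

Feeding it `parse_time(line)` for every line of one IP gives the highest value maxretry could be set to and still trigger for that client with findtime = 300.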

Is there a way to block this using iptables or something else?

If I lower maxretry, I'm afraid some legitimate traffic will be banned too.

The bot's hit rate is very low. I can't set maxretry to 50 or even 70, because that would ban legitimate traffic as well.

So how can I avoid this? These bots consume too much bandwidth. My normal bandwidth usage used to be 59.31 GB per day, but it has now reached 136.74 GB.
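
To see where the extra bandwidth goes, the response sizes can be summed per client and user agent. This is a minimal sketch, assuming the log format shown in the sample above; the pattern and function names are illustrative, and the three lines in `SAMPLE` are copied from that excerpt.

```python
import re
from collections import Counter

# Assumed layout: ip - - [time] "request" status bytes "referer" "user agent"
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_by_client(lines):
    """Sum response bytes per (ip, user agent); lines that don't parse are skipped."""
    totals = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        size = 0 if m.group("bytes") == "-" else int(m.group("bytes"))
        totals[(m.group("ip"), m.group("agent"))] += size
    return totals


SAMPLE = [
    '220.225.127.41 - - [24/Jul/2013:00:00:19 +0800] "GET /php?page=9 HTTP/1.1" 200 10897 '
    '"http://www.mysite.com/php?page=8" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36"',
    '220.225.127.41 - - [24/Jul/2013:00:00:22 +0800] '
    '"GET /sites/default/files/download/jkev/jkev_search.zip HTTP/1.1" 200 35199 '
    '"http://www.mysite.com/sites/default/files/download/jkev/" "FDM 3.x"',
    '220.225.127.41 - - [24/Jul/2013:00:00:34 +0800] '
    '"GET /sites/default/files/download/teejaygenius/e_library.zip HTTP/1.1" 206 3655400 '
    '"http://www.mysite.com/sites/default/files/download/teejaygenius/" "FDM 3.x"',
]

# Print the heaviest consumers first.
for (ip, agent), total in bytes_by_client(SAMPLE).most_common():
    print(f"{total:>12} bytes  {ip}  {agent}")
```

In the sample, the repeated 206 responses to the "FDM 3.x" user agent carry most of the bytes, so a byte-based view may separate this client from ordinary visitors better than a request count does.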

Answer 1

As a first step, limiting the number of connections with iptables might help:

(from http://www.extrapepperoni.com/post/2013/03/iptables%3A-connlimit):

-A INPUT -j ACCEPT -p tcp --dport    80 -s xxx.yyy.0.0/16 --syn -m connlimit ! --connlimit-above 20
-A INPUT -j ACCEPT -p tcp --dport    80                   --syn -m connlimit ! --connlimit-above 5 --connlimit-mask 24

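This mainly helps against DDoS attacks, but it might help with your problem too. The first rule allows internal users (from a specific network) up to 20 simultaneous connections. The second rule allows everyone else only 5 simultaneous connections; because of --connlimit-mask 24, that limit is counted per /24 network rather than per single address.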
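
The effect of the second rule can be modeled in a few lines of Python. This is only a sketch of the counting logic, not of iptables itself: the class and method names are hypothetical, and it assumes that connections not accepted by these rules are dropped or rejected later in the chain.

```python
import ipaddress
from collections import Counter


def network_key(ip, mask=24):
    """Map an address to its enclosing network, as --connlimit-mask groups clients."""
    return ipaddress.ip_network(f"{ip}/{mask}", strict=False)


class ConnLimiter:
    """Toy model: at most `limit` open connections per /mask network."""

    def __init__(self, limit=5, mask=24):
        self.limit = limit
        self.mask = mask
        self.open = Counter()

    def try_open(self, ip):
        key = network_key(ip, self.mask)
        if self.open[key] >= self.limit:
            return False  # would exceed the limit, so the SYN is not accepted
        self.open[key] += 1
        return True

    def close(self, ip):
        key = network_key(ip, self.mask)
        if self.open[key] > 0:
            self.open[key] -= 1


limiter = ConnLimiter(limit=5, mask=24)
first_five = [limiter.try_open("220.225.127.41") for _ in range(5)]
print("first five accepted:", all(first_five))
print("sixth from same /24 accepted:", limiter.try_open("220.225.127.99"))
print("different /24 accepted:", limiter.try_open("220.225.128.1"))
```

A download client that opens many parallel connections runs into this kind of limit quickly, while a browser that opens only a few connections at a time is less likely to be affected.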

A more general, but rather complex, command-line tool for traffic shaping is tc: http://www.tldp.org/HOWTO/html_single/Traffic-Control-HOWTO/

With tc you can limit the bandwidth for specific users, services, or clients.
