In the following log entries, the referrer appears to be a folder.
112.200.208.5 - - [29/Jul/2013:20:43:14 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 294677 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
61.3.158.113 - - [29/Jul/2013:20:43:14 +0800] "GET /sites/default/files/download/lnosKHEN/payroll_system_-_lnoskhen_0.zip HTTP/1.1" 206 10806 "http://www.mysite.com/download-code" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3"
112.200.208.5 - - [29/Jul/2013:20:43:15 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 21465 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
112.200.208.5 - - [29/Jul/2013:20:43:16 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 469304 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
112.200.208.5 - - [29/Jul/2013:20:43:17 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 238639 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
112.200.208.5 - - [29/Jul/2013:20:43:18 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 267724 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
39.41.211.234 - - [29/Jul/2013:20:43:22 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 23361 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:23 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 200 632601 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:24 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 285171 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:24 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 138366 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:25 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 104108 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:25 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 52055 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:25 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 63038 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
39.41.211.234 - - [29/Jul/2013:20:43:27 +0800] "GET /sites/default/files/download/john.lemar/zest-project.zip HTTP/1.1" 206 32452 "http://www.mysite.com/sites/default/files/download/john.lemar/" "Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0.1"
112.200.208.5 - - [29/Jul/2013:20:43:33 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 215059 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
I believe the only valid download is this line:
61.3.158.113 - - [29/Jul/2013:20:43:14 +0800] "GET /sites/default/files/download/lnosKHEN/payroll_system_-_lnoskhen_0.zip HTTP/1.1" 206 10806 "http://www.mysite.com/download-code" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)
because I set things up so that all downloads come from this URL:
http://www.mysite.com/download-code
So why does the referrer appear to be a folder?
Take this line, for example:
112.200.208.5 - - [29/Jul/2013:20:43:33 +0800] "GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1" 206 215059 "http://www.mysite.com/sites/default/files/download/argie/" "Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20100101 Firefox/22.0"
The referrer is:
http://www.mysite.com/sites/default/files/download/argie/
This part:
/sites/default/files/download/argie/
is a folder.
Even if this is a web crawler, is it possible for it to access folders on my site?
When I manually enter the following:
http://www.mysite.com/sites/default/files/download/argie/
it just returns "Page not found". That is why I am wondering how it ended up as the referrer.
By the way, I am using nginx.
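Answer 1
You shouldn't pay too much attention to the referer. The client can set the referer to any value it likes. It is just a header in the request.
For example: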
GET /sites/default/files/download/argie/pos-code.zip HTTP/1.1
Host: www.mysite.com
Referer: http://example.org/JUST/SOME/REFERRER
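Any HTTP client can produce a request like the one above. As a minimal sketch in Python, using only the standard library, the following builds such a request without sending it. The target URL is taken from the log in the question, and the Referer value is the made-up one from the example, not something the server supplied or checked.

```python
import urllib.request

# Build (but do not send) a request that carries an arbitrary Referer header.
# The header value is whatever the client chooses to put there.
req = urllib.request.Request(
    "http://www.mysite.com/sites/default/files/download/argie/pos-code.zip",
    headers={"Referer": "http://example.org/JUST/SOME/REFERRER"},
)

# Show the header exactly as it would be sent.
print(req.get_header("Referer"))
```

Passing `req` to `urllib.request.urlopen` would send it, and the chosen value would show up in the referer field of the access log.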
So my guess is that the crawler simply cut off the end of the path and set the result as the referrer. I wouldn't worry about it.
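To see how many log entries fit that guess, one option is to compare each referrer with the parent directory of the requested path. The sketch below is only an illustration: it assumes the log follows the "combined" layout that the lines in the question appear to use, and `LOG_RE` and `referer_is_parent_dir` are hypothetical names, not part of nginx or any library.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Assumed layout: ip, identity, user, [time], "request", status, bytes,
# "referer", "user agent" (as in the log lines quoted in the question).
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referer_is_parent_dir(line):
    """Return True if the referer path is the requested file's parent folder.

    Returns None when the line does not match the assumed log layout.
    """
    match = LOG_RE.match(line)
    if match is None:
        return None
    # "/a/b/file.zip" -> "/a/b/"
    parent = posixpath.dirname(match.group("path")) + "/"
    return urlsplit(match.group("referer")).path == parent


if __name__ == "__main__":
    import sys

    # Print only the entries whose referer is the parent folder of the file.
    for log_line in sys.stdin:
        if referer_is_parent_dir(log_line):
            print(log_line, end="")
```

Saved under a file name of your choice and fed the access log on standard input, it prints only the entries whose referrer is the folder containing the requested file. A match is consistent with the guess above but does not prove how the client came up with the value.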