Yahoo spider, what are you doing?

Other 2013-03-21 Yahoo, spider

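Traffic has spiked over the past few days, and bandwidth and IO shot up with it, so I checked the Nginx logs and found that a large share of the entries were spider crawls. Google's and Baidu's spiders account for most of them, and seven or eight others, including Youdao, Microsoft Bing, and Yahoo, are crawling too. The Baidu spider is famously dumb, so that goes without saying, but the Yahoo spider seems even more stubborn than the rest.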
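For anyone who wants to run the same check, here is a rough sketch of tallying requests per user agent. It assumes the log lines follow the combined-style layout of my own access log (IP, timestamp, request, status, size, referer, user agent); the names `LOG_LINE`, `count_user_agents`, and `SAMPLE_LINES` are made up for this illustration, and the regex will need adjusting if your `log_format` differs.

```python
import re
from collections import Counter

# Assumed layout: ip - - [time] "request" status size "referer" "agent" ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# A few lines taken from my access log, used here as sample input.
SAMPLE_LINES = [
    '110.75.171.110 - - [20/Mar/2013:17:07:17 +0800] "GET /20120554.html HTTP/1.1" 200 7965 "-" "Yahoo! Slurp China" -',
    '110.75.172.109 - - [20/Mar/2013:17:10:20 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -',
    '110.75.173.195 - - [20/Mar/2013:17:59:07 +0800] "GET /201302280.html HTTP/1.1" 200 6544 "-" "Yahoo! Slurp China" -',
]


def count_user_agents(lines):
    """Count requests per user-agent string, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


if __name__ == "__main__":
    # Print agents from most to least active.
    for agent, hits in count_user_agents(SAMPLE_LINES).most_common():
        print(hits, agent)
```

To run it against a real log, pass an open file object instead of `SAMPLE_LINES`, since the function only needs something it can iterate over line by line.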

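I have a go page that redirects visitors to commenters' websites, so when a spider hits it the response is almost always a 302, and eventually the spiders route around it. Only the Yahoo spider refuses to give up: it keeps at it N times a day before it quits. The records: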

110.75.171.110 - - [20/Mar/2013:17:07:17 +0800] "GET /20120554.html HTTP/1.1" 200 7965 "-" "Yahoo! Slurp China" -
110.75.172.109 - - [20/Mar/2013:17:10:20 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.107 - - [20/Mar/2013:17:10:26 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.108 - - [20/Mar/2013:17:10:35 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.112 - - [20/Mar/2013:17:10:48 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.111 - - [20/Mar/2013:17:10:54 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.109 - - [20/Mar/2013:17:10:56 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.109 - - [20/Mar/2013:17:10:58 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.107 - - [20/Mar/2013:17:11:00 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.108 - - [20/Mar/2013:17:11:02 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.112 - - [20/Mar/2013:17:11:07 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.111 - - [20/Mar/2013:17:11:17 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.110 - - [20/Mar/2013:17:11:26 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.109 - - [20/Mar/2013:17:11:32 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.107 - - [20/Mar/2013:17:11:36 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.108 - - [20/Mar/2013:17:11:38 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.112 - - [20/Mar/2013:17:11:39 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.111 - - [20/Mar/2013:17:11:41 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.110 - - [20/Mar/2013:17:11:44 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.109 - - [20/Mar/2013:17:11:48 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.172.107 - - [20/Mar/2013:17:11:55 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -
110.75.173.195 - - [20/Mar/2013:17:59:07 +0800] "GET /201302280.html HTTP/1.1" 200 6544 "-" "Yahoo! Slurp China" -

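The above is from yesterday, and it happens pretty much every day. If it keeps being this stubborn I really will have to come up with a countermeasure and ban the Yahoo spider's IPs. Yahoo search is basically over anyway.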
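If I do go down the IP-ban route, something like the sketch below could turn the log into a ban list. It is only an illustration under some assumptions: it collects the addresses whose user agent contains a given substring, widens each to a /24 network (my guess at a sensible granularity, not something the log proves), and prints them as Nginx `deny` directives. The helper names `subnets_for_agent` and `nginx_deny_rules` are hypothetical, and since user-agent strings can be spoofed, the output deserves a manual look before it goes into any config.

```python
import ipaddress
import re

# Assumed layout: ip - - [time] "request" status size "referer" "agent" ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Lines taken from my access log, one per address range seen.
SAMPLE_LINES = [
    '110.75.171.110 - - [20/Mar/2013:17:07:17 +0800] "GET /20120554.html HTTP/1.1" 200 7965 "-" "Yahoo! Slurp China" -',
    '110.75.172.109 - - [20/Mar/2013:17:10:20 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -',
    '110.75.172.107 - - [20/Mar/2013:17:10:26 +0800] "GET /go/BeT0h1 HTTP/1.1" 302 5 "-" "Yahoo! Slurp China" -',
    '110.75.173.195 - - [20/Mar/2013:17:59:07 +0800] "GET /201302280.html HTTP/1.1" 200 6544 "-" "Yahoo! Slurp China" -',
]


def subnets_for_agent(lines, agent_substring, prefix=24):
    """Return the sorted networks (default /24) that sent requests with this agent."""
    networks = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or agent_substring not in match.group("agent"):
            continue
        try:
            # strict=False masks the host bits, e.g. 110.75.172.109/24 -> 110.75.172.0/24
            network = ipaddress.ip_network(f"{match.group('ip')}/{prefix}", strict=False)
        except ValueError:
            continue  # first field was not a usable IP address
        networks.add(network)
    return sorted(networks)


def nginx_deny_rules(networks):
    """Format each network as an Nginx `deny` directive."""
    return [f"deny {network};" for network in networks]


if __name__ == "__main__":
    for rule in nginx_deny_rules(subnets_for_agent(SAMPLE_LINES, "Yahoo! Slurp")):
        print(rule)
```

Grouping by /24 keeps the list short when the crawler rotates through neighbouring addresses, at the cost of possibly blocking unrelated hosts in the same range; pass `prefix=32` to ban only the exact addresses seen.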

Finally, here is a small sample of Yahoo China search:

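Text link: "Yahoo spider, what are you doing?"

Article URL: http://www.qttc.net/201303293.html

Unless otherwise noted, all posts on Qiongtai Blog (琼台博客) are original. When reposting, please include a text link crediting the source.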
