Slowing down the spiders…

Spiders Hammering mura.org - December 2010

As part of my “WTF is my website so slow” explorations, I took a look at the logs and found that spiders (web crawlers) were hammering my site.

[fusion_builder_container hundred_percent=”yes” overflow=”visible”][fusion_builder_row][fusion_builder_column type=”1_1″ background_position=”left top” background_color=”” border_size=”” border_color=”” border_style=”solid” spacing=”yes” background_image=”” background_repeat=”no-repeat” padding=”” margin_top=”0px” margin_bottom=”0px” class=”” id=”” animation_type=”” animation_speed=”0.3″ animation_direction=”left” hide_on_mobile=”no” center_content=”no” min_height=”none”]

Spiders Hammering mura.org - December 2010

Source: Brandon/AwStats

 


One of them was obvious: Yandex. Yandex was up 150 MB over November 2010. Buzz, stop that! The others were a collection of assorted spiders that together transferred 881 MB worth of data.

“Yandex – We are the leading internet company in Russia, operating the most popular search engine and most visited website. In 2010, we generated 64% of all search traffic in Russia, our homepage attracted a monthly average of 21.5 million users, and we were the largest internet company in Russia by revenue.”

Source: Yandex. (2011). About. Retrieved March 20, 2011 from Yandex website: http://about.yandex.com/
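The per-spider totals above came from AWStats, but a rough cross-check can be done straight from the raw access log. The sketch below assumes the server writes the common "combined" log format; the function name `bytes_by_agent` and the sample lines are made up for illustration and are not taken from my actual logs.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
    r'(?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_by_agent(lines):
    """Sum the bytes transferred per user-agent string."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        size = match.group("bytes")
        # A "-" in the size field means no body was sent.
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


# Hypothetical sample lines, invented for this example.
SAMPLE = [
    '192.0.2.1 - - [01/Dec/2010:00:00:01 +0000] "GET / HTTP/1.1" 200 1000 "-" "ExampleBot/1.0"',
    '192.0.2.2 - - [01/Dec/2010:00:00:02 +0000] "GET /a HTTP/1.1" 200 2500 "-" "ExampleBot/1.0"',
    '192.0.2.3 - - [01/Dec/2010:00:00:03 +0000] "GET /b HTTP/1.1" 304 - "-" "OtherBot/2.0"',
]

if __name__ == "__main__":
    # Print the heaviest agents first.
    for agent, total in bytes_by_agent(SAMPLE).most_common():
        print(f"{total:>10}  {agent}")
```

To run it against a real log, pass an open file object instead of `SAMPLE`. Sorting by total makes the heaviest crawlers show up at the top.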

For those spiders that follow the robots exclusion standard, I put in place the robots.txt file listed below to filter spiders by their user-agent string. I allow through Google, Microsoft, Yahoo, the Internet Archive, Ask (Teoma), Docomo (Japan) and WordPress. You’ll notice Yandex isn’t on the list! And I’m still wondering if I should let Baidu (China) in… Right now, I feel it’s just a stopgap to prevent wholesale content hijacking.

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: MSNbot
Allow: /

User-agent: msnbot-media
Allow: /

User-agent: Slurp
Allow: /

User-agent: ia_archiver
Allow: /

User-agent: archive.org_bot
Allow: /

User-agent: Teoma
Allow: /

User-agent: docomo
Allow: /

User-agent: wordpress
Allow: /

User-agent: Voyager
Disallow: /

[/fusion_builder_column][/fusion_builder_row][/fusion_builder_container]